Types for Modeling Data

August 24, 2018

When developers have a new piece of data to handle, they will create a new type for that data. That's a pretty standard approach. But they often don't create new types to handle a list or collection of that data, or even a type for the identifier. Having these extra types can yield many benefits.

1 Thing - 3 Types

Whenever I need to create a type for a new piece of data, I always create 3 associated types. For the purpose of illustration, let's call them: Thing, ThingID and ThingList.

Thing - the type that encapsulates a single instance of the piece of data
ThingID - the type for the unique identifier of Thing
ThingList - the type for a collection of Thing

These 3 types work nicely together in code to handle many, if not most, situations.

See my earlier blog entry, Identifiers are Types, for the benefits of ThingID.
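As a minimal sketch of how the three types might fit together, here is one possible shape. It assumes a numeric identifier, and the name and count fields are illustrative only. Because ThingID is its own immutable value type (with equals and hashCode), the compiler rejects a raw long or some other kind of ID where a ThingID is expected, and ThingIDs work correctly as map keys and set members.

```java
import java.util.ArrayList;
import java.util.Objects;

// ThingID: an immutable value type for the identifier (assumed here to wrap a long).
final class ThingID {
    private final long value;

    ThingID(long value) { this.value = value; }

    long getValue() { return value; }

    @Override
    public boolean equals(Object other) {
        return other instanceof ThingID && ((ThingID) other).value == value;
    }

    @Override
    public int hashCode() { return Long.hashCode(value); }

    @Override
    public String toString() { return "ThingID(" + value + ")"; }
}

// Thing: a single instance of the data (fields are illustrative).
class Thing {
    final ThingID thingID;
    final String name;
    final int count;

    Thing(ThingID thingID, String name, int count) {
        this.thingID = Objects.requireNonNull(thingID);
        this.name = name;
        this.count = count;
    }
}

// ThingList: a collection of Thing and the home for list-level operations.
class ThingList extends ArrayList<Thing> {
}
```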

List Benefits

Having a ThingList type allows for encapsulation of common list-related operations, improving code structure, readability and reuse. There are often cases where code is needed to iterate through a list, performing some operation such as filtering or updating. Many developers build this code in the place where it is used. If multiple places need the same logic, they often duplicate code.

A better approach is to create a method on the ThingList type.

Consider the need to filter a list by count (an arbitrary attribute used for illustration), keeping only the items whose count is below a given value. In Java, this could be:

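import java.util.ArrayList;

// ThingList is a list of Thing that also hosts list-level operations
class ThingList extends ArrayList<Thing>
{
    // Returns a new ThingList with only the items whose count is less than the given count
    ThingList findByLessThanCount(int count)
    {
        ThingList thingList = new ThingList();
        for (Thing thing : this)
        {
            if (thing.count < count)
                thingList.add(thing);
        }
        return thingList;
    }
}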

Often, a map of ThingID to Thing is needed:

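// Another method on ThingList (requires java.util.Map and java.util.HashMap)
Map<ThingID, Thing> convertToMap()
{
    Map<ThingID, Thing> thingMap = new HashMap<>();
    for (Thing thing : this)
        thingMap.put(thing.thingID, thing); // a later Thing with the same ThingID replaces an earlier one
    return thingMap;
}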

Other times, we may want a set of ThingID:

// Another method on ThingList (requires java.util.Set and java.util.HashSet)
Set<ThingID> getIDSet()
{
    Set<ThingID> thingIDSet = new HashSet<>();
    for (Thing thing : this)
        thingIDSet.add(thing.thingID);
    return thingIDSet;
}

The ThingList type is a very convenient and organized place for these methods to exist.
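On Java 8 or later, the same three methods can also be written with the java.util.stream API. This is an alternative style rather than the loops above, and the stripped-down Thing and ThingID here are illustrative. One behavioral difference is worth knowing: Collectors.toMap throws an IllegalStateException on a duplicate ThingID, whereas the loop version quietly keeps the last one.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ThingID {
    final long value;

    ThingID(long value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ThingID && ((ThingID) o).value == value;
    }

    @Override
    public int hashCode() { return Long.hashCode(value); }
}

class Thing {
    final ThingID thingID;
    final int count;

    Thing(ThingID thingID, int count) {
        this.thingID = thingID;
        this.count = count;
    }
}

class ThingList extends ArrayList<Thing> {
    // Keeps list order and collects straight into a new ThingList.
    ThingList findByLessThanCount(int count) {
        return stream()
            .filter(thing -> thing.count < count)
            .collect(Collectors.toCollection(ThingList::new));
    }

    // Throws IllegalStateException if two Things share a ThingID.
    Map<ThingID, Thing> convertToMap() {
        return stream().collect(Collectors.toMap(thing -> thing.thingID, Function.identity()));
    }

    Set<ThingID> getIDSet() {
        return stream().map(thing -> thing.thingID).collect(Collectors.toSet());
    }
}
```

Either style keeps the logic in one place on ThingList, which is the point; the choice between loops and streams is mostly taste.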

Reading and Writing Data

The Thing and ThingList approach makes reading and writing data much easier and very consistent across the code base.

In all of my projects, I have abstract base types for reading and writing data, usually named DataReader and DataWriter, with derived concrete types for particular formats, such as JsonDataReader, JsonDataWriter, XmlDataReader, XmlDataWriter, DatabaseDataReader and DatabaseDataWriter.

These types expose read and write methods for native types, such as readInt, readDate and readBoolean, as well as methods for complex types, such as readObject and readList.
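A minimal sketch of what such a base pair might look like, with a Map-backed concrete pair standing in for the JSON, XML and database variants. Only a few native-type methods are shown, and the Map-backed classes are hypothetical; the signatures follow the readString(name, maxLength) and writeString(name, value, maxLength) calls used in this post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal base types; a real pair would also cover dates, IDs, objects and lists.
abstract class DataWriter {
    abstract void writeInt(String name, int value);
    abstract void writeString(String name, String value, int maxLength);
    abstract void writeBoolean(String name, boolean value);
}

abstract class DataReader {
    abstract int readInt(String name);
    abstract String readString(String name, int maxLength);
    abstract boolean readBoolean(String name);
}

// Map-backed concrete writer, standing in for a format-specific one.
class MapDataWriter extends DataWriter {
    final Map<String, Object> values = new HashMap<>();

    @Override
    void writeInt(String name, int value) { values.put(name, value); }

    @Override
    void writeString(String name, String value, int maxLength) {
        if (value != null && value.length() > maxLength)
            throw new IllegalArgumentException(name + " exceeds " + maxLength + " characters");
        values.put(name, value);
    }

    @Override
    void writeBoolean(String name, boolean value) { values.put(name, value); }
}

// Map-backed concrete reader; a missing int or boolean fails on unboxing.
class MapDataReader extends DataReader {
    private final Map<String, Object> values;

    MapDataReader(Map<String, Object> values) { this.values = values; }

    @Override
    int readInt(String name) { return (Integer) values.get(name); }

    @Override
    String readString(String name, int maxLength) {
        String value = (String) values.get(name);
        if (value != null && value.length() > maxLength)
            throw new IllegalArgumentException(name + " exceeds " + maxLength + " characters");
        return value;
    }

    @Override
    boolean readBoolean(String name) { return (Boolean) values.get(name); }
}
```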

The Thing type makes use of DataReader and DataWriter to move data into and out of a specific instance, using one of two approaches.

Using Read/Write Methods

The first approach is to create 2 methods on Thing, one for reading and one for writing. In each method, every member of the type is read or written:

@Override
public void readFrom(DataReader reader) throws Exception
{
    thingID = reader.readDataID("ThingID", ThingID.CtorDataID);
    name = reader.readString("Name", NameMaxLength);
}

@Override
public void writeTo(DataWriter writer) throws Exception
{
    writer.writeDataID("ThingID", thingID);
    writer.writeString("Name", name, NameMaxLength);
}

This approach gives the object direct control over the field names of the data and the order in which the data is read or written. It also keeps the type more of a black box. For example, some members should never change after the object is first created (such as a reference ID to parent data); such a member needs no set method, because it is only ever assigned inside the type. This approach also provides a single place to run code, such as validation, after all of the members have been set.
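A self-contained sketch of the read/write-method approach, assuming deliberately simplified, Map-backed DataReader and DataWriter interfaces (hypothetical, and much smaller than the ones described above). It shows a ParentID member with no set method, and a validate() call that runs once, after every member has been read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down reader/writer interfaces; a Map's get/put fit them directly.
interface DataReader { Object read(String name); }
interface DataWriter { void write(String name, Object value); }

final class Thing {
    static final int NameMaxLength = 50;

    private long thingID;
    private long parentID; // no set method: fixed once the Thing is created or read
    private String name;

    Thing(long thingID, long parentID, String name) {
        this.thingID = thingID;
        this.parentID = parentID;
        this.name = name;
    }

    // Constructor used when reading stored data.
    Thing(DataReader reader) { readFrom(reader); }

    void readFrom(DataReader reader) {
        thingID = (Long) reader.read("ThingID");
        parentID = (Long) reader.read("ParentID");
        name = (String) reader.read("Name");
        validate(); // single point of execution once every member is set
    }

    void writeTo(DataWriter writer) {
        writer.write("ThingID", thingID);
        writer.write("ParentID", parentID);
        writer.write("Name", name);
    }

    private void validate() {
        if (name == null || name.length() > NameMaxLength)
            throw new IllegalStateException("invalid Name: " + name);
    }

    long getParentID() { return parentID; }
    String getName() { return name; }
}
```

A round trip is then `original.writeTo(row::put)` followed by `new Thing(row::get)` for some `Map<String, Object> row`.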

Direct Field Access

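A second approach is for code outside the type to access the type's members directly, both for getting and for setting them. In this situation, the reading code queries the source data and then locates and sets the matching public members of the type. It requires a little less code, but it usually requires that every member have a public set method whose name exactly matches the name in the source data. This increases the potential for defects: a misspelled or renamed field can break the mapping without any obvious error, and any code can change members that should have stayed fixed after creation.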
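One common way to implement direct field access in Java is reflection. The sketch below assumes that (other implementations are possible), uses a hypothetical DirectFieldMapper, and for brevity sets public fields rather than calling public set methods. It illustrates the defect risk: a source name that does not exactly match a member name is simply skipped.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Illustrative type whose members are public, as direct field access requires.
class Thing {
    public long thingID;
    public String name;
}

// Hypothetical helper: copies each source value into the public field of the same name.
final class DirectFieldMapper {
    static <T> T populate(T target, Map<String, Object> source) {
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Field field;
            try {
                field = target.getClass().getField(entry.getKey());
            } catch (NoSuchFieldException e) {
                // No field with this exact name: the value is dropped without any error,
                // which is how a renamed key or column quietly leaves a member unset.
                continue;
            }
            try {
                field.set(target, entry.getValue()); // unwraps Long into a long field
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return target;
    }
}
```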

Permanent Storage

In general, most data is stored as hierarchical data or as database data. The reading and writing approach above can work well in both situations (but with some differences).

Hierarchical Data

Hierarchical data is the model of data where a thing contains a combination of native data types, other things, or lists of other things. JSON and XML are data formats that model hierarchical data. The Thing and ThingList approach fits perfectly into this model.

Let's say our Thing is a person. For each person, our project needs to track their first and last names, their birth date, a list of addresses, and their children (who are also modeled as people).

We create our three types: Person, PersonID and PersonList. In Java, this would be:

class Person
{
    PersonID personID;
    String firstName;
    String lastName;
    Date birthDate;
    AddressList addresses;
    PersonList children;
}

class PersonList extends ArrayList<Person>
{
}

For our illustration, Address, AddressID and AddressList are the types to handle the address data.

In JSON, a particular person might look like:

{
    "personID": 1234,
    "firstName": "Michael",
    "lastName": "Smith",
    "birthDate": "1990-01-01",
    "address": [
    {
        "Street1": "123 Main St.",
        "City": "Anytown",
        "State": "AA",
        "PostalCode": "99999"
    },
    {
        "Street1": "456 Elm St.",
        "City": "Anytown",
        "State": "AA",
        "PostalCode": "99999"
    }],
    "child": [
    {
        "personID": 2345,
        "firstName": "Mary",
        "lastName": "Smith",
        "birthDate": "2010-01-01"
    },
    {
        "personID": 3456,
        "firstName": "Martin",
        "lastName": "Smith",
        "birthDate": "2012-01-01"
    }]
}

Once the readFrom and writeTo methods have been created for these types, reading from a JSON string is as simple as:

person = new JsonDataReader(jsonString).readObject(null, Person.CtorDataReader);
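Writing is the mirror image. In the sketch below, a hypothetical TreeDataWriter builds nested Map and List values rather than JSON text, but the shape is the same: writeList asks each child Person to write itself into its own child writer, which is what produces the nested "child" array in the JSON above. Only two Person fields are included for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: anything that can write itself to a TreeDataWriter.
interface Writable {
    void writeTo(TreeDataWriter writer);
}

// Builds a nested Map/List tree, the same shape a JSON or XML writer would emit.
class TreeDataWriter {
    final Map<String, Object> node = new LinkedHashMap<>();

    void writeString(String name, String value) { node.put(name, value); }

    void writeLong(String name, long value) { node.put(name, value); }

    // Each element writes itself into a fresh child writer, so nesting is recursive.
    void writeList(String name, List<? extends Writable> items) {
        List<Map<String, Object>> children = new ArrayList<>();
        for (Writable item : items) {
            TreeDataWriter child = new TreeDataWriter();
            item.writeTo(child);
            children.add(child.node);
        }
        node.put(name, children);
    }
}

class PersonList extends ArrayList<Person> {
}

class Person implements Writable {
    long personID;
    String firstName;
    PersonList children = new PersonList();

    Person(long personID, String firstName) {
        this.personID = personID;
        this.firstName = firstName;
    }

    @Override
    public void writeTo(TreeDataWriter writer) {
        writer.writeLong("personID", personID);
        writer.writeString("firstName", firstName);
        if (!children.isEmpty())
            writer.writeList("child", children);
    }
}
```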

Database Data

Unlike hierarchical data, database data is essentially flat, with a single piece of data stored in a single table row. It's not easy to create types that work with both hierarchical and database data, but the Thing and ThingList approach still has many benefits when used for database data.

Database access APIs have historically been organized around the record set concept for SQL's SELECT, INSERT, UPDATE and DELETE statements. Those APIs are not object-oriented, so programmers are often unsure of the best way to move record set data to and from instances of object-oriented types, and of how those types should handle their own SQL calls.

The Thing and ThingList approach, along with the DataReader and DataWriter approach above, can be used for database access and, with a few simple rules, can go a long way toward keeping the code efficient, readable, and maintainable.

The Thing type should encapsulate the SELECT, INSERT, UPDATE and DELETE operations that affect a single record, such as a search by a unique key, knowing that at most one record will be returned.
The ThingList type should be used for any SELECT that does not search by a unique key, where 0, 1, or many records may be returned. Less importantly, but for good code organization, ThingList should also encapsulate bulk SQL calls, such as bulk UPDATE or DELETE statements. These would be static methods on ThingList and would not pull any record data into memory.

All of the database library access calls take place in a single type (or a small set of types), which lately I have been calling DatabaseAdaptor. As a result, the Thing and ThingList types contain no code specific to the database library APIs.

The DatabaseAdaptor type exposes methods for single-record INSERT, UPDATE and DELETE calls, as well as methods for SELECT calls that may return multiple records. DatabaseAdaptor does not know about any specific Thing type; it only knows that data is read into and written out of memory through DataReader and DataWriter. This keeps the Thing and ThingList code very simple and concise.
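To make this division of labor concrete, here is a hypothetical in-memory stand-in for DatabaseAdaptor. It is not the real class: it keeps rows in a Map instead of calling a database library, and its method names (insert, selectByKey, selectMany, delete) and the Storable interface are assumptions. What it does show is that the adaptor never names Person; it only moves row data through writeTo and a DataReader-based constructor, while Person handles single-record work and PersonList handles the query that may return many records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

interface DataReader { Object read(String name); }
interface DataWriter { void write(String name, Object value); }

// Hypothetical contract between stored objects and the adaptor.
interface Storable {
    Object getKey();
    void writeTo(DataWriter writer);
}

// In-memory stand-in: one "table" of rows, each row a Map of column name to value.
class DatabaseAdaptor<T extends Storable> {
    private final Map<Object, Map<String, Object>> table = new LinkedHashMap<>();
    private final Function<DataReader, T> ctor; // builds a T from a row

    DatabaseAdaptor(Function<DataReader, T> ctor) { this.ctor = ctor; }

    void insert(T object) {
        Map<String, Object> row = new HashMap<>();
        object.writeTo(row::put);
        table.put(object.getKey(), row);
    }

    T selectByKey(Object key) {
        Map<String, Object> row = table.get(key);
        return row == null ? null : ctor.apply(row::get);
    }

    // Plays the role of a SELECT that may return 0, 1 or many rows.
    List<T> selectMany(Predicate<Map<String, Object>> where) {
        List<T> results = new ArrayList<>();
        for (Map<String, Object> row : table.values())
            if (where.test(row))
                results.add(ctor.apply(row::get));
        return results;
    }

    void delete(Object key) { table.remove(key); }
}

class Person implements Storable {
    static final DatabaseAdaptor<Person> adaptor = new DatabaseAdaptor<>(Person::new);

    long personID;
    String lastName;

    Person(long personID, String lastName) { this.personID = personID; this.lastName = lastName; }

    Person(DataReader reader) {
        personID = (Long) reader.read("PersonID");
        lastName = (String) reader.read("LastName");
    }

    public Object getKey() { return personID; }

    public void writeTo(DataWriter writer) {
        writer.write("PersonID", personID);
        writer.write("LastName", lastName);
    }
}

class PersonList extends ArrayList<Person> {
    static PersonList findByLastName(String lastName) {
        PersonList list = new PersonList();
        list.addAll(Person.adaptor.selectMany(row -> lastName.equals(row.get("LastName"))));
        return list;
    }
}
```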

In Java, our Person type would look something like:

class Person extends DatabaseObject
{
    public static DatabaseAdaptor<Person, PersonList> getDatabaseAdaptor()
        { return DatabaseAdaptor.getDatabaseAdaptorForClass(
            Person.class.getSimpleName()); }

    private PersonID personID;
    private String firstName;
    private String lastName;
    private Date birthDate;

    // Constructor when creating a new instance, not yet stored
    public Person()
    {
        personID = PersonID.newInstance();
    }

    // Constructor when reading an instance from stored data, via DatabaseAdaptor
    public Person(DataReader reader) throws Exception
    {
        super(reader);
        readFrom(reader);
    }

    // Fetches an existing stored Person when PersonID is known
    public static Person get(PersonID personID) throws Exception
    {
        return getDatabaseAdaptor().selectByKey(personID, DataExists.MustExist);
    }

    @Override
    public void readFrom(DataReader reader) throws Exception
    {
        personID = reader.readDataID("PersonID", PersonID.CtorDataID);
        firstName = reader.readString("FirstName", NameMaxLength);
        lastName = reader.readString("LastName", NameMaxLength);
        birthDate = reader.readDate("BirthDate");
    }

    @Override
    public void writeTo(DataWriter writer) throws Exception
    {
        writer.writeDataID("PersonID", personID);
        writer.writeString("FirstName", firstName, NameMaxLength);
        writer.writeString("LastName", lastName, NameMaxLength);
        writer.writeDate("BirthDate", birthDate);
    }

    public void update() throws Exception
    {
        getDatabaseAdaptor().update(this);
    }

    public void delete() throws Exception
    {
        getDatabaseAdaptor().delete(personID);
    }
}

Our PersonList type would look something like:

import java.sql.Types;

public class PersonList extends DatabaseObjectList<Person>
{
    // Multi-record SELECT via a stored procedure; 0, 1 or many Persons may be returned
    public static PersonList findByLastName(String lastName) throws Exception
    {
        if (!StrUtil.hasLen(lastName))
            throw new IllegalArgumentException("lastName is empty");

        DatabaseProcParam[] params = { new DatabaseProcParam(Types.VARCHAR, lastName) };

        return Person.getDatabaseAdaptor().selectManyByProc(
            "Person_GetByLastName", params);
    }
}
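The body of selectManyByProc is not shown here. One plausible way to implement that kind of call with plain JDBC is a CallableStatement using the standard {call ...} escape syntax. The sketch below assumes that; ProcParam, RowMapper and ProcCaller are hypothetical names, and a real DatabaseAdaptor would presumably turn each row into a Thing through a DataReader rather than a RowMapper callback.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one procedure parameter, mirroring DatabaseProcParam.
class ProcParam {
    final int sqlType; // a java.sql.Types constant, e.g. Types.VARCHAR
    final Object value;

    ProcParam(int sqlType, Object value) {
        this.sqlType = sqlType;
        this.value = value;
    }
}

// Hypothetical callback that turns the current ResultSet row into an object.
interface RowMapper<T> {
    T map(ResultSet row) throws SQLException;
}

final class ProcCaller {
    // JDBC escape syntax for a stored procedure call, e.g. "{call Proc(?, ?)}".
    static String buildCallSql(String procName, int paramCount) {
        StringBuilder sql = new StringBuilder("{call ").append(procName).append('(');
        for (int i = 0; i < paramCount; i++)
            sql.append(i == 0 ? "?" : ", ?");
        return sql.append(")}").toString();
    }

    // Runs the procedure and maps every returned row; all JDBC resources are closed.
    static <T> List<T> selectManyByProc(Connection connection, String procName,
            ProcParam[] params, RowMapper<T> mapper) throws SQLException {
        List<T> results = new ArrayList<>();
        try (CallableStatement call = connection.prepareCall(buildCallSql(procName, params.length))) {
            for (int i = 0; i < params.length; i++)
                call.setObject(i + 1, params[i].value, params[i].sqlType); // parameters are 1-based
            try (ResultSet rows = call.executeQuery()) {
                while (rows.next())
                    results.add(mapper.map(rows));
            }
        }
        return results;
    }
}
```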

For production projects in which these approaches have been fully implemented, see my iNetVOD projects on GitHub. They are Java-based projects, circa 2007 or so.