Types for Modeling Data
When developers have a new piece of data to handle, they will create a new type for that data. That's a pretty standard approach. But they often don't create new types to handle a list or collection of that data, or even a type for the identifier. Having these extra types can yield many benefits.
1 Thing - 3 Types
Whenever I need to create a type for a new piece of data, I always create 3 associated types. For the purpose of illustration, let's call them Thing, ThingID and ThingList.
Thing - the type that encapsulates a single instance of the piece of data
ThingID - the type for the unique identifier of Thing
ThingList - the type for a collection of Thing
These 3 types work nicely together in code to handle many, if not most, situations.
See my earlier blog entry, Identifiers are Types, for the benefits of ThingID.
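As a rough sketch, the three types could start out like this in Java. The long-based identifier, the name field and the findByID helper are assumptions made for illustration, not part of any particular project:

```java
import java.util.ArrayList;
import java.util.Objects;

// Hypothetical identifier type: wraps a raw long so it cannot be confused with other IDs.
final class ThingID
{
    private final long value;

    ThingID(long value) { this.value = value; }

    long getValue() { return value; }

    @Override
    public boolean equals(Object other)
    {
        return other instanceof ThingID && ((ThingID) other).value == value;
    }

    @Override
    public int hashCode() { return Long.hashCode(value); }

    @Override
    public String toString() { return "ThingID(" + value + ")"; }
}

// A single instance of the data; the name field is an assumption for illustration.
class Thing
{
    final ThingID thingID;
    final String name;

    Thing(ThingID thingID, String name)
    {
        this.thingID = Objects.requireNonNull(thingID);
        this.name = name;
    }
}

// A collection of Thing; list-related helpers live here.
class ThingList extends ArrayList<Thing>
{
    // Returns the first Thing with the given ID, or null when none matches.
    Thing findByID(ThingID thingID)
    {
        for (Thing thing : this)
        {
            if (thing.thingID.equals(thingID))
                return thing;
        }
        return null;
    }
}
```

Because ThingID is its own type, a method that expects a ThingID cannot accidentally be handed the identifier of some other kind of data.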
List Benefits
Having a ThingList type allows for encapsulation of common list-related operations, improving code structure, readability and reuse. There are often cases where code is needed to iterate through a list, performing some operation such as filtering or updating. Many developers build this code in the place where it is used. If multiple places need the same logic, they often duplicate code.
A better approach is to create a method on the ThingList type.
Consider the need to filter a list by count (some arbitrary attribute of Thing), keeping only the items whose count is below a given value. In Java, this could be:
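import java.util.ArrayList;

class ThingList extends ArrayList<Thing>
{
    // Returns a new ThingList holding only the items whose count is below the given value
    ThingList findByLessThanCount(int count)
    {
        ThingList thingList = new ThingList();
        for (Thing thing : this)
        {
            if (thing.count < count)
                thingList.add(thing);
        }
        return thingList;
    }
}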
Oftentimes, a map of ThingID to Thing is needed:
Map<ThingID, Thing> convertToMap()
{
    Map<ThingID, Thing> thingMap = new HashMap<>();
    for (Thing thing : this)
        thingMap.put(thing.thingID, thing);
    return thingMap;
}
Other times, we may want a set of ThingID:
Set<ThingID> getIDSet()
{
    Set<ThingID> thingIDSet = new HashSet<>();
    for (Thing thing : this)
        thingIDSet.add(thing.thingID);
    return thingIDSet;
}
The ThingList type is a very convenient and organized place for these methods to exist.
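On Java 8 or later, the same three methods can also be written as stream pipelines. The following is a self-contained sketch of that alternative, not the original implementation. The minimal Thing and ThingID definitions are assumptions so that it compiles on its own. One behavioral difference is worth noting: Collectors.toMap throws an IllegalStateException on duplicate keys, whereas a loop calling HashMap.put simply overwrites the earlier entry.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical minimal ID type; equals/hashCode make it usable as a map key.
final class ThingID
{
    final long value;

    ThingID(long value) { this.value = value; }

    @Override
    public boolean equals(Object o) { return o instanceof ThingID && ((ThingID) o).value == value; }

    @Override
    public int hashCode() { return Long.hashCode(value); }
}

// Hypothetical minimal Thing with only the members the list methods need.
class Thing
{
    final ThingID thingID;
    final int count;

    Thing(long id, int count)
    {
        this.thingID = new ThingID(id);
        this.count = count;
    }
}

class ThingList extends ArrayList<Thing>
{
    // Keeps only the items whose count is below the given value.
    ThingList findByLessThanCount(int count)
    {
        return stream()
            .filter(thing -> thing.count < count)
            .collect(Collectors.toCollection(ThingList::new));
    }

    // Keyed by ThingID; throws IllegalStateException if two items share an ID.
    Map<ThingID, Thing> convertToMap()
    {
        return stream().collect(Collectors.toMap(thing -> thing.thingID, thing -> thing));
    }

    // Collects the distinct IDs of all items in the list.
    Set<ThingID> getIDSet()
    {
        return stream().map(thing -> thing.thingID).collect(Collectors.toSet());
    }
}
```

Either style works; the point is that callers only ever see the ThingList methods, so the implementation can change without touching them.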
Reading and Writing Data
The Thing and ThingList approach makes reading and writing data much easier and very consistent across the code base.
For all of my projects, I will have abstract base types for reading and writing data, usually named DataReader and DataWriter, with derived concrete types for handling particular data formats, such as JsonDataReader, JsonDataWriter, XmlDataReader, XmlDataWriter, DatabaseDataReader and DatabaseDataWriter.
These types will expose read/write methods for native types, such as readInt, readDate and readBoolean, as well as methods for complex types, such as readObject and readList.
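To make the idea concrete, here is a minimal sketch of what such a reader hierarchy might look like. The method signatures, the Map-backed MapDataReader and the meaning of the maxLength argument (reject longer values) are all assumptions for illustration; the real types are richer.

```java
import java.util.Map;

// Hypothetical abstract reader: callers ask for values by field name
// and never see the underlying source format.
abstract class DataReader
{
    abstract int readInt(String fieldName);
    abstract boolean readBoolean(String fieldName);
    abstract String readString(String fieldName, int maxLength);
}

// Hypothetical concrete reader backed by a Map, standing in for a
// JSON, XML or database reader.
class MapDataReader extends DataReader
{
    private final Map<String, Object> values;

    MapDataReader(Map<String, Object> values) { this.values = values; }

    @Override
    int readInt(String fieldName) { return ((Number) require(fieldName)).intValue(); }

    @Override
    boolean readBoolean(String fieldName) { return (Boolean) require(fieldName); }

    @Override
    String readString(String fieldName, int maxLength)
    {
        String value = (String) require(fieldName);
        // Assumed semantics: a value longer than maxLength is rejected.
        if (value.length() > maxLength)
            throw new IllegalArgumentException(fieldName + " exceeds " + maxLength + " characters");
        return value;
    }

    // Fails fast when the source has no value for the requested field.
    private Object require(String fieldName)
    {
        Object value = values.get(fieldName);
        if (value == null)
            throw new IllegalArgumentException("Missing field: " + fieldName);
        return value;
    }
}
```

Code that reads a Thing depends only on DataReader, so swapping the source format means swapping the concrete reader and nothing else.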
The Thing type makes use of DataReader and DataWriter to move data into and out of a specific instance, using one of two approaches.
Using Read/Write Methods
The first approach is to create 2 methods on Thing, one for reading and one for writing. In each method, every member of the type is read or written:
@Override
public void readFrom(DataReader reader) throws Exception
{
    thingID = reader.readDataID("ThingID", ThingID.CtorDataID);
    name = reader.readString("Name", NameMaxLength);
}

@Override
public void writeTo(DataWriter writer) throws Exception
{
    writer.writeDataID("ThingID", thingID);
    writer.writeString("Name", name, NameMaxLength);
}
This approach has the object directly controlling the field names of the data and the order the data is read or written. It also allows the type to be a little more black-boxed. For example, there may be fields of the type that should never be changed after the initial version is created (such as a reference ID to parent data). In this case, no set method needs to be provided for the member. This approach also allows for a single point of code execution after all of the members have been set.
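The sketch below illustrates those last two points under simplified assumptions. The single-method DataReader and DataWriter interfaces, the parentID member, the validate method and the RoundTripDemo helper are all hypothetical. The parentID member has no setter and can only arrive through readFrom, and validate runs once after every member has been read.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal reader/writer interfaces; the real DataReader and DataWriter are richer.
interface DataReader { String readString(String fieldName); }
interface DataWriter { void writeString(String fieldName, String value); }

class Thing
{
    private String parentID;   // set only by readFrom: no setter is exposed
    private String name;

    public String getParentID() { return parentID; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public void readFrom(DataReader reader)
    {
        parentID = reader.readString("ParentID");
        name = reader.readString("Name");
        validate();   // single point of execution after all members are set
    }

    public void writeTo(DataWriter writer)
    {
        writer.writeString("ParentID", parentID);
        writer.writeString("Name", name);
    }

    private void validate()
    {
        if (parentID == null || parentID.isEmpty())
            throw new IllegalStateException("ParentID is required");
    }
}

class RoundTripDemo
{
    // Writes a Thing into a map, then reads a new Thing back from that map.
    static Thing copyViaMap(Thing source)
    {
        Map<String, String> store = new LinkedHashMap<>();
        source.writeTo(store::put);   // Map.put serves as the writer
        Thing copy = new Thing();
        copy.readFrom(store::get);    // Map.get serves as the reader
        return copy;
    }
}
```

The field names "ParentID" and "Name" appear only inside Thing, so the type alone decides how it is laid out in the stored data.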
Direct Field Access
A second approach is for the type members to be directly accessed for fetching or setting. In this situation, the source data is queried and then the public members of the type are looked up and set from it. It allows for a little less code but also usually requires that all members have public set methods, whose names directly match the source data. This approach increases the possibility of defects.
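One common way to implement direct access in Java is reflection. The sketch below assumes public fields rather than set methods, and the DirectFieldMapper name is hypothetical. It shows where the defect risk comes from: a source entry whose name does not match a member exactly is dropped without any error.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical type with public members whose names must match the source data exactly.
class Thing
{
    public String name;
    public Integer count;
}

class DirectFieldMapper
{
    // Copies each source entry into the public field of the same name.
    // Entries with no matching field are skipped, which is how naming drift goes unnoticed.
    static <T> T populate(T target, Map<String, Object> source)
    {
        for (Map.Entry<String, Object> entry : source.entrySet())
        {
            try
            {
                Field field = target.getClass().getField(entry.getKey());
                field.set(target, entry.getValue());
            }
            catch (NoSuchFieldException e)
            {
                // No public field with this name: the value is silently dropped.
            }
            catch (IllegalAccessException e)
            {
                throw new IllegalStateException(e);
            }
        }
        return target;
    }
}
```

With a source of {"name": "widget", "Count": 3}, name is populated but count stays null, because "Count" does not match the field name count.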
Permanent Storage
In general, most data is stored as hierarchical data or as database data. The reading and writing approach above can work well in both situations (but with some differences).
Hierarchical Data
Hierarchical data is the model of data where some thing contains a combination of native data types, other things or lists of other things. JSON and XML are data formats that model hierarchical data. The Thing and ThingList approach fits perfectly into this model.
Let's say our Thing is a person. And for our project, for each person, we need to track their first and last names, their birth date, a list of addresses, and their children (who are also modeled as people).
We create our three types as: Person, PersonID, and PersonList. In Java, this would be:
import java.util.ArrayList;
import java.util.Date;

class Person
{
    PersonID personID;
    String firstName;
    String lastName;
    Date birthDate;
    AddressList addresses;
    PersonList children;
}

class PersonList extends ArrayList<Person>
{
}
For our illustration, Address, AddressID and AddressList are the types to handle the address data.
In JSON, a particular person might look like:
{
    "personID": 1234,
    "firstName": "Michael",
    "lastName": "Smith",
    "birthDate": "1990-01-01",
    "address": [
        {
            "Street1": "123 Main St.",
            "City": "Anytown",
            "State": "AA",
            "PostalCode": "99999"
        },
        {
            "Street1": "456 Elm St.",
            "City": "Anytown",
            "State": "AA",
            "PostalCode": "99999"
        }],
    "child": [
        {
            "personID": 2345,
            "firstName": "Mary",
            "lastName": "Smith",
            "birthDate": "2010-01-01"
        },
        {
            "personID": 3456,
            "firstName": "Martin",
            "lastName": "Smith",
            "birthDate": "2012-01-01"
        }]
}
Once the readFrom and writeTo methods have been created for these types, reading from a JSON string is as simple as:
person = new JsonDataReader(jsonString).readObject(null, Person.CtorDataReader);
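What makes this a one-liner is that reading nests naturally: reading a Person reads its list of children, and each child is read the same way. The sketch below shows that recursion over already-parsed data (nested maps and lists) instead of JSON text, since the Java standard library has no JSON parser. NodeDataReader and its method signatures are hypothetical stand-ins, and only firstName and the child list are read to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical reader over already-parsed hierarchical data (nested maps and lists),
// standing in for a JsonDataReader; a real one would parse the JSON text first.
class NodeDataReader
{
    private final Map<String, Object> node;

    NodeDataReader(Map<String, Object> node) { this.node = node; }

    String readString(String fieldName) { return (String) node.get(fieldName); }

    // Reads a list of child objects into target, building each one through the supplied constructor.
    @SuppressWarnings("unchecked")
    <T> void readList(String fieldName, List<T> target, Function<NodeDataReader, T> ctor)
    {
        List<Map<String, Object>> items = (List<Map<String, Object>>) node.get(fieldName);
        if (items == null)
            return;   // an absent list is treated as empty
        for (Map<String, Object> item : items)
            target.add(ctor.apply(new NodeDataReader(item)));
    }
}

class Person
{
    final String firstName;
    final PersonList children = new PersonList();

    // Reading a Person recursively reads its children, which are Persons as well.
    Person(NodeDataReader reader)
    {
        firstName = reader.readString("firstName");
        reader.readList("child", children, Person::new);
    }
}

class PersonList extends ArrayList<Person>
{
}
```

The field names "firstName" and "child" follow the JSON sample above; the recursion ends on its own when a person has no "child" entry.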
Database Data
Unlike hierarchical data, database data is essentially flat data, where a single piece of data is stored in a single table row. It's not easy to create types that work with both hierarchical and database data, but the Thing and ThingList approach still has many benefits when used for database data.
Database library access APIs have historically been organized around a record set concept with their approach to SQL's SELECT, INSERT, UPDATE and DELETE calls. Those APIs are not object-oriented, and programmers often get tripped up about the best way to move record set data to and from instances of object-oriented types, as well as how those types should best handle their specific SQL calls.
The Thing and ThingList approach, along with the DataReader and DataWriter approach above, can be used for database access and, with a few simple rules, can go a long way toward keeping the code efficient, readable, and maintainable.
- The Thing object should be used to encapsulate SELECT, INSERT, UPDATE and DELETE operations that affect a single record, such as when searching by a unique key, knowing that only a single record will be returned.
- The ThingList object should be used for any SELECT call that is not searching by a unique key, where 0, 1, or many records will be returned. Not as important, but good for code organization, ThingList should also be used to encapsulate bulk SQL calls, such as bulk UPDATE or DELETE calls. These methods would be static methods on ThingList and would not actually pull any database record data into memory.
All of the database library access calls would take place in a single type (or a small set of types). Lately, I have been calling this the DatabaseAdaptor type. As such, the Thing and ThingList types do not have code specific to the database library APIs.
The DatabaseAdaptor type exposes methods for the single-record based INSERT, UPDATE and DELETE calls, as well as methods for performing SELECT calls, with the understanding that multiple records may be returned. DatabaseAdaptor does not handle an explicit Thing type, but instead understands that data is read and written to memory via DataReader and DataWriter. This approach allows for very simple and concise Thing and ThingList code.
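The shape of such an adaptor can be sketched without a real database. In the example below, an in-memory map stands in for the table and the database library, and every name and signature (Storable, save, selectManyByColumn, the one-method DataReader and DataWriter) is a hypothetical simplification. What it is meant to show is that the adaptor never touches a Person member directly: rows go in through writeTo and come out through a constructor that takes a reader.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical row-level reader and writer; a row is modeled as a column-name-to-value map.
interface DataReader { Object read(String column); }
interface DataWriter { void write(String column, Object value); }

// Hypothetical contract for anything the adaptor can store.
interface Storable { void writeTo(DataWriter writer); }

// Sketch of an adaptor that knows nothing about a concrete Thing type.
// An in-memory map stands in for the database table and its access library.
class DatabaseAdaptor<T extends Storable, L extends List<T>>
{
    private final Map<Object, Map<String, Object>> table = new HashMap<>();
    private final String keyColumn;
    private final Function<DataReader, T> ctor;
    private final Supplier<L> listCtor;

    DatabaseAdaptor(String keyColumn, Function<DataReader, T> ctor, Supplier<L> listCtor)
    {
        this.keyColumn = keyColumn;
        this.ctor = ctor;
        this.listCtor = listCtor;
    }

    // Single-record write: the object fills a row through a DataWriter,
    // and the row is inserted or replaced under its key column value.
    void save(T item)
    {
        Map<String, Object> row = new HashMap<>();
        item.writeTo(row::put);
        table.put(row.get(keyColumn), row);
    }

    // Single-record read by unique key; returns null when no row matches.
    T selectByKey(Object key)
    {
        Map<String, Object> row = table.get(key);
        return row == null ? null : ctor.apply(row::get);
    }

    // Multi-record read: 0, 1, or many rows come back as a list.
    L selectManyByColumn(String column, Object value)
    {
        L result = listCtor.get();
        for (Map<String, Object> row : table.values())
        {
            if (value.equals(row.get(column)))
                result.add(ctor.apply(row::get));
        }
        return result;
    }
}

class Person implements Storable
{
    final int personID;
    final String lastName;

    Person(int personID, String lastName)
    {
        this.personID = personID;
        this.lastName = lastName;
    }

    // Constructor used by the adaptor when building a Person from a stored row.
    Person(DataReader reader)
    {
        personID = (Integer) reader.read("PersonID");
        lastName = (String) reader.read("LastName");
    }

    @Override
    public void writeTo(DataWriter writer)
    {
        writer.write("PersonID", personID);
        writer.write("LastName", lastName);
    }
}

class PersonList extends ArrayList<Person>
{
}
```

An adaptor for people would then be created as new DatabaseAdaptor<Person, PersonList>("PersonID", Person::new, PersonList::new), with single-record calls wrapped by Person and multi-record calls wrapped by PersonList.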
In Java, our Person type would look something like:
class Person extends DatabaseObject
{
    public static DatabaseAdaptor<Person, PersonList> getDatabaseAdaptor()
    {
        return DatabaseAdaptor.getDatabaseAdaptorForClass(
            Person.class.getSimpleName());
    }

    private PersonID personID;
    private String firstName;
    private String lastName;
    private Date birthDate;

    // Constructor when creating a new instance, not yet stored
    public Person()
    {
        personID = PersonID.newInstance();
    }

    // Constructor when reading an instance from stored data, via DatabaseAdaptor
    public Person(DataReader reader) throws Exception
    {
        super(reader);
        readFrom(reader);
    }

    // Fetches an existing stored Person when PersonID is known
    public static Person get(PersonID personID) throws Exception
    {
        return getDatabaseAdaptor().selectByKey(personID, DataExists.MustExist);
    }

    @Override
    public void readFrom(DataReader reader) throws Exception
    {
        personID = reader.readDataID("PersonID", PersonID.CtorDataID);
        firstName = reader.readString("FirstName", NameMaxLength);
        lastName = reader.readString("LastName", NameMaxLength);
        birthDate = reader.readDate("BirthDate");
    }

    @Override
    public void writeTo(DataWriter writer) throws Exception
    {
        writer.writeDataID("PersonID", personID);
        writer.writeString("FirstName", firstName, NameMaxLength);
        writer.writeString("LastName", lastName, NameMaxLength);
        writer.writeDate("BirthDate", birthDate);
    }

    public void update() throws Exception
    {
        getDatabaseAdaptor().update(this);
    }

    public void delete() throws Exception
    {
        getDatabaseAdaptor().delete(personID);
    }
}
Our PersonList type would look something like:
public class PersonList extends DatabaseObjectList<Person>
{
    public static PersonList findByLastName(String lastName) throws Exception
    {
        if (!StrUtil.hasLen(lastName))
            throw new IllegalArgumentException("lastName is empty");

        DatabaseProcParam[] params = new DatabaseProcParam[1];
        params[0] = new DatabaseProcParam(Types.VARCHAR, lastName);

        return Person.getDatabaseAdaptor().selectManyByProc(
            "Person_GetByLastName", params);
    }
}
To see production projects where these approaches have been fully implemented, see my iNetVOD projects on GitHub. They are Java-based projects, circa 2007 or so.