Identifiers are Types

April 13, 2018

Identifiers are unique types and should be treated as such in our code. In software, we create object types to model the world around us. We create unique object types to model specific data. Identifiers too are models of a specific kind of data, with its own unique scope. The identifiers for one type of data are completely different than the identifiers in another type of data. A string of numbers representing a social security number is different from a string of numbers representing a phone number, and is different from a string of numbers representing a driver's license number. As such, these all should be modeled differently in our code.

Some developers tend to declare identifiers using native language types, such as Integer, int and String. This approach has significant downsides. Not only is the code less readable, it also opens the code to a whole class of accidental mistakes.

Identifier Types

A much better approach than native language types is to declare unique types for each type of identifier. This approach has many benefits. Variable declarations make the code's intention explicit. Method parameters are explicit and enforced. Instances are not mistakenly cross-assigned (i.e. assigning an identifier of one type to an identifier of a completely different type). The resulting code is significantly more readable and maintainable.

Clear Variable Declarations

Unique identifier types make the code much clearer than using native types.

Let's say we are building a contact management system and we need identifiers for the people and their various addresses. Let's say we called these identifiers personID and addressID, referring to each unique instance of this data.

Often, developers will declare these variables as:

// Using native types
int personID;
int addressID;

or maybe:

// Using native types
String personID;
String addressID;

These are good variable names, but what if the developer was not so clear with their variable naming:

// Confusing, variables not named as identifiers
String person;
String address;

What if these unclear names were used as method parameters:

// Confusing method parameters
void associate(String person, String address)

Nothing in this signature tells the caller that identifiers are expected rather than, say, a person's name and a street address.

Using identifier types is much clearer:

// Clear identifier types and variables
PersonID personID;
AddressID addressID;

The types keep the intent clear even if the variable names are misleading:

// Clear identifier types
PersonID person;
AddressID address;

// Clear method parameter types
void associate(PersonID person, AddressID address)

Better Method Declarations

Unique identifier types solve a big problem with method declarations involving identifiers.

For example, maybe our contact management system allows people and addresses to be linked. Let's say the personID and addressID variables are both declared as int and there is an associate method that takes both parameters:

// Example method using native types
int personID;
int addressID;
void associate(int personID, int addressID)

It would be very easy to accidentally flip the parameters when calling the method:

// Parameters mistakenly reversed
associate(addressID, personID);

This would not give any compile error, allowing the mistake to be built into the executable. Testing for this condition can also be a challenge. Since personID and addressID are both integers, it is highly likely that new values are assigned starting at 1 and increasing. The result is that both personID and addressID would share very similar value domains. For example, both could have a value of 10, even if they are completely independent of each other. Calling the associate method with mixed-up identifiers could manifest in very odd ways, making the defect very hard to locate.
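
As a contrast, here is a minimal sketch of the same call using identifier types. The PersonID and AddressID classes and the ContactBook holder are hypothetical stand-ins written only for this illustration, with just enough code to show the compile-time check.

```java
final class ContactBook {
    // The parameter types, not just the names, state what the method expects.
    static String associate(PersonID personID, AddressID addressID) {
        return "person " + personID + " -> address " + addressID;
    }

    public static void main(String[] args) {
        PersonID personID = new PersonID(10);
        AddressID addressID = new AddressID(10);

        System.out.println(associate(personID, addressID));

        // Uncommenting the reversed call below stops the build, because an
        // AddressID is not a PersonID even though both wrap the value 10:
        // associate(addressID, personID);
    }
}

// Hypothetical minimal identifier types (assumption: int-based values).
final class PersonID {
    private final int value;

    PersonID(int value) { this.value = value; }

    @Override
    public String toString() { return Integer.toString(value); }
}

final class AddressID {
    private final int value;

    AddressID(int value) { this.value = value; }

    @Override
    public String toString() { return Integer.toString(value); }
}
```

The mistake that previously survived into the executable is now rejected by the compiler before any test has to find it.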

Identifier types have another benefit to methods. It is very common practice to overload methods with similar but different parameters. For example, there may be a third identifier, such as a FamilyID, and perhaps we want a variation to the associate method:

// Can't overload associate method when using native types
void associate(int familyID, int addressID)

This new method would fail to compile since its signature is identical to void associate(int personID, int addressID). Parameter names are not part of a method's signature, only the parameter types are.

But this approach works when using identifier types:

// Overloaded associate method allowed
void associate(PersonID personID, AddressID addressID)
void associate(FamilyID familyID, AddressID addressID)

Additional Validation

Another benefit of identifier types is additional validation. For types based on Strings, I consider it illegal to allow for an empty String as the internal value. I also validate my String values against a maximum length. These identifiers are often stored in database tables and we normally limit our table columns to a specific length.

For types based on Integers, we validate when converting a string representation to an integer, since many string values do not convert to integers. We may also know a specific identifier type has a limited value domain, so we can validate for that domain.
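
A rough sketch of both integer checks might look like the following. The PersonID class, its parse method and the lower bound of 1 are assumptions made for the example (the bound follows the earlier remark that values typically start at 1), not a prescribed design.

```java
final class PersonIDDemo {
    public static void main(String[] args) {
        System.out.println(PersonID.parse(" 42 "));

        for (String bad : new String[] { "abc", "0", "" }) {
            try {
                PersonID.parse(bad);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}

// Hypothetical integer-based identifier with conversion and domain checks.
final class PersonID {
    // Assumption: identifiers are assigned starting at 1.
    private static final int MIN_VALUE = 1;

    private final int value;

    PersonID(int value) {
        if (value < MIN_VALUE)
            throw new IllegalArgumentException(
                "value(" + value + ") is below " + MIN_VALUE);
        this.value = value;
    }

    // Converts a string form (for example from JSON or XML) and validates it.
    static PersonID parse(String text) {
        if (text == null || text.trim().isEmpty())
            throw new IllegalArgumentException("value is undefined");
        try {
            return new PersonID(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "value(" + text + ") is not an integer", e);
        }
    }

    int toInteger() { return value; }

    @Override
    public String toString() { return Integer.toString(value); }
}
```

This sketch uses Integer.parseInt, which reads "010" as ten. Integer.decode would treat the leading zero as an octal prefix and return eight, which may or may not be what an identifier type wants.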

On a recent project I had to store the country with an address. There is an ISO standard for country codes, so it made sense that the country values would be a unique type, such as CountryID. When I'm reading the CountryID values from JSON or XML, there is nothing stopping the creator from sending bad values. My CountryID validates against valid ISO values.
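
One way such a check could be written is sketched below. It assumes the valid values are the two-letter uppercase codes returned by the JDK's Locale.getISOCountries(); the CountryID shown here is a simplified stand-in for illustration, and a real project might load its list of codes from elsewhere.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class CountryIDDemo {
    public static void main(String[] args) {
        System.out.println(new CountryID("US"));

        try {
            new CountryID("USA");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Simplified stand-in: accepts only two-letter ISO 3166 codes known to the JDK.
final class CountryID {
    private static final Set<String> ISO_CODES =
        new HashSet<>(Arrays.asList(Locale.getISOCountries()));

    private final String value;

    CountryID(String value) {
        if (value == null || !ISO_CODES.contains(value))
            throw new IllegalArgumentException(
                "value(" + value + ") is not an ISO country code");
        this.value = value;
    }

    @Override
    public String toString() { return value; }
}
```

Because the check lives in the constructor, a bad value arriving from JSON or XML fails at the point where it is read instead of somewhere deeper in the application.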

Constant Comparisons

We often have constants for various identifier types. By having a true identifier type, we can define the constant as part of the type, giving it a well-defined scope, making the code more logical and easier to read.

For example, our CountryID needed a US constant so code could understand USA addresses:

// Constants can be defined in type's scope
public class CountryID
{
    public static final CountryID US = new CountryID("US");
}

USA addresses require a State to be specified, so I ended up with UI logic to understand when to display the State field:

if (CountryID.US.equals(countryID))
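
To make that comparison concrete, here is a small self-contained sketch. The AddressForm class and its showStateField method are hypothetical names for the UI logic, and this CountryID carries only what the comparison needs.

```java
final class AddressForm {
    // Hypothetical UI helper: only USA addresses display the State field.
    static boolean showStateField(CountryID countryID) {
        // Calling equals() on the constant is safe even when countryID is null.
        return CountryID.US.equals(countryID);
    }

    public static void main(String[] args) {
        System.out.println(showStateField(new CountryID("US")));
        System.out.println(showStateField(new CountryID("CA")));
        System.out.println(showStateField(null));
    }
}

final class CountryID {
    // Constant defined in the type's own scope.
    static final CountryID US = new CountryID("US");

    private final String value;

    CountryID(String value) {
        if (value == null || value.isEmpty())
            throw new IllegalArgumentException("value is undefined");
        this.value = value;
    }

    // Value-based equality, so a freshly parsed "US" matches the constant.
    @Override
    public boolean equals(Object o) {
        return (o instanceof CountryID) && value.equals(((CountryID) o).value);
    }

    @Override
    public int hashCode() { return value.hashCode(); }

    @Override
    public String toString() { return value; }
}
```

Putting the constant on the left-hand side means a missing country simply hides the State field instead of raising a NullPointerException.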

Immutable Identifier Types

Instances of identifier types should be immutable. The value of the identifier should be set when the instance is created. Access to the internal value of the identifier should be limited. Most of the code written against the instances should avoid any logic that depends on understanding the internal value (most code should not need to care about the internal value). Code generally only needs the type to support reading/writing to string values. This allows support for external formats such as JSON, XML, HTML, etc. and allows writing values to log files. Database access may require calls based on native types, but these can usually be encapsulated in a small amount of specific code.

Java Example

Let me illustrate how I handle this approach in Java code.

I usually have a DataID abstract class:

public abstract class DataID
{
}

Since the Java base Object class provides the public String toString() method, there is no need for another method to fetch the identifier as a string value. Each identifier class will need to provide a constructor taking a String parameter. Together these allow common code to access and use identifier instances without knowing or caring about their internal values.

Then I'll have StringID and IntegerID abstract classes extending DataID, whose purpose is to maintain an internal value of String and int respectively.

public abstract class StringID extends DataID
public abstract class IntegerID extends DataID

To allow StringID and IntegerID to behave more like native types and so they work correctly for maps, sets, comparisons, etc., I'll include overrides for:

public boolean equals(Object o)
public int hashCode()

and implement the Comparable interface:

... implements Comparable<StringID>
public int compareTo(StringID o)

... implements Comparable<IntegerID>
public int compareTo(IntegerID o)
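
To see why these overrides matter, the following sketch puts a hypothetical, stripped-down PersonID into a HashMap and a TreeSet. The class is written only for this demonstration and is not the base class design itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class CollectionsDemo {
    public static void main(String[] args) {
        Map<PersonID, String> names = new HashMap<>();
        names.put(new PersonID(7), "Ada");

        // A distinct but equal instance finds the entry because
        // equals() and hashCode() agree with each other.
        System.out.println(names.get(new PersonID(7)));   // prints "Ada"

        // compareTo() gives the set its ordering and removes the duplicate.
        TreeSet<PersonID> sorted = new TreeSet<>();
        sorted.add(new PersonID(30));
        sorted.add(new PersonID(4));
        sorted.add(new PersonID(30));
        System.out.println(sorted);                       // prints "[4, 30]"
    }
}

// Hypothetical stripped-down identifier type for the demonstration.
final class PersonID implements Comparable<PersonID> {
    private final int value;

    PersonID(int value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return (o instanceof PersonID) && value == ((PersonID) o).value;
    }

    @Override
    public int hashCode() { return Integer.hashCode(value); }

    @Override
    public int compareTo(PersonID o) { return Integer.compare(value, o.value); }

    @Override
    public String toString() { return Integer.toString(value); }
}
```

Without the equals() and hashCode() overrides, the lookup with a new PersonID(7) would return null, since Object's default implementations compare instances by identity.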

The resulting classes will look something like:

public abstract class StringID extends DataID implements Comparable<StringID>
{
    private final String fValue;

    public StringID(String value)
    {
        if ((value == null) || (value.length() == 0))
            throw new IllegalArgumentException("value is undefined");

        fValue = value;
    }

    @Override
    public String toString()
    {
        return fValue;
    }

    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof StringID))
            return false;

        return fValue.equals(((StringID)o).fValue);
    }

    @Override
    public int hashCode()
    {
        return fValue.hashCode();
    }

    @Override
    public int compareTo(StringID o)
    {
        if (o == null)
            return 1;

        return fValue.compareTo(o.fValue);
    }
}

public abstract class IntegerID extends DataID implements Comparable<IntegerID>
{
    private final int fValue;
    private final String fValueStr;

    public IntegerID(Integer value)
    {
        if (value == null)
            throw new IllegalArgumentException("value is undefined");

        fValue = value;
        fValueStr = Integer.toString(fValue);
    }

    public IntegerID(String value)
    {
        if ((value == null) || (value.trim().length() == 0))
            throw new IllegalArgumentException("value is undefined");

        fValue = Integer.decode(value.trim());
        fValueStr = Integer.toString(fValue);
    }

    public int toInteger()
    {
        return fValue;
    }

    @Override
    public String toString()
    {
        return fValueStr;
    }

    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof IntegerID))
            return false;

        return fValue == ((IntegerID)o).fValue;
    }

    @Override
    public int hashCode()
    {
        return fValueStr.hashCode();
    }

    @Override
    public int compareTo(IntegerID o)
    {
        if (o == null)
            return 1;

        return Integer.compare(fValue, o.fValue);
    }
}

With IntegerID, I've added public IntegerID(Integer value) and public int toInteger(). These aren't strictly needed, but are nice helpers when serializing IntegerID via code that explicitly supports native types, such as JDBC.

In some projects, I'll have additional base identifier types, such as UUStringID, which understands UUID values represented as strings, and UUBinaryID, which understands UUID values as byte[].
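
The implementation of those UUID-based types is not shown here, so the following is only a guess at what a string-backed variant could look like. The constructor body, the normalization through java.util.UUID and the SessionID subclass are all assumptions made for illustration, and the DataID base class is left out to keep the sketch self-contained.

```java
import java.util.UUID;

final class UUStringIDDemo {
    public static void main(String[] args) {
        SessionID id = new SessionID("123E4567-E89B-12D3-A456-426614174000");
        System.out.println(id);   // prints "123e4567-e89b-12d3-a456-426614174000"

        try {
            new SessionID("not-a-uuid");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: not-a-uuid");
        }
    }
}

// Assumed shape of a UUID-aware base type.
abstract class UUStringID {
    private final String fValue;

    protected UUStringID(String value) {
        if (value == null)
            throw new IllegalArgumentException("value is undefined");

        // UUID.fromString throws IllegalArgumentException for malformed
        // input; toString() returns the canonical lowercase form.
        fValue = UUID.fromString(value.trim()).toString();
    }

    @Override
    public String toString() { return fValue; }
}

// Hypothetical application-specific identifier.
final class SessionID extends UUStringID {
    SessionID(String value) { super(value); }
}
```

Storing the canonical form means two spellings of the same UUID that differ only in letter case end up with the same internal value.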

As I noted above, an additional validation for StringID might be to check the length of the string value. There are two changes to StringID for this. First, an abstract getMaxLength() method is added:

public abstract int getMaxLength();

Second, the StringID(String value) constructor adds the validation:

int maxLength = getMaxLength();
if (value.length() > maxLength)
    throw new IllegalArgumentException(String.format(
        "value(%s) length(%d) greater than max(%d)", value,
        value.length(), maxLength));

Now that our base types are defined, the application specific classes are pretty simple:

public class ApplicationID extends StringID
{
    public ApplicationID (String value)
    {
        super(value);
    }

    @Override
    public int getMaxLength() { return 32; }
}
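
Putting the length check and the concrete class together, a condensed and runnable version could look like this. It leaves out DataID, equals(), hashCode() and compareTo() to stay short, and the MaxLengthDemo class is a hypothetical driver added for the example.

```java
final class MaxLengthDemo {
    public static void main(String[] args) {
        System.out.println(new ApplicationID("billing"));   // prints "billing"

        try {
            // 33 characters, one more than ApplicationID allows.
            new ApplicationID("0123456789abcdef0123456789abcdefX");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Condensed StringID: only the validation and toString() are kept.
abstract class StringID {
    private final String fValue;

    protected StringID(String value) {
        if ((value == null) || (value.length() == 0))
            throw new IllegalArgumentException("value is undefined");

        // This runs before the subclass constructor body, so getMaxLength()
        // should return a constant and not read a subclass instance field.
        int maxLength = getMaxLength();
        if (value.length() > maxLength)
            throw new IllegalArgumentException(String.format(
                "value(%s) length(%d) greater than max(%d)", value,
                value.length(), maxLength));

        fValue = value;
    }

    public abstract int getMaxLength();

    @Override
    public String toString() { return fValue; }
}

final class ApplicationID extends StringID {
    ApplicationID(String value) { super(value); }

    @Override
    public int getMaxLength() { return 32; }
}
```

One caveat with this design is that the abstract method is invoked from the superclass constructor. A subclass that returned the limit from an instance field would see that field's default value of 0 at that point, so returning a literal, as ApplicationID does, is the safer choice.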

Initialization

The final piece is handling common code that needs to construct new instances of our DataID without having to know the concrete derived class. In Java, I handle this using the Constructor class. My identifier types expose a Constructor constant:

public static final Constructor<ApplicationID> CtorDataID
    = CtorUtil.getCtorString(ApplicationID.class);

Then I may have a class for reading JSON data, which provides a method for reading a DataID:

public <T extends DataID> T readDataID(String fieldName,
    Constructor<T> ctorDataID) throws Exception

Likewise, I may have a class for writing JSON data, where a writeDataID() method can make use of the toString() method provided.

This approach simplifies reading and writing identifier values and adds validation, all while hiding the internal implementation of the identifier's value.
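
Since CtorUtil and the JSON reader are not shown, the sketch below fills them in with assumptions: CtorSketch.getCtorString is a guess at what CtorUtil.getCtorString might do using Class.getConstructor, and FieldReader reads from a plain Map as a stand-in for parsed JSON. The ApplicationID here is reduced to the minimum needed to run.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.Map;

final class ReaderDemo {
    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new HashMap<>();
        fields.put("appID", "billing");

        FieldReader reader = new FieldReader(fields);
        ApplicationID id = reader.readDataID("appID", ApplicationID.CtorDataID);
        System.out.println(id);   // prints "billing"
    }
}

abstract class DataID { }

class ApplicationID extends DataID {
    // Looked up once; stands in for CtorUtil.getCtorString(...).
    public static final Constructor<ApplicationID> CtorDataID
        = CtorSketch.getCtorString(ApplicationID.class);

    private final String fValue;

    public ApplicationID(String value) {
        if ((value == null) || (value.length() == 0))
            throw new IllegalArgumentException("value is undefined");
        fValue = value;
    }

    @Override
    public String toString() { return fValue; }
}

// Assumed helper: finds the public constructor taking a single String.
final class CtorSketch {
    static <T extends DataID> Constructor<T> getCtorString(Class<T> type) {
        try {
            return type.getConstructor(String.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(
                type.getName() + " needs a public (String) constructor", e);
        }
    }
}

// Stand-in for a JSON reader; the Map plays the role of a parsed object.
final class FieldReader {
    private final Map<String, String> fields;

    FieldReader(Map<String, String> fields) { this.fields = fields; }

    <T extends DataID> T readDataID(String fieldName,
        Constructor<T> ctorDataID) throws Exception {
        String raw = fields.get(fieldName);
        if (raw == null)
            return null;
        try {
            return ctorDataID.newInstance(raw);
        } catch (InvocationTargetException e) {
            // Surface the identifier's own validation error to the caller.
            Throwable cause = e.getCause();
            if (cause instanceof Exception)
                throw (Exception) cause;
            throw e;
        }
    }
}
```

The reader never names a concrete identifier class, yet every value it returns has passed that class's constructor validation.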

Conclusion

Using identifier types over native types has many benefits. The resulting code will contain fewer errors and be more readable and more maintainable for years to come.