Identifiers are Types
Identifiers are unique types and should be treated as such in our code. In software, we create object types to model the world around us. We create unique object types to model specific data. Identifiers too are models of a specific kind of data, with its own unique scope. The identifiers for one type of data are completely different from the identifiers for another type of data. A string of numbers representing a social security number is different from a string of numbers representing a phone number, and is different from a string of numbers representing a driver's license number. As such, all of these should be modeled differently in our code.
Some developers tend to declare identifiers using native language types, such as Integer, int and String. This approach has huge downsides. Not only is the code less readable, it also opens the code to a great deal of accidental mistakes.
Identifier Types
A much better approach than native language types is to declare a unique type for each kind of identifier. This approach has many benefits. Variable declarations make the code's intention explicit. Method parameters are explicit and enforced. Instances are not mistakenly cross-assigned (i.e. assigning an identifier of one type to an identifier of a completely different type). The resulting code is significantly more readable and maintainable.
Clear Variable Declarations
Unique identifier types make the code much clearer than using native types.
Let's say we are building a contact management system and we need identifiers for the people and their various addresses. Let's say we called these identifiers personID and addressID, referring to each unique instance of this data.
Often, developers will declare these variables as:
// Using native types
int personID;
int addressID;
or maybe:
// Using native types
String personID;
String addressID;
These are good variable names, but what if the developer was not so clear with their variable naming:
// Confusing, variables not named as identifiers
String person;
String address;
What if these unclear names were used as method parameters:
// Confusing method parameters
void associate(String person, String address)
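This sort of approach will lead to more problems: nothing in the declaration tells a reader, or the compiler, which kind of value each String is supposed to hold.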
Using identifier types is much clearer:
// Clear identifier types and variables
PersonID personID;
AddressID addressID;
Identifier types stay clear even if the variable names are misleading:
// Clear identifier types
PersonID person;
AddressID address;
// Clear method parameter types
void associate(PersonID person, AddressID address)
Better Method Declarations
Unique identifier types solve a big problem with method declarations involving identifiers.
For example, maybe our contact management system allows for people and addresses to be linked. Let's say personID and addressID variables are both declared as int and there is an associate method that takes both parameters:
// Example method using native types
int personID;
int addressID;
void associate(int personID, int addressID)
It would be very easy to accidentally flip the parameters when calling the method:
// Parameters mistakenly reversed
associate(addressID, personID);
This would not give any compile error, allowing the mistake to be built into the executable. Testing for this condition can also be a challenge. Since personID and addressID are both integers, it is highly likely that new values are assigned starting at 1 and increasing. The result is that both personID and addressID would share very similar value domains. For example, both could have a value of 10, even though they are completely independent of each other. Calling the associate method with mixed-up identifiers could manifest in very odd ways, making the defect very hard to locate.
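A minimal sketch of how dedicated types close this gap. The two wrapper classes here are hypothetical stand-ins written only for this illustration, not the base classes developed later in the article; the point is that both identifiers can hold the same raw value while the reversed call no longer compiles.

```java
public class AssociateDemo {
    // Hypothetical minimal identifier types, each wrapping an int.
    static final class PersonID {
        private final int value;
        PersonID(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class AddressID {
        private final int value;
        AddressID(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    // The parameter types now document and enforce the argument order.
    static String associate(PersonID personID, AddressID addressID) {
        return "person " + personID + " -> address " + addressID;
    }

    public static void main(String[] args) {
        PersonID personID = new PersonID(10);
        AddressID addressID = new AddressID(10); // same raw value, different type

        System.out.println(associate(personID, addressID));

        // The reversed call is rejected by the compiler (incompatible types),
        // so the mistake cannot be built into the executable:
        // associate(addressID, personID);
    }
}
```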
Identifier types have another benefit for methods. It is very common practice to overload methods with similar but different parameters. For example, there may be a third identifier, such as a FamilyID, and perhaps we want a variation of the associate method:
// Can't overload associate method when using native types
void associate(int familyID, int addressID)
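This new method would fail to compile if declared alongside void associate(int personID, int addressID), since the two have the same signature: parameter names are not part of a method's signature, only the parameter types are.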
But this approach works when using identifier types:
// Overloaded associate method allowed
void associate(PersonID personID, AddressID addressID)
void associate(FamilyID familyID, AddressID addressID)
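To see the overloads resolve, here is a small self-contained sketch. The three identifier classes and the sample values ("P-7", "F-3", "A-1") are invented for illustration; the compiler picks the overload from the static type of the first argument.

```java
public class OverloadDemo {
    // Hypothetical minimal identifier types; each just wraps a string value.
    static final class PersonID {
        private final String value;
        PersonID(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    static final class FamilyID {
        private final String value;
        FamilyID(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    static final class AddressID {
        private final String value;
        AddressID(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    // Two overloads that differ only in the identifier type of the first parameter.
    static String associate(PersonID personID, AddressID addressID) {
        return "person " + personID + " linked to address " + addressID;
    }

    static String associate(FamilyID familyID, AddressID addressID) {
        return "family " + familyID + " linked to address " + addressID;
    }

    public static void main(String[] args) {
        AddressID home = new AddressID("A-1");
        System.out.println(associate(new PersonID("P-7"), home));
        System.out.println(associate(new FamilyID("F-3"), home));
    }
}
```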
Additional Validation
Another benefit of identifier types is additional validation. For types based on Strings, I consider it illegal to allow an empty String as the internal value. I also validate my String values against a maximum length. These identifiers are often stored in database tables and we normally limit our table columns to a specific length.
For types based on Integers, we validate when converting a string representation to an integer. Obviously, many string values do not convert to integers. We may also know that a specific identifier type has a limited value domain, so we can validate against that domain.
On a recent project I had to store the country with an address. There is an ISO standard for country codes, so it made sense that the country values would be a unique type, such as CountryID. When I'm reading the CountryID values from JSON or XML, there is nothing stopping the sender from supplying bad values. My CountryID validates incoming values against the valid ISO codes.
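One way such a check could look is sketched below. It rests on two assumptions not stated in the text: that the codes are the two-letter ISO 3166 form, and that the list returned by the JDK's Locale.getISOCountries() is an acceptable source of valid values. The class is a standalone illustration, not the author's CountryID.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class CountryDemo {
    static final class CountryID {
        // Two-letter ISO 3166 country codes known to the running JDK (assumed source).
        private static final Set<String> ISO_CODES =
                new HashSet<>(Arrays.asList(Locale.getISOCountries()));

        private final String value;

        CountryID(String value) {
            // Reject null and anything that is not a known two-letter code.
            if (value == null || !ISO_CODES.contains(value))
                throw new IllegalArgumentException("not an ISO country code: " + value);
            this.value = value;
        }

        @Override
        public String toString() { return value; }
    }

    public static void main(String[] args) {
        System.out.println(new CountryID("US")); // accepted

        try {
            new CountryID("USA"); // three-letter form is not in the two-letter list
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check lives in the constructor, code that receives a CountryID never has to re-validate it.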
Constant Comparisons
We often have constants for various identifier types. By having a true identifier type, we can define the constant as part of the type, giving it a well-defined scope and making the code more logical and easier to read.
For example, our CountryID needed a US constant so code could understand USA addresses:
// Constants can be defined in type's scope
public class CountryID
{
    public static final CountryID US = new CountryID("US");
}
USA addresses require a State to be specified, so I ended up with UI logic to understand when to display the State field:
if (CountryID.US.equals(countryID))
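A small sketch of that comparison in context. The showStateField helper and this trimmed-down CountryID are hypothetical; the detail worth noticing is that calling equals on the constant, rather than on the variable, returns false for a null countryID instead of throwing a NullPointerException.

```java
public class StateFieldDemo {
    // Trimmed-down, hypothetical CountryID with just enough for the comparison.
    static final class CountryID {
        static final CountryID US = new CountryID("US");

        private final String value;

        CountryID(String value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return (o instanceof CountryID) && value.equals(((CountryID) o).value);
        }

        @Override
        public int hashCode() { return value.hashCode(); }
    }

    // Hypothetical UI helper: only USA addresses show the State field.
    static boolean showStateField(CountryID countryID) {
        // Constant on the left, so a null countryID simply yields false.
        return CountryID.US.equals(countryID);
    }

    public static void main(String[] args) {
        System.out.println(showStateField(new CountryID("US"))); // true
        System.out.println(showStateField(new CountryID("CA"))); // false
        System.out.println(showStateField(null));                // false
    }
}
```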
Immutable Identifier Types
Instances of identifier types should be immutable. The value of the identifier should be set when the instance is created. Access to the internal value of the identifier should be limited. Most of the code written against the instances should avoid any logic that depends on understanding the internal value (most code should not need to care about the internal value). Code generally only needs the type to support reading/writing to string values. This allows support for external formats such as JSON, XML, HTML, etc. and allows writing values to log files. Database access may require calls based on native types, but these can usually be encapsulated in a small amount of specific code.
Java Example
Let me illustrate how I handle this approach in Java code.
I usually have a DataID abstract class:
public abstract class DataID
{
}
Since the Java base Object class provides the public String toString() method, there is no need for another method to fetch the identifier as a string value. Each identifier class will need to provide a constructor taking a String parameter. Together these allow common code to access and use identifier instances without knowing or caring about their internal values.
Then I'll have StringID and IntegerID abstract classes implementing the DataID protocol, whose purpose is to maintain an internal value of String and int respectively.
public abstract class StringID extends DataID
public abstract class IntegerID extends DataID
To allow StringID and IntegerID to behave more like native types, so that they work correctly for maps, sets, comparisons, etc., I'll include overrides for:
public boolean equals(Object o)
public int hashCode()
and implement the Comparable interface:
... implements Comparable<StringID>
public int compareTo(StringID o)
... implements Comparable<IntegerID>
public int compareTo(IntegerID o)
The resulting classes will look something like:
public abstract class StringID extends DataID implements Comparable<StringID>
{
    private final String fValue;
    public StringID(String value)
    {
        // An empty or missing value is never a legal identifier
        if ((value == null) || (value.length() == 0))
            throw new IllegalArgumentException("value is undefined");
        fValue = value;
    }
    @Override
    public String toString()
    {
        return fValue;
    }
    // Note: instanceof treats any two StringID subclasses holding the same
    // value as equal; compare getClass() as well if that is not wanted.
    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof StringID))
            return false;
        return fValue.equals(((StringID)o).fValue);
    }
    @Override
    public int hashCode()
    {
        return fValue.hashCode();
    }
    @Override
    public int compareTo(StringID o)
    {
        // A null argument sorts before any real identifier
        if (o == null)
            return 1;
        return fValue.compareTo(o.fValue);
    }
}
public abstract class IntegerID extends DataID implements Comparable<IntegerID>
{
    private final int fValue;
    private final String fValueStr;  // cached string form of fValue
    public IntegerID(Integer value)
    {
        if (value == null)
            throw new IllegalArgumentException("value is undefined");
        fValue = value;
        fValueStr = Integer.toString(fValue);
    }
    public IntegerID(String value)
    {
        if ((value == null) || (value.trim().length() == 0))
            throw new IllegalArgumentException("value is undefined");
        // Throws NumberFormatException (an IllegalArgumentException)
        // when the string is not a valid integer
        fValue = Integer.decode(value.trim());
        fValueStr = Integer.toString(fValue);
    }
    public int toInteger()
    {
        return fValue;
    }
    @Override
    public String toString()
    {
        return fValueStr;
    }
    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof IntegerID))
            return false;
        return fValue == ((IntegerID)o).fValue;
    }
    @Override
    public int hashCode()
    {
        return fValueStr.hashCode();
    }
    @Override
    public int compareTo(IntegerID o)
    {
        // A null argument sorts before any real identifier
        if (o == null)
            return 1;
        return Integer.compare(fValue, o.fValue);
    }
}
With IntegerID, I've added public IntegerID(Integer value) and public int toInteger(). These aren't strictly needed, but are nice helpers when serializing IntegerID via code that explicitly supports native types, such as JDBC.
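One detail of the String constructor is worth checking against your own data: Integer.decode accepts hexadecimal and octal notation, so a zero-padded value such as "010" is read as octal. Whether that is desirable depends on where the identifiers come from. The sketch below compares it with Integer.parseInt, which reads plain decimal only; the helper names lenientParse and strictParse are made up for this illustration.

```java
public class DecodeDemo {
    // Mirrors the conversion used in the IntegerID(String) constructor.
    static int lenientParse(String value) {
        return Integer.decode(value.trim());
    }

    // Alternative that accepts decimal digits only (with an optional sign).
    static int strictParse(String value) {
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        System.out.println(lenientParse("42"));   // 42
        System.out.println(lenientParse("0x1F")); // 31, hexadecimal
        System.out.println(lenientParse("010"));  // 8, leading zero means octal
        System.out.println(strictParse("010"));   // 10

        try {
            strictParse("0x1F");
        } catch (NumberFormatException e) {
            System.out.println("parseInt rejects hexadecimal notation");
        }
    }
}
```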
In some projects, I'll have additional base identifier types, such as UUStringID, which understands UUID values represented as strings, and UUBinaryID, which understands UUID values as byte[].
As I noted above, an additional validation for StringID might be to check the length of the string value. There are two changes to StringID for this. First, an abstract getMaxLength() method is added:
public abstract int getMaxLength();
Second, the StringID(String value) constructor adds the validation:
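// getMaxLength() runs before the subclass is fully constructed,
// so implementations should simply return a constant
int maxLength = getMaxLength();
if (value.length() > maxLength)
    throw new IllegalArgumentException(String.format(
        "value(%s) length(%d) greater than max(%d)", value,
        value.length(), maxLength));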
Now that our base types are defined, the application-specific classes are pretty simple:
public class ApplicationID extends StringID
{
    public ApplicationID(String value)
    {
        super(value);
    }
    @Override
    public int getMaxLength() { return 32; }
}
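To show what the base class buys an application type in practice, here is a condensed, self-contained sketch. The StringID in it is a shortened re-statement of the class above (it drops DataID and the null check in compareTo for brevity), and the sample values "billing", "web" and "api" are invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ApplicationIdDemo {
    // Condensed version of the StringID base class, including the length check.
    abstract static class StringID implements Comparable<StringID> {
        private final String value;

        StringID(String value) {
            if (value == null || value.isEmpty())
                throw new IllegalArgumentException("value is undefined");
            if (value.length() > getMaxLength())
                throw new IllegalArgumentException("value too long: " + value);
            this.value = value;
        }

        // Called from the constructor, so it should only return a constant.
        public abstract int getMaxLength();

        @Override public String toString() { return value; }
        @Override public boolean equals(Object o) {
            return (o instanceof StringID) && value.equals(((StringID) o).value);
        }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public int compareTo(StringID o) { return value.compareTo(o.value); }
    }

    static final class ApplicationID extends StringID {
        ApplicationID(String value) { super(value); }
        @Override public int getMaxLength() { return 32; }
    }

    public static void main(String[] args) {
        // equals/hashCode let identifiers act as map keys.
        Map<ApplicationID, String> names = new HashMap<>();
        names.put(new ApplicationID("billing"), "Billing Service");
        System.out.println(names.get(new ApplicationID("billing"))); // Billing Service

        // Comparable gives identifiers a natural sort order.
        List<ApplicationID> ids = new ArrayList<>();
        ids.add(new ApplicationID("web"));
        ids.add(new ApplicationID("api"));
        Collections.sort(ids);
        System.out.println(ids); // [api, web]

        // The 33-character value below exceeds the 32-character limit.
        try {
            new ApplicationID("this-identifier-is-longer-than-32");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```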
Initialization
The final piece is handling common code that needs to construct new instances of our DataID types without having to know the concrete derived class. In Java, I handle this using the Constructor class (java.lang.reflect.Constructor). My identifier types expose a Constructor constant:
public static final Constructor<ApplicationID> CtorDataID
= CtorUtil.getCtorString(ApplicationID.class);
Then I may have a class for reading JSON data, which provides a method for reading a DataID:
public <T extends DataID> T readDataID(String fieldName,
Constructor<T> ctorDataID) throws Exception
Likewise, I may have a class for writing JSON data, where a writeDataID() method can make use of the toString() method provided.
This approach simplifies reading and writing identifier values and adds validation, all while hiding the internal implementation of the identifier's value.
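The CtorUtil helper and the body of readDataID are not shown in this article, so the following is only a guess at their general shape using standard reflection calls. getCtorString here is an assumed implementation, newDataID is a hypothetical stand-in for the part of readDataID that runs once the raw string has been pulled from the JSON, and the DataID and ApplicationID classes are simplified copies.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class CtorDemo {
    public abstract static class DataID { }

    public static final class ApplicationID extends DataID {
        private final String value;

        public ApplicationID(String value) {
            if (value == null || value.isEmpty())
                throw new IllegalArgumentException("value is undefined");
            this.value = value;
        }

        @Override public String toString() { return value; }
    }

    // Assumed shape of CtorUtil.getCtorString: find the public (String) constructor.
    public static <T extends DataID> Constructor<T> getCtorString(Class<T> type) {
        try {
            return type.getConstructor(String.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(type + " needs a public (String) constructor", e);
        }
    }

    // Hypothetical core of readDataID: build an identifier from a raw string
    // without knowing the concrete class at compile time.
    public static <T extends DataID> T newDataID(String raw, Constructor<T> ctor) {
        try {
            return ctor.newInstance(raw);
        } catch (InvocationTargetException e) {
            // The identifier's own constructor rejected the value.
            throw new IllegalArgumentException("invalid identifier: " + raw, e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot construct identifier", e);
        }
    }

    public static void main(String[] args) {
        Constructor<ApplicationID> ctor = getCtorString(ApplicationID.class);
        ApplicationID id = newDataID("app-1", ctor);
        System.out.println(id); // app-1
    }
}
```

Reflection wraps any exception thrown by the identifier's constructor in an InvocationTargetException, which is why the sketch unwraps it before reporting a validation failure.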
Conclusion
Using identifier types over native types has many benefits. The resulting code will contain far fewer errors, be more readable and more maintainable for years to come.