Engineers can be tempted into using strings to represent data that would be better stored in other forms. This article shows how to recognise the situation and what to do about it.
It's safe to say that strings get used a lot in programs. It's also safe to say that strings get abused in lots of programs.
is a string. However,
is an object, encoded as a string.
It's easy to spot the anti-pattern (Anti-Patterns Home Website, Wikipedia Article). If you see a string storing information that must be translated to another type before it can be used, it's probably an abused string (*). It's also easy to get into this situation: you need to capture a new piece of information and, because it's related to a string you already have in your code, you reason that you can just add it to the end and use substring and splitting operations to get the contents back out. You finish quickly and feel that it's a job well done.
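As a minimal sketch of how this tends to look in practice, the Python snippet below packs an event's name, time and location into one comma-separated string and recovers the hour by splitting. The field layout, the sample values and the function name `event_hour` are all assumptions made for illustration.

```python
# Hypothetical encoding: "name,HH:MM,location" packed into a single string.
event = "Team meeting,14:30,Room 12"


def event_hour(encoded: str) -> int:
    """Extract the hour by splitting the string and converting one piece."""
    _name, time_text, _location = encoded.split(",")
    hour_text, _minute_text = time_text.split(":")
    return int(hour_text)


print(event_hour(event))

# Any code holding the string can break the format. The failure only
# surfaces later, when some other part of the program tries to decode it.
try:
    event_hour("Team meeting, Room 12")
except ValueError as error:
    print("malformed event string:", error)
```

Note that the hour is not usable until it has been split out and converted with `int`, which is the tell-tale sign described above.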
However, you probably don't want to do this. It is certainly quicker to develop, but you also have to consider the long-term health of your code. Using a string like this isn't type-safe: the time and position information is fundamentally non-string data, which your program will want to use in a non-string way. Because the string isn't encapsulated, any code that can access it can change it in any way, which could cause other parts of your code to fail. This means that wherever you use it, you may be forced to check beforehand that it's correctly formatted. Lastly, the string isn't abstract; it's just a string. The concepts it represents therefore cannot be conveniently reused, and access to them cannot be constrained or extended.
Initially it seems beneficial to extend the string, but ultimately you increase the complexity and inter-coupling of your code and decrease its abstraction, which makes future testing and maintenance more difficult.
The refactored solution ensures type-safety, encapsulation and abstraction. To refactor the example above in an object-oriented programming language, you could provide an Event class with three fields, one for each concept the string encodes, such as its time and location. This lets you model the real world more closely and think at the level of Events, rather than being distracted by the details of the string and its encoding and manipulation. The three fields can also be individually typed, so mistakes are caught where the Event is created rather than where the string is decoded.
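One way such an Event class might look is sketched below in Python. The field names `name`, `start` and `location`, the `starts_before` helper and the sample values are assumptions chosen for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Event:
    """An event modelled as typed fields instead of one encoded string."""

    name: str
    start: time
    location: str

    def __post_init__(self) -> None:
        # Validate once, at construction, rather than at every point of use.
        if not isinstance(self.start, time):
            raise TypeError("start must be a datetime.time")

    def starts_before(self, other: "Event") -> bool:
        """Compare start times directly; no parsing is needed."""
        return self.start < other.start


meeting = Event("Team meeting", time(14, 30), "Room 12")
lunch = Event("Lunch", time(12, 0), "Canteen")
print(lunch.starts_before(meeting))  # True
print(meeting.start.hour)  # 14
```

Making the class immutable is one possible way to get the encapsulation discussed above: code that receives an Event can read it but cannot silently corrupt it.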
Strings are an essential component of any modern programming language. However, it's important to not abuse their convenience by using them to represent information that is really not string-based.
(*) Some system components (such as database drivers) require the programmer to pass in complex strings that represent information such as the database server name, username and password. Even in this case, it's advantageous to abstract over the data so that it can be easily initialised and protected, with a single piece of code responsible for generating the desired string.
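A rough sketch of that idea in Python follows. The class name `DatabaseSettings`, its fields and the `Server=...;User=...;Password=...` format are hypothetical; a real driver defines its own format and escaping rules, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatabaseSettings:
    """Typed connection details; the string is generated in one place only."""

    server: str
    username: str
    # repr=False keeps the password out of printed or logged representations.
    password: str = field(repr=False)

    def __post_init__(self) -> None:
        # The assumed format uses ';' as a separator, so reject it in values.
        for value in (self.server, self.username, self.password):
            if ";" in value:
                raise ValueError("values must not contain ';'")

    def to_connection_string(self) -> str:
        """Build the (assumed) driver string from the typed fields."""
        return (
            f"Server={self.server};"
            f"User={self.username};"
            f"Password={self.password}"
        )


settings = DatabaseSettings("db.example.com", "report_user", "s3cret")
print(settings)  # the password is omitted from this representation
print(settings.to_connection_string())
```

If the driver's format ever changes, only `to_connection_string` needs to be updated.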