Abused Strings Considered Harmful

Overuse of Strings to Represent Non-String Data Leads to Problems

© James Huw Evans

Programming Languages Article, (C) Huw Evans

Engineers can be tempted into using strings to represent data that would be better stored in other forms. This article shows how to recognise the situation and what to do about it.

It's safe to say that strings get used a lot in programs. It's also safe to say that strings get abused in lots of programs.

"Hello, World!"

is a string. However,

"Location=Jersey;Time=09:00; Position=49.02N:2.2W"

is an object, encoded as a string.

Spotting the Anti-Pattern

It's easy to spot the anti-pattern (Anti-Patterns Home Website, Wikipedia Article). If you see a string being used to store information that must first be translated to another type before it can be used, it's probably an abused string (*). It's also easy to get into this situation: you need to capture a new piece of information and, because it's related to a string you already have in your code, you reason that you can just add it to the end and use substring and splitting operations to access the contents. You finish quickly and feel that it's a job well done.
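As a rough illustration of what this looks like in practice, here is a minimal Python sketch that reads values back out of the example string. The helper name get_field and the parsing rules are assumptions made for this example; they are not taken from any particular codebase.

```python
# The encoded string from the example above, copied verbatim.
record = "Location=Jersey;Time=09:00; Position=49.02N:2.2W"


def get_field(encoded: str, key: str) -> str:
    """Hypothetical helper: pull one value out of the encoded string."""
    for part in encoded.split(";"):
        # strip() is needed because the encoding is inconsistent about spaces.
        name, _, value = part.strip().partition("=")
        if name == key:
            return value
    raise KeyError(key)


# Every caller has to know the encoding and convert each value itself.
hours, minutes = (int(p) for p in get_field(record, "Time").split(":"))
latitude_text, longitude_text = get_field(record, "Position").split(":")
latitude = float(latitude_text.rstrip("NS"))
```

Each value is translated to another type before it is used, which is the tell-tale sign described above.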

The Consequences of the Anti-Pattern

However, you probably don't want to do this. It is certainly quicker in development time, but you also have to consider the long-term health of your code. Using a string like this isn't type-safe: the time and position information is fundamentally non-string data, which your program will want to use in a non-string way. Because the string isn't encapsulated, any code that can access it can change it in any way, which could cause other parts of your code to fail. This means that wherever you use it, you may be forced to test the contents of the string beforehand to ensure it's correctly formatted. Lastly, the string isn't abstract; it's just a string. Because of this, the concepts it represents cannot be conveniently reused, and access to them cannot be constrained or extended.
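The validation burden can be sketched as follows. The regular expression is only an assumed description of a well-formed record, inferred from the single example string, and is_well_formed is a hypothetical name.

```python
import re

# Assumed shape of a valid record, inferred from the one example string.
RECORD_PATTERN = re.compile(
    r"Location=[^;=]+;\s*Time=\d{2}:\d{2};\s*"
    r"Position=\d+(\.\d+)?[NS]:\d+(\.\d+)?[EW]"
)


def is_well_formed(encoded: str) -> bool:
    """Hypothetical check that each consumer must run before trusting the string."""
    return RECORD_PATTERN.fullmatch(encoded) is not None


record = "Location=Jersey;Time=09:00; Position=49.02N:2.2W"

# Nothing stops unrelated code from rewriting part of the string.
damaged = record.replace("Time=09:00", "Time=9am")
```

Here is_well_formed(record) is true and is_well_formed(damaged) is false, but only code that remembers to call the check will notice the difference.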

Refactoring the Anti-Pattern

Initially it seems beneficial to extend the string, but ultimately you will increase the complexity and inter-coupling of your code and decrease its abstraction, which will make future testing and maintenance more difficult. The refactored solution ensures type-safety, encapsulation and abstraction. To refactor our example above in an object-oriented programming language, you could provide an Event class with three fields to represent the concepts of location, time and position. This allows you to model the real world more closely and think at the level of Events, rather than being distracted by the details of the string and its encoding and manipulation. The three fields could also be individually typed, leading to better quality code.
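One possible shape for such a class is sketched below in Python. The Position type, the field names, the range checks and the convention that west and south are negative are all assumptions made for this illustration; the article itself only calls for an Event class with three individually typed fields.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    latitude: float   # degrees, positive north (assumed convention)
    longitude: float  # degrees, positive east (assumed convention)

    def __post_init__(self) -> None:
        # Reject impossible coordinates once, at construction time.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


@dataclass(frozen=True)
class Event:
    location: str
    time: datetime.time
    position: Position


# The example string, expressed as a typed object (2.2W becomes -2.2).
event = Event("Jersey", datetime.time(9, 0), Position(49.02, -2.2))
```

Because both classes are frozen, other code cannot alter an Event after it has been built, and callers read event.time.hour or event.position.latitude directly instead of splitting a string.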

Conclusions

Strings are an essential component of any modern programming language. However, it's important to not abuse their convenience by using them to represent information that is really not string-based.

(*) Some system components (such as database drivers) require the programmer to pass in complex strings that represent information such as the database server name, username and password. Even in this case, it's advantageous to abstract over the data so that it can be easily initialised and protected, providing a single piece of code to generate the desired string.
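A minimal Python sketch of that idea follows. The class name ConnectionSettings and the "Server=...;User=...;Password=..." layout are invented for illustration; a real driver defines its own format, which should be taken from its documentation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectionSettings:
    server: str
    username: str
    password: str = field(repr=False)  # keep the password out of repr() output

    def __post_init__(self) -> None:
        # The made-up format below has no escaping, so reject the separator.
        for value in (self.server, self.username, self.password):
            if ";" in value:
                raise ValueError("values must not contain ';'")

    def to_connection_string(self) -> str:
        """The single place that knows the (hypothetical) string format."""
        return (
            f"Server={self.server};"
            f"User={self.username};"
            f"Password={self.password}"
        )


settings = ConnectionSettings("db.example.com", "app", "s3cret")
connection_string = settings.to_connection_string()
```

The rest of the program works with the typed settings object, and the encoded string only exists at the point where it is handed to the driver.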


The copyright of the article Abused Strings Considered Harmful in Computer Programming Languages is owned by James Huw Evans. Permission to republish Abused Strings Considered Harmful must be granted by the author in writing.

