The conventional wisdom in software development has always been that internationalization (i18n) is a very expensive and time-consuming effort, and that it’s always better to target just one language first, adding support for more languages later, when the market demands it. Because of that perception, multilingual support is rarely considered during the early days of new software projects, and many products end up built in a way that makes localizing them harder than it should be.
In fact, writing software that can easily be localized at a later stage isn’t terribly hard, as long as you start with the end in mind. Use the following techniques to greatly reduce the amount of work required when you eventually take your software product global.
Maintain user-visible strings separately
By far the most important (and most often ignored) guideline in i18n preparation is to ensure that all user-visible strings are separated from the code that uses them.
The typical approach used to achieve this separation is to store the product’s textual content in resource files for that platform or language. For example, Java programmers should maintain their displayable strings inside ResourceBundles.
Text in applications should never be structurally tied to the application’s logic. User-visible strings should always be modifiable without breaking application flow or other functionality.
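As a minimal sketch of this separation, the Python snippet below keeps every displayable string in a per-locale catalog and has the code refer to messages only by key. The catalog contents, the key names and the translate helper are all hypothetical, and a real project would load the catalogs from resource files rather than define them inline.

```python
# Hypothetical in-memory catalogs. A real project would load these from
# resource files (one per locale) instead of defining them in code.
CATALOGS = {
    "en": {
        "greeting": "Welcome back, {name}!",
        "cart.empty": "Your cart is empty.",
    },
    "de": {
        "greeting": "Willkommen zurück, {name}!",
        # "cart.empty" is deliberately missing to show the fallback.
    },
}

DEFAULT_LOCALE = "en"


def translate(key, locale, **params):
    """Look up a message by key, falling back to the default locale."""
    template = CATALOGS.get(locale, {}).get(key)
    if template is None:
        # Untranslated key or unknown locale: use the default-language text.
        template = CATALOGS[DEFAULT_LOCALE][key]
    return template.format(**params)
```

Calling translate("greeting", "de", name="Ana") picks the German template, while translate("cart.empty", "de") falls back to the English text because that key has not been translated yet. Application logic never sees the wording itself, so translators can change it freely.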
Interestingly, this task is becoming less and less important for web applications, thanks to innovations in technology that can automatically extract text content from web-delivered software, send those strings to humans for translation, and even deliver the finished localized content online – all without requiring a single change be made to the original source application. This approach can significantly reduce the effort and time required to take a web application global.
Expect user-visible strings to grow or shrink
English phrases may double in size when translated to other languages, and with other target languages they may also shrink by a sizable amount. It is crucial that product code and graphical interface components do not depend on the length of user-visible strings remaining the same.
This obviously presents a challenge when laying out text in a graphical interface, but layouts that size themselves to their content, rather than relying on fixed widths, generally cope well.
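One inexpensive way to check a layout before any translation exists is to inflate the English strings artificially. The helper below is a sketch of that idea: the function name, the default factor of 2.0 (taken from the "may double in size" rule of thumb above) and the bracket markers are all assumptions made for illustration.

```python
def pad_for_expansion(text, factor=2.0, filler="~"):
    """Return text padded to about `factor` times its length, in brackets.

    The brackets make truncation visible: if the closing bracket does not
    appear on screen, the interface clipped the string.
    """
    target = int(len(text) * factor)
    padding = filler * max(0, target - len(text))
    return "[" + text + padding + "]"
```

With the defaults, pad_for_expansion("Save") produces "[Save~~~~]". Running the interface with every string passed through a helper like this quickly shows which buttons, labels and columns were sized for English only.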
Keep images free of embedded text
For everything except company or product logos, images that contain text rendered directly within them should be avoided, because it is more difficult to have that text content translated. Instead of having someone quickly translate the contents of a text file, graphic designers would have to be hired to convert the images and put the translated text on them too, one language at a time.
Plan for different calendar, time, phone and currency systems
Developers writing code should always remember that other countries:
- Might format their dates and times differently
- Might not take the same days off at the weekend
- Might use an entirely different calendar system
- Might be more used to numbering their weeks than naming their months
- Might be in a time zone that has a partial-hour offset
- Might use a different measurement system
- Likely use a currency or currencies different to your own
- Might use different phone number formats
Build a checklist into your existing review process that watches for these kinds of issues in code to prevent any “hard wiring” that will limit your software’s eventual global usage.
Always use established, globally-focused standards when storing data. For example, use E.164 for phone numbers, ISO 8601 for timestamps, ISO 639-2 for languages, ISO 3166 for countries and the Olson database (now maintained as the IANA Time Zone Database) for time zones.
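For timestamps, one common pattern is to store ISO 8601 strings in UTC and convert to the user's zone only for display. The sketch below uses Python's standard datetime module; the to_storage and from_storage helper names are hypothetical, and the fixed +05:30 offset stands in for a named time zone purely to illustrate a partial-hour offset.

```python
from datetime import datetime, timedelta, timezone

# A fixed +05:30 offset, used only to illustrate a partial-hour offset.
# Real code would normally resolve a named zone from the time zone database.
PARTIAL_HOUR_OFFSET = timezone(timedelta(hours=5, minutes=30))


def to_storage(moment):
    """Serialize a timezone-aware datetime as an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("refusing to store a datetime without a time zone")
    return moment.astimezone(timezone.utc).isoformat()


def from_storage(text, display_zone):
    """Parse a stored ISO 8601 string and convert it for display."""
    return datetime.fromisoformat(text).astimezone(display_zone)


local = datetime(2024, 1, 15, 9, 30, tzinfo=PARTIAL_HOUR_OFFSET)
stored = to_storage(local)                      # "2024-01-15T04:00:00+00:00"
restored = from_storage(stored, PARTIAL_HOUR_OFFSET)
```

Rejecting datetimes that carry no time zone is a deliberate choice in this sketch: a timestamp without an offset is ambiguous the moment a second region starts using the product.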
Avoid ASCII, use Unicode
In the past, engineers building software for English-only markets felt safe using tricks in their code that only worked when the encoding in use was always ASCII. For example, flipping a single bit of an ASCII-encoded alphabetical character switches it between lowercase and uppercase. However, the world’s needs have long since moved beyond what ASCII provides, and the industry has now embraced Unicode as the correct way to encode text data and support international characters.
Even though Unicode is partly compatible with ASCII (its first 128 code points match ASCII, and UTF-8 encodes them as the same single bytes), the best way to avoid string encoding issues is to:
- never assume that characters are encoded in ASCII
- never assume that one byte equals one character
- always use Unicode-capable types and libraries
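The short Python sketch below illustrates why those assumptions fail. The ascii_toggle_case function is a hypothetical reconstruction of the bit-flipping trick mentioned above, included only to show where it breaks.

```python
def ascii_toggle_case(ch):
    """The old ASCII-only trick: flip bit 0x20 to switch case."""
    return chr(ord(ch) ^ 0x20)


ascii_result = ascii_toggle_case("a")   # "A": fine for ASCII letters
broken_result = ascii_toggle_case("ß")  # "ÿ": an unrelated character
correct_result = "ß".upper()            # "SS": Unicode-aware case mapping

text = "日本語"
char_count = len(text)                  # 3 characters
byte_count = len(text.encode("utf-8"))  # 9 bytes once encoded as UTF-8
```

The German ß shows two problems at once: the bit flip lands on a different letter entirely, and the correct uppercase form is two characters long, so even the length of a string can change under a case conversion.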
Thankfully, some of the most popular programming languages like Java, C# and Objective-C have native Unicode support in their string types. Using them can make dealing with international characters a trivial effort.
Even if an application supports Unicode throughout, it will likely store data in a database, and a bad database schema can easily ruin all of the good work done in the code to make it compatible with international character sets. For example, in a SQL Server database, user-visible strings should be stored within nvarchar types, and in MySQL it would be best to use utf8mb4.
For more information on Unicode, the introductory article by Joel “On Software” Spolsky is a good place to start.
Don’t assume that text always flows from left to right
The most common difference in text layout that may be encountered is a right-to-left flow, such as that used in Arabic and Hebrew.
In a web application, simply switching the CSS direction property to rtl across your entire system during testing would expose most of the major layout issues that a future translation into a right-to-left language would run into.
It is also possible to have text flow from top to bottom, although that’s far less common.
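When an application displays user-supplied text whose language is not known in advance, it can estimate the direction from the text itself. The sketch below is a simplified version of the "first strong character" heuristic, built on Python's standard unicodedata module. The base_direction name is hypothetical, and the function ignores explicit directional control characters, so treat it as an illustration rather than a full implementation of the Unicode bidirectional algorithm.

```python
import unicodedata


def base_direction(text, default="ltr"):
    """Guess text direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style or Arabic-style letters
            return "rtl"
        if bidi == "L":          # Latin, Cyrillic, CJK and similar
            return "ltr"
    # Only digits, spaces or punctuation were found.
    return default
```

In a web application, the result could be used to set the dir attribute on the element that displays the text, so that a Hebrew or Arabic name renders correctly even inside an otherwise left-to-right page.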
Keep all user-visible language plain and simple
When application messages get localized for other markets, the job of the translators will be much easier if the language used is plain and simple.
Language that is highly technical or full of jargon will not translate well. Prefer common phrases and terms to niche or less frequently used ones.
Test for basic multilingual support
Good developers instinctively write automated tests for the product features that they build. Adding a few more tests to the verification suite to confirm that the product supports international characters takes little effort and is hugely valuable.
For example, imagine a web application that allows people to book concert tickets. Let’s assume that a test already exists that verifies that a fake user called “John Doe” can book a seat at a show and that his name shows up correctly on the ticket that he receives after paying. Adding a similar test that changes the user’s name to “John-假会河 Doe-沖鈈批” and checks that those same characters show up on the ticket received after paying would confirm that international characters are supported all the way through the application and the database. This simple technique (sometimes termed “pseudo-internationalization”) allows developers to verify that their applications support the transmission and storage of international strings without having to do a full professional translation of the product, and without the developers having to be fluent in a foreign language.
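A test along those lines might look like the Python sketch below. It uses an in-memory SQLite database as a stand-in for the real storage layer, and the tickets table, the book_ticket and name_on_ticket helpers and the show title are all hypothetical. A real test would exercise the application's actual booking flow instead.

```python
import sqlite3


def make_database():
    """Create a throwaway in-memory database with a tickets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, name TEXT, event TEXT)"
    )
    return conn


def book_ticket(conn, name, event):
    """Hypothetical booking step: store the ticket and return its id."""
    cur = conn.execute(
        "INSERT INTO tickets (name, event) VALUES (?, ?)", (name, event)
    )
    conn.commit()
    return cur.lastrowid


def name_on_ticket(conn, ticket_id):
    """Read back the name that would be printed on the ticket."""
    row = conn.execute(
        "SELECT name FROM tickets WHERE id = ?", (ticket_id,)
    ).fetchone()
    return row[0]


def test_international_name_survives_round_trip():
    conn = make_database()
    name = "John-假会河 Doe-沖鈈批"
    ticket_id = book_ticket(conn, name, "Evening Concert")
    # The name must come back unchanged, character for character.
    assert name_on_ticket(conn, ticket_id) == name
```

The test compares the full string rather than checking for the presence of a few characters, so any silent substitution (such as question marks replacing characters the database could not store) makes it fail.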
Automated tests can go even further. Tests can be built to verify that right-to-left fields operate correctly, or that international plurals are supported without having to rework the basis of the product’s code.
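Plural handling is a good candidate for such tests, because English has two plural forms while some languages have more. The sketch below hand-codes the rules for English and Russian, for non-negative whole numbers only. The function names and message tables are hypothetical, and a production system would normally take its plural rules from an internationalization library instead of writing them by hand.

```python
def plural_category_en(n):
    """English: singular for exactly one, plural for everything else."""
    return "one" if n == 1 else "other"


def plural_category_ru(n):
    """Russian (whole numbers): the form depends on the final digits."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


PLURAL_RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Hypothetical message table: one template per plural category.
TICKET_FORMS = {
    "en": {"one": "{n} ticket", "other": "{n} tickets"},
    "ru": {"one": "{n} билет", "few": "{n} билета", "many": "{n} билетов"},
}


def format_ticket_count(n, locale):
    """Pick the template matching the plural category of n."""
    category = PLURAL_RULES[locale](n)
    return TICKET_FORMS[locale][category].format(n=n)
```

Code that simply appends an "s" when the count is not 1 cannot express the Russian rules at all, which is why tests for counts such as 11, 21 and 22 are useful long before a translation is commissioned.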
Embrace the global opportunity
Developing software so that it is ready for a global release is not as difficult as it may seem, or as it once might have been. By following these simple guidelines and doing a little bit of work up front, you can create software products that can later be made available to a much larger market than you might first address, giving your product, and its usefulness, far greater reach.