If you’re testing a product to be released to an international marketplace and you merely translated the text, then you’re only half done. An effective translation testing strategy should be coupled with globalization testing to verify the actual functionality of an application designed to support multiple languages. Here are some ways you can use test data for that crucial next step.
When a product needs to play in the global marketplace, a lot of research goes into the translation strategy to be employed. Translation-related testing includes checking translated strings for truncated text, verifying that the English text has been translated properly, and making sure the correct terminology is used in context. Typically, this kind of testing is outsourced or subcontracted to local translators within each country who can verify the language in the product.
After the translation phase, you have a product with multicultural support, ready to take on the competitive market. But let’s hold that thought for a moment and expand further. Has any functional testing been done on the product to make sure it is truly ready to take on the global marketplace?
I want to explore the idea of incorporating an effective globalization testing process on top of the now translation-verified product. Do the testers need to know different languages to be able to reliably test the product? Could the same team test core functionality for Unicode, single-byte, or multibyte characters?
First, let’s review the three basic test data management steps: test data storage, test data manipulation, and test data validation.
Test Data Storage
First and foremost, it is important to understand the application you are testing. What databases, data types, code pages, and encodings does your application support? What are the client- and server-side requirements for your application? This impacts test data creation because you have to be mindful to include all supported code pages and data types when testing with different character sets.
Each data type can hold a different amount of data depending on how the column was defined, so it is imperative to understand the underlying database your application interacts with and the data types it uses to store data for different languages.
For example, suppose your application supports an Oracle Unicode database and stores data in CHAR (max 2,000 bytes) and VARCHAR2 (max 4,000 bytes) columns. Those limits are measured in bytes by default, and a single Unicode character can occupy several bytes, so if the data size is not allocated properly, the database might reject the characters or data might be lost.
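To make the byte-versus-character distinction concrete, here is a minimal sketch in Python (the sample strings and the 4,000-byte limit are illustrative, and byte-length semantics are assumed) that contrasts character counts with the UTF-8 byte counts a Unicode database actually has to store:

```python
# Minimal sketch: why byte-based column limits matter for multibyte data.
# Assumes a Unicode (UTF-8 / AL32UTF8) database where column limits are
# measured in bytes; the 4,000-byte figure and sample strings are illustrative.

samples = {
    "ASCII": "test data",
    "Latin accented": "données d'essai",
    "Japanese": "テストデータ",
    "Supplementary (emoji)": "🧪📦🌍",
}

COLUMN_LIMIT_BYTES = 4000  # e.g., VARCHAR2(4000) with byte-length semantics

for label, text in samples.items():
    encoded = text.encode("utf-8")
    print(f"{label:<24} {len(text):>3} chars -> {len(encoded):>3} bytes "
          f"({len(encoded) / len(text):.1f} bytes/char)")

# A 4,000-byte column holds 4,000 ASCII characters but only about 1,000
# supplementary-plane characters, so size test data by bytes, not characters.
```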
Test Data Manipulation
Now that you have identified the database support needed to store the test data, let’s take a closer look at how to put together and manipulate the data so you can set up a lens to help find the issues.
Using character sets known to cause issues
Each application has its own set of problematic characters, but there are some known character sets that have a high likelihood of causing issues.
In her book The Web Testing Companion, Lydia Ash illustrates how problematic characters can be used in the verification and testing of an application. Sample data can be classified according to the types of problems and whether the characters are single-byte or multibyte. Depending on how this data is used in an application, these characters can help uncover potential multibyte issues.
You can also consider using supplementary character test data to test Unicode support. Supplementary characters use the maximum amount of memory, and insufficient support for them will most likely cause memory overruns, incorrect sorting, corrupt characters, or data loss.
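As a rough sketch of what such a data set could look like, the snippet below groups a few illustrative characters (the groupings and samples are assumptions, not a canonical list) and shows how much storage each one needs; supplementary characters are the ones that require two UTF-16 code units, a surrogate pair:

```python
# A rough sketch of problematic-character test data; the categories and
# sample characters are illustrative, not an exhaustive or canonical list.

PROBLEM_CHARACTERS = {
    "reserved/markup":      ["&", "<", ">", '"', "'"],
    "single-byte extended": ["é", "ü", "ñ", "¼", "†"],
    "multibyte (CJK)":      ["漢", "字", "か", "가"],
    "supplementary plane":  ["𠀋", "𝕊", "😀"],  # outside the Basic Multilingual Plane
}

for group, chars in PROBLEM_CHARACTERS.items():
    for ch in chars:
        utf8_bytes = len(ch.encode("utf-8"))
        utf16_units = len(ch.encode("utf-16-le")) // 2  # 2 units = surrogate pair
        print(f"{group:<22} U+{ord(ch):06X} {ch!r}: "
              f"{utf8_bytes} UTF-8 byte(s), {utf16_units} UTF-16 code unit(s)")
```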
These are just a few examples of using problematic characters as part of test data. Be mindful to incorporate all these characters—and more, if they’re relevant to your application—as part of your test data to get adequate coverage for testing.
Files for different character sets
If your application supports file attachments and the files are stored in LOB columns, consider creating or using files that contain Unicode or multibyte character set (MBCS) data. Each language has its own character set you can use as data for testing. Having a combination of this data across the supported file types enables thorough testing of all the character sets a code page supports.
Here’s a trick you can use: Add “Start” and “End” identifiers at the beginning and end of a file. When these files are attached, manually verify that you see these identifiers where they should be. This test helps pinpoint any data loss and whether the entire file came through.
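Here is a minimal sketch of that trick, assuming the stored attachment can be exported back out of the application (or pulled from its LOB column) for inspection; the file names and marker strings are placeholders:

```python
# Sketch: create a multibyte attachment with Start/End markers, then check
# an exported copy for those markers. Names and markers are placeholders.

from pathlib import Path

START, END = "<<START>>", "<<END>>"
body = "テスト データ: ñ, é, 漢字, 😀\n" * 50  # mixed multibyte payload

# Create the test attachment with identifiers at both ends.
Path("attachment_utf8.txt").write_text(f"{START}\n{body}{END}\n", encoding="utf-8")

# ... attach the file through the application, then export the stored copy ...

def verify(exported_path: str) -> None:
    text = Path(exported_path).read_text(encoding="utf-8")
    ok = text.startswith(START) and text.rstrip().endswith(END)
    print("markers intact" if ok else "possible truncation or data loss")

# verify("attachment_utf8_exported.txt")  # hypothetical exported copy
```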
Test Data Validation
The goal of test data validation is to be able to detect functionality issues using Unicode characters or MBCS. That can be a challenge in a global product when you don’t really know the language. Here are some techniques you can use to verify this test data.
Manual testing to identify display issues
Some character display issues are visible as you test your application for support of multiple languages (the sketch after this list shows how the first two can arise):
- Question marks appearing instead of displayed text indicate problems in Unicode-to-ANSI conversion
- Random ANSI characters (e.g., ¼, †, ‰, ‡, ¶) appearing instead of readable text indicate problems in ANSI code using the wrong code page
- The appearance of default glyphs such as boxes, vertical bars, or tildes (e.g., □, |, ~) indicates that the selected font cannot display some of the characters
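The sketch below reproduces the first two symptoms in isolation; the sample text and the cp1252 code page are assumptions chosen for the demo:

```python
# Demo of the first two display symptoms using deliberately wrong conversions.
# The sample text and the cp1252 (Windows Latin-1) code page are assumptions.

text = "Größe: テスト"

# 1. Unicode-to-ANSI conversion: characters the target code page cannot
#    represent come back as question marks.
print(text.encode("cp1252", errors="replace").decode("cp1252"))  # Größe: ???

# 2. Wrong code page: UTF-8 bytes read as a single-byte code page show up
#    as seemingly random ANSI characters.
print(text.encode("utf-8").decode("cp1252", errors="replace"))   # GrÃ¶ÃŸe: ...

# 3. Boxes, bars, or tildes are a font/glyph issue rather than an encoding
#    issue, so they cannot be reproduced in a console dump like this one.
```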
Using a tool for table comparison
A big challenge of testing multibyte characters is identifying truncated characters or verifying large volumes of data. One possible solution is backend testing. Consider using a database comparison tool that supports Unicode and multibyte characters.
The tool I use compares data between two tables (a source and a destination table). Using a native SQL statement, insert the Unicode data into the source table, and assume that your application transfers the input data to a database table, which here is the target. You can then compare the data in these tables to identify whether it is identical between source and target across different data types, and run a LOB/BLOB/CLOB comparison to see whether the characters in different file types are intact or whether there was some truncation or data loss.
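The comparison tool itself is not shown here, but the same source-versus-target check can be sketched by hand. The snippet below assumes a python-oracledb connection and hypothetical source_table and target_table tables that share an id key and a payload column:

```python
# Hand-rolled sketch of a source-vs-target comparison. The driver, connection
# details, table names, and column names are all assumptions for illustration.

import oracledb  # any DB-API 2.0 driver would work the same way

conn = oracledb.connect(user="test", password="test", dsn="localhost/XEPDB1")
cur = conn.cursor()

QUERY = "SELECT id, payload FROM {table} ORDER BY id"

cur.execute(QUERY.format(table="source_table"))
source = dict(cur.fetchall())
cur.execute(QUERY.format(table="target_table"))
target = dict(cur.fetchall())

for key, expected in source.items():
    actual = target.get(key)
    if actual != expected:
        # Length differences usually point at truncation; content differences
        # point at conversion or corruption between source and target.
        got = f"{len(actual)} chars" if actual is not None else "missing row"
        print(f"id={key}: expected {len(expected)} chars, got {got}")
```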
This testing is powerful enough to identify issues with Unicode characters and MBCS without having to know or understand the language.
Using an automation tool
With continuous integration and faster releases being the goal for companies using an agile methodology, automation is key to keeping up with the pace. Using an automation tool for globalization testing can be challenging, but if implemented the right way, it’s very beneficial.
If a scenario is already automated for data from a single-byte character set, add input data and expected results for the other supported character sets to the same test, then run the test for each supported code page. Of course, this requires manual verification of the results, at least for the first run, to make sure your automation tool is able to handle the different character sets in your application. Once you have it set up, you can be proud that your automation is now ready for globalization.
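One way to set this up, sketched here with pytest (the save_and_fetch helper is a stand-in for whatever your automation tool actually drives), is to parametrize a single round-trip scenario over the supported character sets:

```python
# Sketch: one automated scenario parametrized over several character sets.
# save_and_fetch is a placeholder for a call into your automation framework.

import pytest

def save_and_fetch(text: str) -> str:
    """Placeholder: push text through the application under test and read it back."""
    return text

CASES = [
    ("English", "test data"),
    ("French", "données d'essai"),
    ("Japanese", "テストデータ"),
    ("Supplementary", "🧪📦"),
]

@pytest.mark.parametrize("label,text", CASES, ids=[c[0] for c in CASES])
def test_globalized_round_trip(label, text):
    stored = save_and_fetch(text)
    assert stored == text, f"{label}: data was changed or truncated on the round trip"
```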
Globalization Testing: The Key to a Product’s Success
Test data management is key to a successful globalization testing process. The testing team should feel empowered to put together test data and, more importantly, to use that data strategically to produce consistent results.
If you’re testing a product that’s about to be released to an international marketplace and you merely translated the text, then you’re only half done. An effective translation testing strategy should be coupled with globalization testing to test actual functionality of an application designed to support multiple languages.