Tuesday, May 26, 2009

Uncovering Myths about Globalization testing- Input validation testing

This post is a continuation of my previous post on the same topic and is based on the real-world myths about Globalization testing that I have encountered.
Myth 13- A tester can test text inputs in Localized applications using the same approaches used for English language testing

Input field validation is an important form of testing in any Software application. For example, suppose an application has a text field for entering Credit card details. A tester can exercise this field with a variety of inputs to ensure that valid data is accepted by the application and that, for invalid data, the user is presented with a clear message indicating that the input is incorrect.

There are several techniques that can be used to test this aspect of the application properly. One available resource is listed below-
Book: Lessons Learned in Software Testing, Chapter 3 (Testing Techniques), Section- "How to create a Test Matrix for an Input field"

What is an encoding system?
The known techniques do talk about using various types of inputs, including language specific characters, i.e. the characters of any language that a Software application supports, such as German or Japanese, since these languages have their own writing systems and character sets. It is of utmost importance to test a Localized application with language specific characters, because a user in any of the product's supported countries expects the application to process data in their native language. For example, a Japanese user of an email client would expect to be able to write emails in Japanese; otherwise the customer may not find the application worthwhile at all.
One important aspect of Localized data processing that the known techniques do not specifically cover is the dependency that Localized data has on the underlying encoding system of the application. If you are new to the term "encoding system", please read the short description below from www.unicode.org-
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers.
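
As a minimal sketch of the "unique number for every character" idea, the Python snippet below prints the Unicode number (code point) of a few characters. The sample characters and the helper name code_point are my own choices for illustration and are not part of the text quoted from www.unicode.org.

```python
# Print the Unicode code point (the "unique number") of a few sample characters.
# The samples and the helper name are illustrative assumptions.
samples = ["a", "\u00e4", "\u65e5"]  # "a", German a-umlaut, a Japanese kanji


def code_point(ch: str) -> str:
    """Return the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in samples:
    print(ch, ord(ch), code_point(ch))
```

The number is the same on every platform; what differs between encodings is how that number is turned into bytes.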

Does Unicode have different representations?
Unicode is a character standard that encompasses virtually all the known character sets from different languages. Unicode data can be represented in several encodings, including UTF-7, UTF-8, UTF-16 and UTF-32. Each of these representations has its own advantages and disadvantages depending upon the context, e.g.
UTF-8 is the most common on the web. UTF-16 is used internally by Java and Windows. UTF-32 is used internally by various Unix systems.

Does encoding system representation affect the test data size?
One important thing to know when testing the Input character set of a Localized application is which encoding system is used underneath. This matters because the number of bytes occupied by a given character varies with the encoding system used. Let's take a closer look at this statement by means of an example-
Consider the character "ä" from the German language. The byte count of this character under different encoding systems is as follows-
UTF-16 Byte count for "ä"= 2
UTF-8 Byte count for "ä"= 2
UTF-7 Byte count for "ä"= 5

The above example shows that the number of bytes for a particular test character depends on the encoding system.
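
The byte counts above can be reproduced with a short Python sketch. The helper name byte_counts and the list of encodings are assumptions made for illustration. Note that "utf-16-le" is used rather than "utf-16" because Python's plain "utf-16" codec prepends a 2-byte byte order mark, which would inflate the count.

```python
# Count the bytes a string occupies in several Unicode encodings.
# "-le" variants are used so that no byte order mark is added to the output.
ENCODINGS = ["utf-7", "utf-8", "utf-16-le", "utf-32-le"]


def byte_counts(text: str) -> dict:
    """Map each encoding name to the encoded length of `text` in bytes."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


# "\u00e4" is the German character "ä" used in the example above.
for enc, size in byte_counts("\u00e4").items():
    print(f"{enc}: {size} byte(s)")
```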

Different ways of Input text validation- No. of Bytes vs. No. of characters
The next important thing to know before performing Input validation testing on Localized applications is whether the validation logic counts bytes or counts characters. Let's take a closer look at this statement by means of an example-
Suppose there is an application with a text field, say Username. The usual assumption is that validation is done by number of characters, say the "Username" field supports a maximum of 10 characters and a minimum of 3 characters.
Suppose a tester uses "ääääääääää" as test data for "Username" and the application uses UTF-7 as its encoding system. If validation is done by number of characters, this is valid test data, as it contains 10 characters. If validation is done by number of bytes, it may not be valid (depending upon the byte limit set), since at up to 5 bytes per character the test data in this example may amount to as many as 50 bytes.

Thus, it is important to ascertain the validation rules before you test.
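
The difference between the two rules can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the character limits come from the Username example above, and the 10-byte limit is an assumed value, since a real application would define its own.

```python
MIN_CHARS, MAX_CHARS = 3, 10  # limits from the Username example
MAX_BYTES = 10                # assumed byte limit, for illustration only


def valid_by_chars(value: str) -> bool:
    """Validation rule that counts characters."""
    return MIN_CHARS <= len(value) <= MAX_CHARS


def valid_by_bytes(value: str, encoding: str = "utf-8") -> bool:
    """Validation rule that counts bytes in the given encoding."""
    return len(value.encode(encoding)) <= MAX_BYTES


username = "\u00e4" * 10  # ten "ä" characters
print("by characters:", valid_by_chars(username))
for enc in ("utf-8", "utf-7"):
    size = len(username.encode(enc))
    print(f"by bytes ({enc}, {size} bytes):", valid_by_bytes(username, enc))
```

The same ten-character input passes the character rule but fails the byte rule, which is exactly the kind of mismatch a tester needs to anticipate when choosing test data.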

So, before you perform Input validation testing, or even generate test data for a Localized application, make sure that you know the following-
- Encoding system used by the application
- Validation rule- does the application validate the data by number of bytes or by number of characters?
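
Once both points are known, boundary test data can be generated rather than guessed. The sketch below is one possible approach, not a prescribed method: the helper longest_fitting and the 10-byte limit are hypothetical and chosen only for illustration.

```python
def longest_fitting(char: str, max_bytes: int, encoding: str) -> str:
    """Return the longest run of `char` whose encoded size fits in max_bytes."""
    count = 0
    while len((char * (count + 1)).encode(encoding)) <= max_bytes:
        count += 1
    return char * count


MAX_BYTES = 10  # assumed byte limit of the field under test

for enc in ("utf-8", "utf-16-le"):
    at_limit = longest_fitting("\u00e4", MAX_BYTES, enc)  # should be accepted
    over_limit = at_limit + "\u00e4"                      # should be rejected
    print(enc, len(at_limit), "chars fit;",
          len(over_limit.encode(enc)), "bytes is over the limit")
```

If the application validates by characters instead, the boundary strings are simply built from the character limits, and the encoding no longer affects where the boundary falls.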


Anonymous said...

Hi Anuj,

I am also a QA, working with an MNC in Mumbai. For a long time I was searching for an article that describes Internationalization concepts, and I found your blog, which is full of amazing articles that cleared up so many myths I had earlier about I18N. I am still reading your blog; it is very interesting.

Nice Post......


Anuj Magazine said...

Hi Anonymous (I wish I knew your name),
Thanks for liking the articles on Globalisation testing. Please keep visiting this blog and provide your feedback (good or bad), as I plan to share more experiences around this subject.