QR Codes and Chinese characters
QR Codes are versatile containers for all sorts of data and have Kanji character encoding built into the specification, but what about simplified or traditional Chinese? This article explains the different ways Chinese text can be encoded and why the interpretation of a QR Code can be application-specific.
One solution is simply to encode all data as UTF-8. However, this is not always the most compact method for Chinese characters, because UTF-8 needs three bytes for most of them.
There are three encoding systems that use pairs of bytes to represent a Chinese character:
Kanji – built into the QR Code specification, for the Chinese characters used in Japanese.
GBK – used for simplified Chinese in the People’s Republic of China.
Big5 – used for traditional Chinese in Taiwan, Hong Kong and Macau.
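To make the size difference concrete, here is a minimal Python sketch that counts the bytes needed for the same text in UTF-8 and in the double-byte encodings listed above. The helper name encoded_sizes and the sample strings are illustrative only and are not part of any SDK; the codec names are the ones Python's standard library provides.

```python
def encoded_sizes(text, encodings):
    """Return the number of bytes `text` occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in encodings}


# Sample strings chosen for illustration only.
simplified = "中文"    # two simplified Chinese characters
traditional = "漢字"   # two traditional Chinese characters

print(encoded_sizes(simplified, ["utf-8", "gbk"]))     # {'utf-8': 6, 'gbk': 4}
print(encoded_sizes(traditional, ["utf-8", "big5"]))   # {'utf-8': 6, 'big5': 4}
```

In both cases the double-byte encoding needs two bytes per character where UTF-8 needs three.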
There is nothing within a QR Code that can tell the reader how the data was encoded, so the best it can do is to test how the data decodes in various formats and see which ones work. The problem with this is that GBK and Big5 use mostly the same values but for different characters.
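The ambiguity can be reproduced outside of any barcode software. The following sketch, which uses only Python's built-in codecs, encodes a short string as GBK and then tries to read the same bytes back as both GBK and Big5. The function name decode_candidates is hypothetical and the example is only meant to illustrate the problem, not how any particular reader works.

```python
def decode_candidates(data, encodings):
    """Try each encoding in turn; map it to the decoded text, or None on failure."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


original = "中文"
payload = original.encode("gbk")   # the raw bytes a QR Code would carry

for enc, text in decode_candidates(payload, ["gbk", "big5"]).items():
    print(enc, repr(text))
```

Only the GBK interpretation returns the original text. The Big5 interpretation either fails or yields different characters, and the reader has no way of knowing which result the author of the QR Code intended.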
To work around this problem we have introduced the QRCodeByteMode property, which directs the reader as follows:
Value Meaning (in order of search)
0 UTF-8 or Binary
1 Kanji, UTF-8, GBK, Big5 or Binary (default)
2 Kanji, UTF-8 or Binary
3 Big5, UTF-8 or Binary
4 GBK, UTF-8 or Binary
5 Kanji, UTF-8, Big5, GBK or Binary
6 Kanji only
7 Big5 only
8 GBK only
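The search orders in the table can be pictured as a simple "first encoding that decodes cleanly wins" loop. The sketch below is an illustration of that idea in Python, not the reader's actual implementation: the names SEARCH_ORDER and decode_payload are hypothetical, Python's shift_jis codec is used as a loose stand-in for Kanji, and returning None when an "only" mode fails is an assumption.

```python
# Hypothetical mapping of QRCodeByteMode values to search orders (from the table).
# "shift_jis" is an assumed stand-in for Kanji; "binary" means return the raw bytes.
SEARCH_ORDER = {
    0: ["utf-8", "binary"],
    1: ["shift_jis", "utf-8", "gbk", "big5", "binary"],
    2: ["shift_jis", "utf-8", "binary"],
    3: ["big5", "utf-8", "binary"],
    4: ["gbk", "utf-8", "binary"],
    5: ["shift_jis", "utf-8", "big5", "gbk", "binary"],
    6: ["shift_jis"],
    7: ["big5"],
    8: ["gbk"],
}


def decode_payload(data, mode=1):
    """Return (encoding, value) for the first encoding in the search order that works.

    Returns None if no encoding in the list decodes the data (an assumption
    about what the "only" modes do on failure).
    """
    for enc in SEARCH_ORDER[mode]:
        if enc == "binary":
            return "binary", data          # binary always succeeds
        try:
            return enc, data.decode(enc)   # strict decoding: any error rejects it
        except UnicodeDecodeError:
            continue
    return None


payload = "中文".encode("gbk")
print(decode_payload(payload, mode=4))   # GBK is tried first
print(decode_payload(payload, mode=0))   # not valid UTF-8, so treated as binary
```

Because the first successful decode wins, the order matters: an application that knows its QR Codes contain simplified Chinese would pick a value that tries GBK before Big5, or GBK only.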