Auto-detection of UTF-8 data now available for PDF417 and DataMatrix
Version 8.1.1.7 of the Softek Barcode Reader Toolkit for Windows has been released. This version addresses an issue with the handling of UTF-8 encoded data in PDF417 and DataMatrix bar codes.
The problem…
Say you have a PDF417 bar code that encodes the characters:
キヤノン電子
…in UTF-8 format with the following bytes of hex data:
E3 82 AD E3 83 A4 E3 83 8E E3 83 B3 E9 9B BB E5 AD 90
In previous versions of the SDK this would display as:
=E3=82=AD=E3=83=A4=E3=83=8E=E3=83=B3=E9=9B=BB=E5=AD=90
…with Encoding set to 1 (quoted-printable), which is correct.
But with Encoding set to 3 (UTF-8) the data is double-encoded and displays as:
ãã¤ãã
…and contains the following bytes of hex data:
C3 A3 C2 82 C2 AD C3 A3 C2 83 C2 A4 C3 A3 C2 83 C2 8E C3 A3 C2 83 C2 B3 C3 A9 C2 9B C2 BB C3 A5 C2 AD C2 90
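The byte sequences above can be reproduced with a short Python sketch. It is only an illustration: it assumes that the double encoding comes from treating each stored byte as a single Latin-1 character and then encoding that character as UTF-8. That assumption is inferred from the hex data shown above and is not taken from the toolkit's source code.

```python
import quopri

# Bytes stored in the bar code: the six characters from the example, in UTF-8.
raw = bytes.fromhex("E3 82 AD E3 83 A4 E3 83 8E E3 83 B3 E9 9B BB E5 AD 90")
text = raw.decode("utf-8")

# Quoted-printable rendering of the raw bytes (compare Encoding set to 1).
qp = quopri.encodestring(raw).decode("ascii")

# Assumed cause of the double encoding: every byte is read as one Latin-1
# character and that character is then encoded as UTF-8 a second time.
double = raw.decode("latin-1").encode("utf-8")
double_hex = " ".join(f"{b:02X}" for b in double)

print(text)
print(qp)
print(double_hex)
```

Each original byte of 0x80 or above becomes two bytes, which is why the 18-byte input grows to 36 bytes.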
Version 8.1.1.7 checks the binary data (subject to the Pdf417AutoUTF8 property) to see whether it is already encoded as UTF-8 and, if so, does not encode it again.
Note that binary data in a PDF417 bar code may happen to look like UTF-8 encoded data even when it is not, and the requirement may be to process it without the automatic check. In this case the Pdf417AutoUTF8 setting should be disabled.
The same applies to DataMatrix bar codes and the DataMatrixAutoUTF8 property.
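The check described above can be pictured with the following Python sketch. It is not the toolkit's implementation: the function names looks_like_utf8 and to_utf8_output and the auto_utf8 parameter are hypothetical stand-ins for the Pdf417AutoUTF8 and DataMatrixAutoUTF8 properties, and the fallback branch assumes the earlier behaviour was a byte-by-byte Latin-1 to UTF-8 conversion.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8_output(data: bytes, auto_utf8: bool = True) -> bytes:
    """Hypothetical stand-in for the UTF-8 output step (Encoding set to 3)."""
    if auto_utf8 and looks_like_utf8(data):
        # Already valid UTF-8: pass the bytes through unchanged.
        return data
    # Assumed legacy behaviour: treat each byte as a Latin-1 character
    # and encode that character as UTF-8.
    return data.decode("latin-1").encode("utf-8")


raw = bytes.fromhex("E3 82 AD E3 83 A4 E3 83 8E E3 83 B3 E9 9B BB E5 AD 90")
fixed = to_utf8_output(raw, auto_utf8=True)    # unchanged, no double encoding
legacy = to_utf8_output(raw, auto_utf8=False)  # double-encoded, as before

# Two bytes of arbitrary binary data that also happen to be valid UTF-8.
ambiguous = bytes([0xC3, 0xA9])
passed_through = to_utf8_output(ambiguous, auto_utf8=True)
re_encoded = to_utf8_output(ambiguous, auto_utf8=False)
```

The ambiguous case shows why the check is a setting and not a fixed rule: the bytes alone cannot say whether C3 A9 is one encoded character or two unrelated binary values, so the caller has to decide.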