
Re: Unicode problem



Scott,

In my opinion the file that John posted must be UTF-16 (LE) encoded.

Here are the first 8 bytes of the file:

   FF FE 3C 00 3F 00 78 00   <?x

The first 2 BOM bytes clearly indicate UTF-16 (LE) because they match the
UTF-16 (LE) signature given at:

   http://en.wikipedia.org/wiki/Byte-order_mark

   UTF-8                - EF BB BF (actually not a BOM but a UTF-8 signature)
   UTF-16 big endian    - FE FF
   UTF-16 little endian - FF FE
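
For anyone who wants to run the same check on their own files, here is a
minimal sketch in Python (my choice for illustration, it has nothing to do
with HTTPAPI). The function name sniff_bom is made up, and it only knows the
three signatures listed above; UTF-32 signatures are deliberately left out.

```python
import codecs

# The three signatures from the list above, taken from Python's codecs module.
BOM_SIGNATURES = [
    (codecs.BOM_UTF8, "UTF-8"),             # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 (BE)"),   # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16 (LE)"),   # FF FE
]


def sniff_bom(data: bytes):
    """Return the encoding implied by the leading bytes, or None if no signature matches."""
    for signature, name in BOM_SIGNATURES:
        if data.startswith(signature):
            return name
    return None


# First 8 bytes of John's file as quoted in this message.
first_bytes = bytes.fromhex("FF FE 3C 00 3F 00 78 00")
print(sniff_bom(first_bytes))  # UTF-16 (LE)
```

A file without any signature returns None, which says nothing about its
encoding; it only means the BOM cannot be used to decide.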

Furthermore I used Notepad2 (Freeware) to convert the file into the
following encodings:

   UTF-8 without BOM (signature)

      3C 3F 78 6D 6C 20 76 65   <?xml ve

   UTF-8 with BOM (signature)

      EF BB BF 3C 3F 78 6D 6C   <?xml

   UTF-16 big endian

      FE FF 00 3C 00 3F 00 78   <?x

   UTF-16 little endian

      FF FE 3C 00 3F 00 78 00   <?x
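
The four dumps can also be reproduced without an editor. The following is
only a sketch in Python: the sample text "<?xml ve" is taken from the first
dump, the helper hex_dump is a name I made up, and the BOM is prepended by
hand for the two UTF-16 variants so that the byte order is explicit.

```python
import codecs

# Start of the XML declaration, as seen in the UTF-8 dump above.
text = "<?xml ve"


def hex_dump(data: bytes, count: int = 8) -> str:
    """Format the first `count` bytes as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in data[:count])


variants = {
    "UTF-8 without BOM": text.encode("utf-8"),
    "UTF-8 with BOM": text.encode("utf-8-sig"),
    "UTF-16 big endian": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "UTF-16 little endian": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
}

for name, data in variants.items():
    print(f"{name:22} {hex_dump(data)}")
```

Each line of output shows the same first 8 bytes as the corresponding
Notepad2 conversion.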

As you can see, John's file matches the UTF-16 (LE) encoding. The same
is true when I convert John's file to UCS-2 (LE) with Notepad++
(Freeware). For text without characters outside the Basic Multilingual
Plane, UCS-2 (LE) is byte-for-byte the same as UTF-16 (LE):

   UCS-2 little endian:

      FF FE 3C 00 3F 00 78 00   <?x

Last but not least, the "Unicode" and "Unicode big endian" options of the
original Microsoft Notepad match "UTF-16 (LE)" and "UTF-16 (BE)", as
expected.

So in my opinion there is no doubt that John's file is UTF-16 (LE).
I am as confused as Scott, and I cannot believe that John could parse the
file without first converting it to UTF-8. I have attached all encodings
of the file for everyone who is interested.
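
In case it helps, this is roughly what such a conversion to UTF-8 looks
like. It is only a sketch in Python with a made-up function name
(utf16_to_utf8), and it assumes the input starts with a BOM as John's file
does. It does not touch the XML declaration, so any encoding="..." given
there would still have to agree with the new encoding.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Convert BOM-prefixed UTF-16 bytes to UTF-8 bytes without a signature."""
    # The generic "utf-16" codec reads the BOM to pick the byte order
    # and does not include it in the decoded text.
    text = data.decode("utf-16")
    return text.encode("utf-8")


# First 8 bytes of John's file as quoted in this message.
sample = bytes.fromhex("FF FE 3C 00 3F 00 78 00")
print(utf16_to_utf8(sample))  # b'<?x'
```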

John, is it possible that the file originally was UTF-8 and that you
accidentally converted it to UTF-16 with Notepad or when transferring it
to the iSeries?

Thomas.



ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx wrote on 18.03.2010 18:12:08:

> From: sk@xxxxxxxxxxxxxxxx
> To: ftpapi@xxxxxxxxxxxxxxxxxxxxxx
> Date: 18.03.2010 18:23
> Subject: Re: Unicode problem
> Sent by: ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx
>
> Hello,
>
> > I just found out that the file is UTF-8.  If this is the case, what do I
> > need to change so we can process this file?
>
> The XML file that you sent me on March 15th is UTF-16 (not UTF-8!!)
>
> It didn't work with CCSID 1200 specified because it had a byte-order
> mark. I should probably add BOM support to HTTPAPI, since for some odd
> reason the iconv() API doesn't understand BOMs.
>
> However, if I let Expat do the character set translation (using
> HTTP_XML_CALC) it worked flawlessly for me.
>
> So I'm a little baffled by your message saying that it's UTF-8.  Anyone
> familiar with Unicode can see at a glance that this isn't UTF-8 --
> unless the format has been changed since you sent that file?
> -----------------------------------------------------------------------
> This is the FTPAPI mailing list.  To unsubscribe, please go to:
> http://www.scottklement.com/mailman/listinfo/ftpapi
> -----------------------------------------------------------------------
>



Attachment: freight_201003101542591443916.zip
Description: Zip archive
