
RE: Character conversion



Hi again,

On second thought, may I meekly propose a small change to the CCSIDxlate() procedure: capture the return code from the call to the iconv function, interpret it, and report the outcome back to the caller. This would at least let the caller know that the returned data may be incorrect, so that an alternative can be investigated, such as using http_xlatedyn() instead of http_xlate(). As it is now, the conversion is always reported as successful, even though that clearly isn't always the case.
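
To illustrate the idea, here is a minimal sketch in Python rather than RPG. Python's built-in codecs stand in for iconv, and xlate_checked() is a hypothetical name that is not part of HTTPAPI. The only point is that the wrapper hands back -1 when the conversion fails, instead of always reporting success:

```python
from typing import Tuple


def xlate_checked(data: bytes, from_codec: str, to_codec: str) -> Tuple[int, bytes]:
    """Convert data between encodings and report the outcome.

    Returns (length, converted) on success and (-1, b"") on failure,
    mirroring an "rc = -1 means error" convention. Hypothetical helper,
    not an HTTPAPI procedure.
    """
    try:
        converted = data.decode(from_codec).encode(to_codec)
    except (UnicodeError, LookupError):
        # UnicodeError: a byte or character could not be converted.
        # LookupError: one of the codec names is unknown.
        return -1, b""
    return len(converted), converted


# Latin-1 bytes for three Danish capitals convert cleanly to UTF-8.
rc, out = xlate_checked(b"\xc6\xd8\xc5", "latin-1", "utf-8")
print(rc, out.hex().upper())

# The euro sign has no Latin-1 representation, so the failure is reported.
rc, out = xlate_checked("\u20ac".encode("utf-8"), "utf-8", "latin-1")
print(rc)
```

With a return value like this, the caller can tell a failed conversion from a successful one and fall back to another routine.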

Best regards,
Kaj


-----Original Message-----
From: ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx [mailto:ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Kaj Julius
Sent: Tuesday, November 16, 2010 4:38 PM
To: HTTPAPI and FTPAPI Projects
Subject: RE: Character conversion

Hi Scott,

As always, (well, almost) you are ahead of the crowd. :-)

I read one of your posts to Dennis Lovelady, where you mentioned a procedure named http_xlatedyn(), one I sadly had thoroughly overlooked. I have just tested this procedure with a slightly altered test program:

H DFTACTGRP(*NO) ACTGRP(*NEW) BNDDIR('HTTPAPI') DEBUG(*YES)             
                                                                        
D/copy libhttp/qrpglesrc,httpapi_h                                      
D Buffer          S             40A                                     
D size            S             10I 0                                   
D rc              S             10I 0                                   
D ReturnBufferptr...                                                    
D                 S               *                                     
D pReturnBuffer   S               *                                     
D ReturnBuffer    S            160A   Based(pReturnBuffer)              
D ConvertedLength...                                                    
D                 S             10I 0                                   
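 /Free

  http_debug(*ON); // Use default debug file...

  // Use UTF-8 (CCSID 1208) on the remote side and set up
  // conversion from EBCDIC (CCSID 277, Danish/Norwegian)...

  rc = http_SetCCSIDs(1208 : 277);
  buffer = 'XYZÆØÅxyzæøå';
  size = 20;   // 12 characters of test data plus trailing blanks
  dump;        // Dump before conversion (requires DEBUG(*YES))

  // Here rc is treated as -1 on error, otherwise as the length of the
  // converted data; ReturnBufferptr is set to the output buffer.
  rc = http_xlatedyn(size : %Addr(buffer) : TO_ASCII : ReturnBufferptr);
  if rc = -1;
    // Error handling...
  else;
    ConvertedLength = rc;
    pReturnBuffer = ReturnBufferptr;
  endif;
  dump;        // Dump after conversion to inspect ReturnBuffer
  *inlr = *on;
 /End-Free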

It seems this newer procedure can indeed produce correct results. I will change my programs to use this procedure instead of the old http_xlate().
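
The byte values quoted in the original message further down can be cross-checked with a few lines of code. The sketch below uses Python rather than RPG. Python ships no CCSID 277 codec, so the six-entry table is hand-built from the hex values quoted in this thread and is an assumption for illustration, not a complete code page:

```python
# Hand-built table for the six Danish letters only, taken from the hex
# values quoted in this thread. It is NOT a complete CCSID 277 code page.
CCSID_277_SAMPLE = {
    0x7B: "\u00c6",  # Æ
    0x7C: "\u00d8",  # Ø
    0x5B: "\u00c5",  # Å
    0xC0: "\u00e6",  # æ
    0x6A: "\u00f8",  # ø
    0xD0: "\u00e5",  # å
}


def decode_sample(ebcdic: bytes) -> str:
    """Decode EBCDIC bytes using the six-entry sample table above."""
    return "".join(CCSID_277_SAMPLE[b] for b in ebcdic)


text = decode_sample(bytes.fromhex("7B7C5BC06AD0"))

# Show the length and hex value of the same text in three encodings.
for codec in ("utf-8", "utf-16-be", "latin-1"):
    encoded = text.encode(codec)
    print(f"{codec:10} {len(encoded):2} bytes  {encoded.hex().upper()}")
```

For the six input bytes, the UTF-8 and UTF-16 results are each 12 bytes long, which is why an output buffer the same size as the input cannot hold them. The 6-byte Latin-1 (ISO 8859-1) result is X'C6D8C5E6F8E5', the same bytes that http_xlate() returned.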

Anyway, thanks for listening to me. :-)

Best regards,
Kaj


-----Original Message-----
From: ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx [mailto:ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Scott Klement
Sent: Monday, November 15, 2010 6:35 PM
To: HTTPAPI and FTPAPI Projects
Subject: Re: Character conversion

Hello,

HTTPAPI uses several different translation routines: HTTP_xlate(),
HTTP_xlatep() and HTTP_xlatedyn().  There are pros and cons to each routine...

Not sure exactly which part of HTTPAPI you're having trouble with, or whether you're calling http_xlate() directly?

Can you provide more information?  Preferably some sample code that I can use to reproduce the problem?


On 11/15/2010 3:32 AM, Kaj Julius wrote:
>
>     Hi all,
>
>
>     I think I'm going crazy, so please help if you can. Somewhere I must
>     have taken a wrong turn. When I started out it seemed so easy, but
>     apparently not!
>
>
>     I'm trying to use http_SetCCSIDs(1208:277) and http_xlate() to
>     translate the special Danish characters æøåÆØÅ into their UTF-8
>     counterparts. The UTF-8 character set is used a lot in web
>     development, so I'm baffled at my findings.
>
>
>     CCSID 277 -->  1208
>
>
>     In CCSID 277 (Danish/Norwegian) the string 'ÆØÅæøå' is represented by
>     X'7B7C5BC06AD0'
>
>
>     In CCSID 1208 (UTF-8) the same string is represented by
>     X'C386C398C385C3A6C3B8C3A5'
>
>     (notice that each character needs two bytes -- UTF-8 characters will
>     be using anywhere between one and four bytes)
>
>
>     [1] http://www.utf8-chartable.de/
>
>     [2] http://czyborra.com/utf/#UTF-8
>
>
>     What I get after running http_xlate (iconv translation), however, is
>     X'C6D8C5E6F8E5'
>
>
>     The procedures are executed without any apparent errors being issued.
>     However, when I look at the converted data, it's all wrong. As far as
>     I can deduce, each byte in the buffer after the call equals the
>     rightmost byte of the corresponding two-byte UTF-16 character.
>
>
>     Incidentally, I get the exact same result when I specify CCSID 1200
>     (UTF-16) as the target. There I would have expected a returned buffer
>     of double the length of the input, since every character now uses 16
>     bits (hence the name, I guess). This is just wrong!
>     It should have been X'00C600D800C500E600F800E5'
>
>
>     Is this behaviour normal for iconv conversions?
>
>
>     I'm on an old system, V5R3M0, is this at the root of the problem?
>
>
>
>     Okay, second problem:
>
>
>     Looking at the code in procedure CCSIDxlate() I notice that the code
>     doesn't allow for the output buffer to be of a different length than
>     the input buffer (which can be the case in conversions between
>     single-byte CCSIDs and variable-length UTF-8, and definitely will be
>     the case in conversions between single-byte CCSIDs and UTF-16 or
>     other double-byte CCSIDs). The same buffer is used for both input and
>     output, and the length of the converted data isn't communicated back
>     to the caller.
>
>
>     Assuming that the above-mentioned problem with the iconv conversion
>     isn't the norm (i.e. is a problem on my system), shouldn't the
>     CCSIDxlate() procedure use separate input and output buffers and
>     return the length of the converted data in the output buffer?
>
>
>     I'm using HTTPAPI 1.24beta11 from 2010-09-09
>
>
>     I look forward to your input in eager anticipation!
>
>
>
>     TIA
>
>
>     Kaj
>
> References
>
>     1. http://www.utf8-chartable.de/
>     2. http://czyborra.com/utf/#UTF-8
>
>
>
>
> -----------------------------------------------------------------------
> This is the FTPAPI mailing list.  To unsubscribe, please go to:
> http://www.scottklement.com/mailman/listinfo/ftpapi
> -----------------------------------------------------------------------
