Data-Into w/ returned UTF-8 data from HTTP_req

Discussions relating to the ScottKlement.com port of the open source YAJL JSON Reader/Generator. This includes the YAJL tool as well as the YAJLR4, YAJLGEN, YAJLINTO and YAJLDTAGEN add-ons from ScottKlement.com. http://www.scottklement.com/yajl/
panders400
Posts: 9
Joined: Thu Jul 20, 2023 8:11 pm

Data-Into w/ returned UTF-8 data from HTTP_req

Post by panders400 »

I am using HTTP_req for an API call. The returned data hasn't been an issue previously, but recently some data was entered containing UTF-8 characters that do not convert to EBCDIC, and it caused an error. Until now I hadn't needed to do any UTF-8 encoding, but it was suggested that I should. Using example code of yours, I was able to get the data returned into a field that holds UTF-8. However, when I pass the data to data-into, it returns:
1002: lexical error: invalid character inside string.
The document for the DATA-INTO operation does not match the RPG variable;

I have even tried making my entire data structure UTF-8 and it gives the same message.

Is there any way to convert the data while ignoring invalid characters?

This is the text that is causing the error: "DAOSHIAOKOU ROAD, SHENZE NEW AREA,PANAN COUNTY"

This is the basic code I am using. The GetVPODS is a very large, nested data structure.

Code:

    dcl-s URL       varchar(1000);
    dcl-s request   varchar(2000000:4);
    dcl-ds *n;
       response varchar(2000000:4);
       RespUTF  varchar(2000000:4) ccsid(*UTF8) pos(1);
    end-ds;

    rc = http_xproc(HTTP_POINT_ADDL_HEADER:
                    %paddr(add_Auth_Headers));

    http_setOption('network-ccsid': '1208');

    rc = http_req( 'GET'
                 : URL
                 : *omit
                 : response
                 : *omit
                 : request
                 : 'application/json; charset=utf-8');

    data-into(E) GetVPODS %DATA( response : DataIntoOptions)
                 %PARSER( 'YAJL/YAJLINTO':'{"value_null" : "-1"}');

I attempted to attach a file of the data being passed to me, but it kept saying .txt was an invalid extension.
Scott Klement
Site Admin
Posts: 872
Joined: Sun Jul 04, 2021 5:12 am

Re: Data-Into w/ returned UTF-8 data from HTTP_req

Post by Scott Klement »

What's happening... HTTPAPI is downloading the data as UTF-8, but then is converting it to EBCDIC before returning it to your RPG program.

This will cause any character that doesn't exist in your particular EBCDIC encoding to be translated to x'3F' (this means "substitution character"). When you try to parse it, it will translate it back to UTF-8, however there's no way for it to know what the original character was before it was converted to x'3F', so it will be translated to the UTF-8 x'1A' (the Unicode substitute character.)

Unfortunately, JSON requires control characters (anything below x'20') to be escaped in the JSON document before parsing, so you get an error.

The "right" fix would be to NOT convert the data to EBCDIC. This is possible with HTTPAPI, but it's clumsy to code. HTTPAPI was designed for RPG, and EBCDIC is RPG's native character set.

Three possible alternatives you could use as solutions:
  • Use %scanrpl() to convert the x'3F' into the string '\u001a' (that's a 6 character string, NOT a single hex character x'1a'; JSON's only escape for an arbitrary character is the \uXXXX form) before passing the input to DATA-INTO.
  • Use a file rather than a variable. Have http_req() save the JSON to the IFS, and then have DATA-INTO read it from the IFS. (You can delete the file after you parse it.) For this to work correctly, you may need to also call http_setOption('file-ccsid': '1208');
  • Save to a variable the hard/clunky way... to do this, you'll have to call http_setOption('local-ccsid': '1208') to prevent HTTPAPI from translating the data, then have two copies of 'response' overlaying each other in a data structure, with one defined with ccsid(*utf8). Pass the one without the CCSID keyword to http_req(), and pass the one with the ccsid keyword to DATA-INTO. Note this also means that the data you send should be encoded the same way (though, as far as I can tell, you aren't sending any data, so this seems to be a non-issue in this case.)
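
For reference, the third alternative might look roughly like the sketch below. It is untested and makes several assumptions: the URL is a placeholder, the GetVPODS structure here is a hypothetical one-field stand-in for the poster's large nested structure, the /copy path and binding directory depend on how HTTPAPI was installed, and the DATA-INTO options are only an example. The CCSID is passed as a character string, as in the http_setOption call in the first post.

```rpgle
**free
ctl-opt dftactgrp(*no) bnddir('HTTPAPI');

// Assumption: HTTPAPI's prototypes are in QRPGLESRC on the library list
/copy qrpglesrc,httpapi_h

dcl-s URL varchar(1000) inz('https://example.com/api/vpods'); // placeholder
dcl-s rc  int(10);

// Hypothetical, cut-down stand-in for the real GetVPODS structure.
// The subfield is UTF-8 so characters with no EBCDIC equivalent survive.
dcl-ds GetVPODS qualified;
   street varchar(200) ccsid(*utf8);
end-ds;

// Two views of the same storage, as described in the reply:
// 'response' has no CCSID keyword and is what http_req() fills in;
// 'RespUTF' overlays it and tells RPG the bytes are UTF-8.
dcl-ds *n;
   response varchar(2000000:4);
   RespUTF  varchar(2000000:4) ccsid(*utf8) pos(1);
end-ds;

// Ask HTTPAPI to treat the program's data as UTF-8, so the downloaded
// document is not translated to EBCDIC on its way into 'response'.
http_setOption('local-ccsid': '1208');

rc = http_req('GET': URL: *omit: response);

// Assumption: 1 is HTTPAPI's success return value
if rc = 1;
   // Parse the UTF-8 view, not the untyped one
   data-into(E) GetVPODS %DATA(RespUTF: 'case=any allowextra=yes')
                %PARSER('YAJL/YAJLINTO': '{"value_null" : "-1"}');
endif;

*inlr = *on;
return;
```

Compared with the code in the first post, the two differences are the 'local-ccsid' option and passing RespUTF rather than response to DATA-INTO. Any subfield that is not declared with ccsid(*utf8) would still be converted to the job CCSID when DATA-INTO fills it, so unconvertible characters could still be lost at that point.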
panders400
Posts: 9
Joined: Thu Jul 20, 2023 8:11 pm

Re: Data-Into w/ returned UTF-8 data from HTTP_req

Post by panders400 »

Thank you, Scott. I will try your suggestions.