Large TIFF data via HTTP web service

Discussions related to HTTPAPI (An HTTP Client Package for RPG programming.) http://www.scottklement.com/httpapi/
thomasmca

Post by thomasmca »

I am using version 1.45 of HTTPAPI. I need to retrieve large TIFF files via an HTTP web service. The retrieved XML contains a BASE64 encoded TIFF file. I can retrieve the file fine, but the decoded data no longer has TIFF headers, so it is not a valid TIFF image.

My code uses HTTP_DEBUG, but it never generates a log. I have tried both with and without the 2nd parm.

Code: Select all

http_debug(*on: '/tmp/httpapi_debug.txt' );

Because the downloaded files could be over 100 MB, I save the retrieved XML to the IFS. That part works fine:

Code: Select all

// post the XML, and put the return value                     
// into a temp file in the IFS                                
rc = http_req( 'POST'                                         
             : URL                                            
             : %trim(tmpXMLpath)          // File to receive  
             : *omit                      // String to receive
             : *omit                      // File to send     
             : SOAP );                    // String to send   

I pass the XML to a handler, which also works fine:

Code: Select all

// call our XML Handler proc               
xml-sax(e) %handler( ONBHandler: XMLInfo ) 
   %xml( %trim(tmpXMLpath): 'doc=file' );  

My handler checks the event codes, and correctly retrieves the payload:

Code: Select all

// We got the requested XML element, so process it 
when xEvent = *XML_CHARS;    

x@ is a pointer to the XML payload (meaning the raw, encoded TIFF data, without any XML wrappers):

Code: Select all

if write(XMLInfo.IFSHandleRaw: x@: x@Length) < 0;             
   // handle error
endif;                                                         
                                                               
// Close and reopen the file                                   
if close(XMLInfo.IFSHandleRaw) < 0;                           
endif;                                                         
                                                               
XMLInfo.IFSHandleRaw = Open( XMLInfo.rawPath:          
                              O_TextData + O_ReadOnly ); 

// Create a heap space that points to the file

Code: Select all

CRTHS( Me.StreamHeapId: *loval );                         
Stream@ = ALCHSS( Me.StreamHeapId: FileStatus.Size );     
Me.StreamSize = Read( XMLInfo.IFSHandleRaw:              
                      Stream@: FileStatus.Size );            

The handler works as expected, and retrieves encoded TIFF data. Here is the beginning of one download:

Code: Select all

SUkqAAgAAAASAP4ABAABAAAAAAAAAAABBAABAAAAwAYAAAEBBAABAAAASAkAAAIBAwABAAAAAQAAAAMB

If I copy the entire value into https://codebeautify.org/base64-to-image-converter, it correctly generates the TIFF image.

Next, I decode the data:

Code: Select all

Me.DecLen = base64_decode( Stream@     
                        : Me.StreamSize
                        : outBuf
                        : Me.StreamSize ); 

Then I save the decoded data:

Code: Select all

// Write decoded value to IFS                         
if write( XMLInfo.IFSHandle: outBuf: Me.DecLen ) < 0; 
   // handle error                  
endif;                                                
Close_Stream( XMLInfo.IFSHandle );                    

My problem is the resulting file is not a valid TIFF file. Since I can paste the payload into a BASE64 converter, I know that my http_req() works, and so does the XML handler. To me, that only leaves a CCSID clash as the cause of my problem. Something is not getting the CCSID that it expects, which corrupts the results. I have tried a ton of CCSID options and variations, but nothing I've tried creates a usable TIFF file.

The retrieved XML has this in the header: encoding="utf-8". Since I am saving an image to the IFS, I need the result to use CCSID 1252. After many hours of testing, I learned that RPG does CCSID conversion automatically, so I may not need to use iconv(). But since RPG's auto-conversion requires a CCSID property on a field, I might still need to use iconv() because I'm using pointers.

I've tried setting the CCSID of the raw work file in /tmp to 1208, which didn't help. I've also tried

Code: Select all

HTTP_SetFileCCSID( 1208 );

which didn't help, either.

Here is my iconv() proc (which also doesn't create a usable TIFF file):

Code: Select all

decBuf = RW_iConv( outBuf: Me.DecLen: 1252 ); 

Code: Select all

  //====================================================================
  //  RW_iConv:  convert CCSID's                                        
  //====================================================================
P RW_iConv        B                   export                            
D RW_iConv        PI              *                                     
D   pInputPtr                     *   const                             
D   pInputLen                   10i 0 const                             
D   pToCCSID                     5  0 const                             
                                                                        
D local           DS                  likeds(QTQCODE_t) inz( *likeds )  
D remote          DS                  likeds(QTQCODE_t) inz( *likeds )  
                                                                        
D toLocal         DS                  likeds(iConv_t)                   
D toRemote        DS                  likeds(iConv_t)                   
                                                                        
D inpPtr          S               *                                     
D inpPtrP         S               *                                     
D inpLen          S             10U 0                                   
D inpPtr64        S             64a   based( inpPtr )                   
D outPtr          S               *                                     
D outLen          S             10U 0                                   
D outPtr64        S             64a   based( outPtr )                   
                                                                        
D inLeft          S             10U 0                                   
D outLeft         S             10U 0                       
                                                            
D rc              S             10U 0                       
                                                                                                                        
   local.CCSID     = *zero;     // I've also tried 1208
   remote.CCSID    = pToCCSID;                              
                                                            
   outPtr   = %alloc( pInputLen );                          
   inpPtr   = pInputPtr;                                    
                                                            
   toRemote = *ALLx'00';                                    
   toRemote = QtqIconvOpen( remote: local  );               
   if toRemote.return_value = -1;                           
     // handle error                                        
   endif;                                                   
                                                            
   inLeft   = pInputLen;                                    
   outLeft  = pInputLen;                                    
                                                            
   rc = iconv( toRemote: inpPtr: inLeft: outPtr: outLeft ); 
                                                            
   if rc = ICONV_FAIL;                                      
     // handle error     
   endif;                
                         
   iconv_close(toRemote);
                         
   return outPtr;        
                         
P RW_iConv        E      

What am I missing?
Scott Klement
Site Admin

Re: Large TIFF data via HTTP web service

Post by Scott Klement »

thomasmca wrote: Mon Aug 28, 2023 12:07 pm My code uses HTTP_DEBUG, but it never generates a log. I have tried both with and without the 2nd parm.

Code: Select all

http_debug(*on: '/tmp/httpapi_debug.txt' );

Hmmm... I have never heard of that not working. But it seems irrelevant to your question, since you are telling us that HTTPAPI is working correctly for you anyway.

thomasmca wrote: My handler checks the event codes, and correctly retrieves the payload:

Code: Select all

// We got the requested XML element, so process it 
when xEvent = *XML_CHARS;    

I'm very skeptical of this. Is XML_CHARS really receiving the 100 MB or larger string in a single call? That seems very unlikely to be true. But all I know about it is that you say it's correct, so I guess I have to take your word for it.

thomasmca wrote: x@ is a pointer to the XML payload (meaning, the raw, encoded TIFF data, without any XML wrappers):

Code: Select all

if write(XMLInfo.IFSHandleRaw: x@: x@Length) < 0;             
   // handle error
endif;
                                                               
// Close and reopen the file                                   
if close(XMLInfo.IFSHandleRaw) < 0;                           
endif;

If you already have the entire raw, encoded data in memory -- why on earth are you writing it to the IFS?

Code: Select all

XMLInfo.IFSHandleRaw = Open( XMLInfo.rawPath:          
                              O_TextData + O_ReadOnly ); 

Why are you re-opening the file that you just closed? What's the point of that?

Also, why are you using code pages instead of CCSIDs, here? Does this need to be compatible with V4R5 or older? I mean... it's 2023 for crying out loud.

thomasmca wrote: // Create a heap space that points to the file

Code: Select all

CRTHS( Me.StreamHeapId: *loval );                         
Stream@ = ALCHSS( Me.StreamHeapId: FileStatus.Size );     
Me.StreamSize = Read( XMLInfo.IFSHandleRaw:              
                      Stream@: FileStatus.Size );            

I'm not familiar with CRTHS or ALCHSS. They appear to be an overly complicated way of allocating memory. Why are you doing that instead of using the normal memory allocation routines? And why allocate memory at all if you're working with IFS files and reading/writing multiple times? Why not just work with a smaller buffer at a time? Or, if you do need the whole thing in memory at once, since it's over 100 MB you'd want to use teraspace allocations... but why would you bother with the IFS file in that case? This whole thing makes no sense.
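
As a rough illustration of the "smaller buffer at a time" idea (a Python sketch, not RPG or HTTPAPI code; the function name and chunk size are made up for this example): base64 can be decoded piece by piece as long as each piece handed to the decoder is a whole number of 4-character groups, with any leftover characters carried into the next pass.

```python
import base64
import io


def decode_base64_stream(src, dst, chunk_size=8192):
    """Decode base64 text from src into dst one chunk at a time.

    src and dst are binary file-like objects. Only whole 4-character
    groups are decoded on each pass; the remainder is carried forward.
    Returns the number of decoded bytes written.
    """
    carry = b""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Strip line breaks and other whitespace that a payload may contain.
        data = carry + b"".join(chunk.split())
        usable = len(data) - (len(data) % 4)
        total += dst.write(base64.b64decode(data[:usable]))
        carry = data[usable:]
    if carry:
        # Leftover characters mean the input was cut off mid-group.
        raise ValueError("truncated base64 input")
    return total


if __name__ == "__main__":
    original = bytes(range(256)) * 50
    encoded = io.BytesIO(base64.b64encode(original))
    decoded = io.BytesIO()
    # An odd chunk size forces the carry-over path to be exercised.
    written = decode_base64_stream(encoded, decoded, chunk_size=1001)
    print(written, decoded.getvalue() == original)
```

With this shape, memory use is bounded by the chunk size rather than by the size of the image, so neither a 100 MB allocation nor a second pass over the file is needed.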

thomasmca wrote: The handler works as expected, and retrieves encoded TIFF data. Here is the beginning of one download:

Code: Select all

SUkqAAgAAAASAP4ABAABAAAAAAAAAAABBAABAAAAwAYAAAEBBAABAAAASAkAAAIBAwABAAAAAQAAAAMB

If I copy the entire value into https://codebeautify.org/base64-to-image-converter, it correctly generates the TIFF image.

That's good. Where are you copy/pasting it from? Surely you're not opening it in the debugger and copy/pasting the entire 100 MB+ variable, are you? If you're copy/pasting the IFS file, then you don't really know that your attempt to read it into allocated memory is working correctly, do you?

thomasmca wrote: Next, I decode the data:

Code: Select all

Me.DecLen = base64_decode( Stream@     
                        : Me.StreamSize
                        : outBuf
                        : Me.StreamSize ); 

I don't see anything obviously wrong here -- but I also don't really know what Stream@ or outBuf are. Assuming they are pointers, how was the memory allocated? Are they pointing to the right spots within that allocation? Is Me.StreamSize correctly set to the length of those allocations?

thomasmca wrote: Then I save the decoded data:

Code: Select all

// Write decoded value to IFS                         
if write( XMLInfo.IFSHandle: outBuf: Me.DecLen ) < 0; 
   // handle error                  
endif;                                                
Close_Stream( XMLInfo.IFSHandle );                    

Here you are writing binary data -- but you've provided no information whatsoever about how you're opening the file or making sure it's treated as binary data... this is very important for us to understand given the symptoms. If you aren't correctly writing the data for the TIFF, it clearly isn't going to work.

thomasmca wrote: My problem is the resulting file is not a valid TIFF file. Since I can paste the payload into a BASE64 converter, I know that my http_req() works, and so does the XML handler. To me, that only leaves a CCSID clash as the cause of my problem. Something is not getting the CCSID that it expects, which corrupts the results. I have tried a ton of CCSID options and variations, but nothing I've tried creates a usable TIFF file.

NO! Absolutely not. TIFF data is binary data!! CCSIDs describe how text is encoded... but TIFF files are NOT text, they are binary data.
If you are treating things as text, that would definitely explain why you're having problems. (Which is also what I was asking above about the IFS file stuff.)
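
To make the damage concrete, here is a small Python sketch (not RPG; the `translate` helper is invented for this illustration). It takes the first bytes of the TIFF from the sample in this thread and pushes them through two text-style conversions, which is roughly what a text-mode file open or an iconv() call does to data it believes to be characters.

```python
import base64

# First 12 bytes of the TIFF, decoded from the start of the sample payload.
raw = base64.b64decode("SUkqAAgAAAASAP4A")


def translate(data, from_enc, to_enc):
    """Mimic a text conversion: read bytes as characters, then re-encode them."""
    return data.decode(from_enc).encode(to_enc)


# Treating the bytes as ISO-8859-1 text and storing them as UTF-8
# expands every byte above 0x7F into two bytes.
as_utf8 = translate(raw, "iso-8859-1", "utf-8")

# Treating the bytes as EBCDIC (CCSID 37) text and storing them as
# ISO-8859-1 remaps nearly every byte, including the "II" signature.
from_ebcdic = translate(raw, "cp037", "iso-8859-1")

if __name__ == "__main__":
    print("original   :", raw.hex(" "))
    print("as UTF-8   :", as_utf8.hex(" "))
    print("from EBCDIC:", from_ebcdic.hex(" "))
```

Either conversion leaves a file whose first bytes are no longer a TIFF signature, which matches the "no TIFF headers" symptom described in the question.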

thomasmca wrote: The retrieved XML has this in the header: encoding="utf-8". Since I am saving an image to the IFS, I need the result to use CCSID 1252. After many hours of testing, I learned that RPG does CCSID conversion automatically, so I may not need to use iconv(). But since RPG's auto-conversion requires a CCSID property on a field, I might still need to use iconv() because I'm using pointers.

Err... so the XML file says it's UTF-8. That means you MUST encode the file as Windows-1252? HUH?! Where did you come up with that logic? The XML file should be encoded as CCSID 1208 (UTF-8). But... of course you couldn't do that even if you wanted to, because you're using code pages rather than CCSIDs... do you see where I'm going with this?

thomasmca wrote: I've tried setting the CCSID of the raw work file in /tmp to 1208, which didn't help. I've also tried

Code: Select all

HTTP_SetFileCCSID( 1208 );

which didn't help, either.

All of the characters in a base64-encoded document will have the same code points in UTF-8 as they do in ISO-8859-1 (CCSID 819). Windows-1252 is mostly based on ISO-8859-1 (with only some modifications). However, there are many other things in XML besides base64-encoded data, and some of it may or may not work when you incorrectly label something as 1252 instead of 1208. If you want to do this right, use 1208; there's no downside to it. 1252 is mostly only still around for legacy purposes anyway, unless you're still running Windows 95 somewhere.
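
A quick way to check that claim, sketched in Python rather than RPG (the helper name is invented for the example): encode the base64 alphabet in each character set and compare the resulting bytes, then do the same with a character outside that alphabet.

```python
import string

# The 64 base64 symbols plus the padding character.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="

ENCODINGS = ["utf-8", "iso-8859-1", "cp1252"]


def encodings_agree(text, encodings):
    """Return True when every listed encoding produces identical bytes for text."""
    return len({text.encode(name) for name in encodings}) == 1


if __name__ == "__main__":
    # Base64 text is pure ASCII, so all three encodings give the same bytes.
    print(encodings_agree(B64_ALPHABET, ENCODINGS))
    # An accented letter elsewhere in the XML is where the labels start to matter.
    print(encodings_agree("é", ENCODINGS))
```

The first check passes and the second does not, which is why mislabeling the file tends to go unnoticed for the payload itself but can still bite on the rest of the document.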

I'm removing all the iconv() stuff from the reply as it makes ZERO sense to call iconv() on binary data... you're essentially sabotaging things by using it.

The output of base64_decode needs to be written AS-IS with absolutely no translation. If you do that and it's still not working, then something you're doing is causing data to be lost... perhaps the weird allocations you are using don't work as expected (I wouldn't know) or perhaps you are using a pointer or a length wrong. But the first thing to do is eliminate all CCSID logic after the data is decoded... it is binary, and must be treated as binary.
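
The same principle in a minimal Python sketch (not RPG; `save_decoded` and `looks_like_tiff` are hypothetical helpers written for this example): decode, write the bytes in binary mode with no conversion step, then read the file back and check the 4-byte TIFF signature. Only the first 16 characters of the sample are used, so the result is a header fragment rather than a viewable image.

```python
import base64
import os
import tempfile

# A TIFF starts with "II*\0" (little-endian) or "MM\0*" (big-endian).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def looks_like_tiff(data):
    """Return True when data begins with a TIFF byte-order signature."""
    return data[:4] in TIFF_MAGIC


def save_decoded(b64_text, path):
    """Decode base64 text and write the result to path without any translation."""
    raw = base64.b64decode(b64_text)
    with open(path, "wb") as out:  # "wb" = binary mode, bytes go out untouched
        out.write(raw)
    return raw


if __name__ == "__main__":
    sample = "SUkqAAgAAAASAP4A"  # first 16 characters of the payload shown earlier
    path = os.path.join(tempfile.mkdtemp(), "sample.tif")
    raw = save_decoded(sample, path)
    with open(path, "rb") as saved:
        on_disk = saved.read()
    print(looks_like_tiff(on_disk), on_disk == raw)
```

Checking the first four bytes of the saved file is a cheap test to run before opening it in an image viewer: if the signature is wrong, something between the decode and the write has altered the data.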
thomasmca

Re: Large TIFF data via HTTP web service

Post by thomasmca »

Thanks for the reply, and the detailed responses.

I initially saved the retrieved XML to the IFS for testing purposes. If I remove the XML wrappers to get just the payload, pasting that data into an online base64-to-image converter correctly creates an image.

Later, when I thought my problem was CCSID related, I manipulated the CCSID of that IFS file and re-read the data, which would allow auto-translation of the CCSID. I closed and re-opened the file because the read() API (to get the size of the data) didn't work unless I did that.

Here is my handler:

Code: Select all

P ONBHandler      B                   export         
D ONBHandler      PI            10i 0                
D   xCommArea                         LIKEDS(XMLInfo)
D   xEvent                      10I 0 value          
D   x@                            *   value          
D   x@Length                    20I 0 value          
D   xExcepID                    10I 0 value          

And here is where the handler processes the XML:

Code: Select all

// We got the requested XML element, so process it
when xEvent = *XML_CHARS                          
 and xCommArea.haveElem = *ON                     
 and x@Length > 0;                                

// Calculate decoded size                       
Me.rem = %rem( x@Length * 3: 4 );               
if Me.rem = *zero;                              
   Me.decSize = x@Length * 3 / 4;               
else;                                           
   Me.decSize = (x@Length * 3 / 4) + 4 - Me.rem;
endif;                                          

// Decode                         
Me.DecLen = base64_decode( x@     
                        : x@Length
                        : outBuf       
                        : Me.decSize );

// Adjust the output buffer to the actual size 
// vvvvvvvvvvv this line of code "broke" my images
// outBuf = %alloc(Me.decLen);                    

// Once I removed that line, my images work.

if write( XMLInfo.IFSHandle: outBuf: Me.DecLen ) < 0; 
   // handle error         
endif;                                                

Thanks for pointing me in the right direction. Obviously, I need to read up on the difference between heap space and teraspace.
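
For later readers, one plausible reading of why the commented-out line broke the images (an interpretation, not something confirmed in this thread): %alloc returns a pointer to a new block and copies nothing, so assigning its result to outBuf after base64_decode leaves outBuf pointing at memory that never received the decoded bytes. The Python sketch below mimics that with a bytearray; `decode_into` is a made-up stand-in for a decoder that fills a caller-supplied buffer. In RPG the new block would hold arbitrary contents rather than zeros, and %realloc is the operation that resizes a block while keeping its contents.

```python
import base64


def decode_into(b64_bytes, out_buf):
    """Decode into a caller-supplied buffer and return the decoded length."""
    decoded = base64.b64decode(b64_bytes)
    out_buf[:len(decoded)] = decoded
    return len(decoded)


encoded = b"SUkqAAgAAAASAP4A"

# Allocate the output buffer BEFORE decoding, sized from the input length.
out_buf = bytearray(len(encoded) * 3 // 4)
dec_len = decode_into(encoded, out_buf)
good = bytes(out_buf[:dec_len])

# The problematic pattern: a fresh allocation AFTER decoding.
# out_buf now refers to a new, empty buffer; the decoded bytes stay behind.
out_buf = bytearray(dec_len)
bad = bytes(out_buf[:dec_len])

if __name__ == "__main__":
    print("kept buffer starts with TIFF signature:", good[:4] == b"II*\x00")
    print("fresh buffer starts with TIFF signature:", bad[:4] == b"II*\x00")
```

If that reading is right, the fix is about the order of allocation and decoding rather than about heap space versus teraspace.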