chinese language and poi36

Scott Klement's open source interface to the POI HSSF/XSSF Spreadsheet Project for RPG Programmers. http://www.scottklement.com/poi/
lucius
Posts: 5
Joined: Wed Dec 22, 2021 8:39 am


Post by lucius »

Hi Scott,
I have to import an Excel spreadsheet that contains some Chinese text.

I have an iSeries with a Chinese (double-byte) partition where I normally handle Chinese characters.

This is the first time I need to import an Excel file with Chinese text into my DB.

The Latin characters are OK, but the Chinese text is not correct: I get only ??????? in the field.

This is what I'm using to read characters from Excel:
cell = SSRow_GetCell(row: 3);
desartch = String_getBytes(SSCell_getStringCellValue(cell));

Do you have any idea?

Thank you very much,
Lucio
Italy
Scott Klement
Site Admin
Posts: 635
Joined: Sun Jul 04, 2021 5:12 am

Re: chinese language and poi36

Post by Scott Klement »

How is String_getBytes() defined in your program?

What is the job CCSID? Does it support the characters you are trying to receive?
lucius
Posts: 5
Joined: Wed Dec 22, 2021 8:39 am

Re: chinese language and poi36

Post by lucius »

Here it is:

D String_getBytes...
D                 pr          1024A   varying
D                                     extproc(*JAVA:
D                                     'java.lang.String':
D                                     'getBytes')



cell = SSRow_GetCell(row: 9);
projdescr = String_getBytes(SSCell_getStringCellValue(cell));

From the PF definition:
A            PROJDESCR     60O         CCSID(937)


From the job:
Language identifier . . . . . . . . . . . . . . . : ENU
Country or region identifier . . . . . . . . . . : US
Coded character set identifier . . . . . . . . . : 65535
Default coded character set identifier . . . . . : 37

Thank you!
Scott Klement
Site Admin
Posts: 635
Joined: Sun Jul 04, 2021 5:12 am

Re: chinese language and poi36

Post by Scott Klement »

Your prototype for String_getBytes tells RPG to convert from the Java string format (which is in Unicode) to RPG data type A, which is EBCDIC.

Your job CCSID is 65535, which means "hex", otherwise known as "do not translate this data". So the computer is saying to itself: "Lucio asked me to translate to EBCDIC, but his EBCDIC CCSID is 'hex', which doesn't make sense. So I will translate to the default CCSID, 37, instead."

CCSID 37 does not contain Chinese characters, so it must replace them all with x'3F' (which will appear as ? on some displays).

If you want it to read Chinese successfully, you need to use character sets that contain Chinese characters!
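The substitution described above can be reproduced in plain Java. This is a minimal sketch, not the RPG/JNI conversion path itself: it assumes US-ASCII as a stand-in for CCSID 37 (neither contains Chinese characters), because EBCDIC charsets are not guaranteed to be installed in every JVM. `String.getBytes(Charset)` is documented to replace unmappable characters with the charset's replacement bytes, which for US-ASCII is x'3F'. The no-argument `getBytes()` named in the prototype above encodes with the JVM's default charset, and its Javadoc leaves the behavior for unmappable characters unspecified. The class `SubstitutionDemo` and its methods are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encode a Unicode Java String into a character set
// that has no Chinese characters and look at the resulting bytes.
class SubstitutionDemo {

    // Per the String.getBytes(Charset) contract, characters the target
    // charset cannot represent are replaced with its replacement bytes.
    static byte[] encode(String text, Charset target) {
        return text.getBytes(target);
    }

    // Format bytes as space-separated hex, e.g. "41 3F 3F".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' followed by the Chinese characters U+4E2D and U+6587
        String text = "A\u4E2D\u6587";
        // Assumption: US-ASCII stands in for CCSID 37 (both lack Chinese).
        byte[] bytes = encode(text, StandardCharsets.US_ASCII);
        System.out.println(toHex(bytes)); // prints "41 3F 3F"
    }
}
```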
lucius
Posts: 5
Joined: Wed Dec 22, 2021 8:39 am

Re: chinese language and poi36

Post by lucius »

Thank you, Scott.

I also changed the CCSID of the job to:
Coded character set identifier . . . . . . . . . : 937
Default coded character set identifier . . . . . : 937
but it does not work: I still get the ? characters.

In your opinion, is the problem not related to the way Java reads the data, but rather to the combination of the CCSID of the original Excel file and the CCSID of the client?
Thank you,
ciao
Lucio
Scott Klement
Site Admin
Posts: 635
Joined: Sun Jul 04, 2021 5:12 am

Re: chinese language and poi36

Post by Scott Klement »

The data in Java is in Unicode. When RPG calls a Java routine like String_getBytes(), it will convert that data to EBCDIC. I thought it always converted to the job CCSID, so if your job CCSID is set correctly, it should convert correctly. It sounds like you're telling me that it does not.

Perhaps you should forget about converting it to EBCDIC and convert it to Unicode instead.

To do that, instead of String_getBytes, define a prototype like this:

Code:

D String_getUCS2  pr         16383C   varying            
D                                     extproc(*JAVA:     
D                                     'java.lang.String':
D                                     'toCharArray')     
Then, when you need to get the data, call it instead of String_getBytes:

Code:

myUnicodeVar = String_getUCS2(SSCell_getStringCellValue(cell));
Since the output is Unicode, it should support any character.
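For reference, here is a minimal Java sketch of what `toCharArray()` produces. A `char[]` holds UTF-16 code units, which is why it can be received into an RPG type C (UCS-2) field without passing through EBCDIC. One detail worth knowing: characters outside the Basic Multilingual Plane, such as the CJK Extension B ideograph U+2000B used below, take two `char` elements (a surrogate pair), so a `16383C varying` result holds up to 16383 code units rather than 16383 characters. `CharArrayDemo` and its helpers are hypothetical names.

```java
// Hypothetical demo: what String.toCharArray() returns, in UTF-16 code units.
class CharArrayDemo {

    // Number of char elements toCharArray() produces (UTF-16 code units).
    static int codeUnits(String s) {
        return s.toCharArray().length;
    }

    // Number of Unicode characters (code points) in the same string.
    static int characters(String s) {
        return s.codePointCount(0, s.length());
    }

    // Rebuilding a String from the char[] loses nothing.
    static boolean roundTrips(String s) {
        return new String(s.toCharArray()).equals(s);
    }

    public static void main(String[] args) {
        String common = "\u4E2D\u6587";                        // U+4E2D U+6587, both in the BMP
        String rare = new String(Character.toChars(0x2000B));  // CJK Extension B, outside the BMP

        System.out.println(codeUnits(common) + " " + characters(common)); // prints "2 2"
        System.out.println(codeUnits(rare) + " " + characters(rare));     // prints "2 1"
        System.out.println(roundTrips(common) && roundTrips(rare));       // prints "true"
    }
}
```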
lucius
Posts: 5
Joined: Wed Dec 22, 2021 8:39 am

Re: chinese language and poi36

Post by lucius »

Thank you, Scott,
we are not far from the right result!

Now I get some Chinese characters, but it seems there is a sort of "translation" between standard Chinese characters and traditional Chinese characters.
At this point, maybe the Excel file was created with a different Chinese character set (?).
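One way to test the character-set theory is to check which characters of a cell value cannot be represented in a given target character set. This is a minimal Java sketch using `CharsetEncoder.canEncode`. The class `EncodabilityCheck` is hypothetical, and the examples use ISO-8859-1 and US-ASCII only because every JVM provides them. To check a host code page, you could pass something like `Charset.forName("x-IBM937")` (Traditional Chinese) or `Charset.forName("x-IBM935")` (Simplified Chinese), assuming your JVM includes those extended charsets. If it does not, `Charset.forName` throws `UnsupportedCharsetException`.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: list the characters of a string that a target
// charset cannot encode, formatted as U+XXXX.
class EncodabilityCheck {

    static List<String> unencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        List<String> missing = new ArrayList<>();
        // Walk by code point so a surrogate pair is tested as one character.
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                missing.add(String.format("U+%04X", cp));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // 'A', the Chinese character U+4E2D, and e-acute (U+00E9)
        String cellValue = "A\u4E2D\u00E9";
        System.out.println(unencodable(cellValue, StandardCharsets.ISO_8859_1)); // prints "[U+4E2D]"
        System.out.println(unencodable(cellValue, StandardCharsets.US_ASCII));   // prints "[U+4E2D, U+00E9]"
    }
}
```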

Chinese is a difficult matter to deal with...

Thank you

ciao

Lucio
Scott Klement
Site Admin
Posts: 635
Joined: Sun Jul 04, 2021 5:12 am

Re: chinese language and poi36

Post by Scott Klement »

Sorry, I'm not sure what would cause that.

If possible, I would use a Java debugger (RDi can debug Java) to see what values POI is getting out of the spreadsheet before it returns them to RPG. This should tell you where the problem is occurring.
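Alongside a debugger, a quick way to see exactly what POI returned is to compare it, code point by code point, with the text you expect. This is a minimal Java sketch. `CodePointDiff` is hypothetical, and in a real program `actual` would be the value returned by POI's `getStringCellValue()`. The sample pair U+56FD / U+570B are the simplified and traditional forms of the same character, chosen to mirror the symptom described earlier in the thread.

```java
// Hypothetical diagnostic: compare expected and actual text one Unicode
// code point at a time and describe the first mismatch.
class CodePointDiff {

    static String firstDifference(String expected, String actual) {
        int[] e = expected.codePoints().toArray();
        int[] a = actual.codePoints().toArray();
        int n = Math.min(e.length, a.length);
        for (int i = 0; i < n; i++) {
            if (e[i] != a[i]) {
                return String.format("position %d: expected U+%04X, got U+%04X", i, e[i], a[i]);
            }
        }
        if (e.length != a.length) {
            return String.format("lengths differ: expected %d, got %d code points", e.length, a.length);
        }
        return "identical";
    }

    public static void main(String[] args) {
        // Hypothetical values: simplified U+56FD versus traditional U+570B.
        String expected = "\u4E2D\u56FD";
        String actual = "\u4E2D\u570B";
        System.out.println(firstDifference(expected, actual));
        // prints "position 1: expected U+56FD, got U+570B"
    }
}
```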
lucius
Posts: 5
Joined: Wed Dec 22, 2021 8:39 am

Re: chinese language and poi36

Post by lucius »

Ciao Scott,
I'm back again...

I have the same problem with Chinese characters when WRITING an Excel spreadsheet.

I have a file on the AS/400 with the correct Chinese characters; I can see them in a subfile or with an SQL SELECT.

But when I create the .xlsx, they become strange characters, certainly not Chinese characters!
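Strange characters, as opposed to ?, usually point to bytes being decoded with a different character set than the one they were encoded in. This minimal Java sketch reproduces the effect with UTF-8 bytes read back as ISO-8859-1. Those two charsets are stand-ins chosen because every JVM has them; the thread does not establish which conversion is actually going wrong on the write path. `MojibakeDemo` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes decoded with the right and the wrong charset.
class MojibakeDemo {

    // Encode with one charset, then decode the bytes with another.
    static String reinterpret(String text, Charset written, Charset read) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, read);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // U+4E2D U+6587

        // Matching charsets: the text survives unchanged.
        String ok = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(original)); // prints "true"

        // Mismatched charsets: each 3-byte UTF-8 sequence turns into
        // 3 unrelated Latin-1 characters.
        String garbled = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // prints "6"
    }
}
```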

Do you have any suggestions?

Thank you very, very much for your valuable help.

ciao
Lucio