
We have an Oracle database, which Oracle E-Business Suite (R12.1.3) runs against.

I checked how to find the character set of the database, via this SO answer: How to view Oracle table column character set?

This SQL:

SELECT *
  FROM v$nls_parameters
 WHERE parameter LIKE '%CHARACTERSET'

Returned:

PARAMETER               VALUE
----------------------------------------
NLS_CHARACTERSET        UTF8
NLS_NCHAR_CHARACTERSET  UTF8

There's an interface from an external system, which sends text files containing AP invoice headers and lines.

The data sent from the external party is also UTF8.

However, the Description field of the AP_INVOICE_LINES_ALL table in Oracle E-Business Suite is a VARCHAR2 (240 bytes).

Because the column length is measured in bytes, the £ sign takes up 2 bytes, and sometimes the description from the external file is > 240 bytes. When that happens, problems are caused with the invoice import process. It's not that the file is rejected; rather, due to the design of the interface, because the description exceeds 240 bytes, some of the invoice line data is not loaded.
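To make the byte-vs-character distinction concrete, here is a minimal Python sketch (purely illustrative, outside Oracle) showing that a string containing a £ sign is longer in UTF-8 bytes than in characters:

```python
# A £ sign is one character but two bytes in UTF-8.
description = "Fee: £100"

print(len(description))                  # character count: 9
print(len(description.encode("utf-8")))  # byte count: 10

# A VARCHAR2(240 BYTE) column limits the *byte* length, so a
# description well under 240 characters can still overflow it
# if it contains multi-byte characters like £.
```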

The developers of the external system have confirmed that they cannot truncate their Invoice Long Description to a specific byte count because of the risk of splitting a multi-byte character such as the £ sign, which would cause problems with the import process.
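The risk they describe can be demonstrated with a small Python sketch (an illustration of the encoding behaviour, not their actual code): cutting a UTF-8 byte stream at a fixed byte count can slice through the middle of a £ sign, leaving bytes that no longer decode as valid UTF-8:

```python
data = "ab£".encode("utf-8")  # b'ab\xc2\xa3' - the £ sign is the 2 bytes \xc2\xa3
truncated = data[:3]          # 3-byte limit cuts the £ sign in half: b'ab\xc2'

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as e:
    print("invalid UTF-8 after byte truncation:", e)
```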

The external developers are proposing to move to a single-byte-per-character encoding such as Latin-1 (ISO 8859-1).

I realise the simple way to test this would be for them to send us a sample text file with this type of encoding, and for us to import it into the system to check what the contents look like.

That would take us some time to test.

As a quick test, I checked something in Notepad++ (I realise that's nothing to do with an Oracle database).

I saved a file in Notepad++, where the file was encoded as Latin-1 (ISO 8859-1), and typed in a £ sign:

Start with Latin-1 (ISO 8859-1)

Then I changed the encoding to UTF-8:

Change encoding to UTF-8

On doing so, the £ sign becomes unreadable.
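The Notepad++ behaviour can be reproduced in Python (again, this only illustrates the encodings themselves, not Oracle): the Latin-1 byte for £ (0xA3) is not a valid UTF-8 sequence, so anything reading the file as UTF-8 must either reject that byte or substitute a replacement character:

```python
latin1_bytes = "£".encode("latin-1")  # the single byte b'\xa3'

# Lenient UTF-8 decoding substitutes the U+FFFD replacement character,
# which is the "unreadable" glyph seen in Notepad++.
print(latin1_bytes.decode("utf-8", errors="replace"))

# Strict UTF-8 decoding fails outright.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as e:
    print("not valid UTF-8:", e)
```

So the Notepad++ result reflects a genuine property of the two encodings; whether the database exhibits the same behaviour also depends on the client-side NLS settings used when the file is loaded.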

I wondered if this is a reliable test, as in, would the same thing likely happen in Oracle?

I can't test in Oracle as I don't have access to create databases, etc. I thought I could try a basic test with an installation of Oracle Database 11g Express Edition, but have seen that it's not possible to create more than the default XE database on such an installation.

We don't have the option to change any properties of the AP Invoice Lines All table, since it's a core table used in Oracle E-Business Suite. The only option we have is to make changes at the source of the process, from the external systems.
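For completeness, truncating UTF-8 text to a byte limit without ever splitting a character is possible at the source side. The sketch below is hypothetical (the function name and the 240-byte limit are assumptions for illustration, not the external system's actual code):

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Truncate text so that its UTF-8 encoding fits in max_bytes,
    never cutting through the middle of a multi-byte character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" silently drops any trailing partial character
    # left over from the byte-level slice.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

# 130 £ signs encode to 260 bytes; the result keeps only the 120
# complete £ signs that fit within 240 bytes.
shortened = truncate_utf8("£" * 130, 240)
print(len(shortened), len(shortened.encode("utf-8")))  # 120 240
```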

  • `the file is rejected.` - why is the file rejected? Is there any error message? – krokodilko Nov 27 '17 at 16:27
  • Good question - sorry - I have revised my question to try to explain the issue more clearly – 4532066 Nov 27 '17 at 16:35
  • 1
    I would first try to change a [column length semantic](https://docs.oracle.com/database/121/REPLN/repunicode.htm#REPLN009) of the column from `VARCHAR2 (240 bytes)` to `VARCHAR2 (240 char)` on a test environment and would check if this helped. Just run `ALTER TABLE tablename MODIFY ( columnname VARCHAR(240 char))` leaving the encoding as is (utf8) and do a test. – krokodilko Nov 27 '17 at 17:10
  • Thanks for your reply. I forgot to say that we don't have the option to change any properties of the AP Invoice Lines All table, since it's a core table used in Oracle E-Business Suite. The only option we have is to make changes at the source of the process, from the external systems. I have updated the question to include that info at the end. – 4532066 Nov 27 '17 at 18:07
  • Changing the database character set to ISO-8859-1 is definitely the worst idea. Changing the column from `VARCHAR2 (240 byte)` to `VARCHAR2 (240 char)` should not even be noticed by Oracle E-Business Suite. – Wernfried Domscheit Nov 27 '17 at 20:03
  • It is important that you set your `NLS_LANG` value according to the encoding of the file, see https://stackoverflow.com/questions/33783902/odbcconnection-returning-chinese-characters-as/33790600#33790600 Anyway, this does not solve your problem with exceeding the max length. For that you have to change to `CHAR` length semantics. – Wernfried Domscheit Nov 27 '17 at 20:09
