6706: Untranslatable Characters in Teradata

Question

I have following 2 strings consisting of Arabic characters, such strings fails' my mloads on daily basis:

شركة جيبكا السعودية
ح717695

Both columns in which these 2 strings are inserted are defined as Unicode. And in case of these string translate_chk does not work either.

Following is the select statement I am using:

select Translate_Chk ( 'شركة جيبكا السعودية' using LATIN_TO_UNICODE WITH ERROR)

Moreover, I wrote following code for a finnish client and it worked fine. But it does not work in case of Arabic characters.

SELECT (CASE WHEN Translate_Chk ( TRIM('شركة جيبكا السعودية') using LATIN_TO_UNICODE ) <> 0
THEN
        CASE WHEN (regexp_Instr('شركة جيبكا السعودية', 
        '\x{00}|\x{01}|\x{02}|\x{03}|\x{04}|\x{05}|\x{06}|\x{07}|\x{08}|\x{09}|\x{0B}|\x{0C}|\x{0E}|\x{0F}|\x{10}|\x{11}|\x{12}|\x{13}|\x{14}|\x{15}|\x{16}
        |\x{17}|\x{18}|\x{19}|\x{1A}|\x{1B}|\x{1C}|\x{1D}|\x{1E}|\x{1F}|\x{5E}|\x{60}|\x{7C}|\x{7E}|\x{7F}|\x{A0}|\x{A2}|\x{A6}|\x{A8}|\x{A9}|\x{AA}|\x{AB}|\x{AC}
        |\x{AD}|\x{AE}|\x{AF}|\x{B0}|\x{B1}|\x{B2}|\x{B3}|\x{B4}|\x{B5}|\x{B6}|\x{B7}|\x{B8}|\x{B9}|\x{BA}|\x{BB}|\x{BC}|\x{BD}|\x{BE}|\x{F0}',1,1,0,
            'c')) >0
            THEN
            ''
        when TD_SYSFNLIB.InSTR('شركة جيبكا السعودية', U&'\008A' UESCAPE '\')>0 
        THEN
        ''
    WHEN TD_SYSFNLIB.InSTR('شركة جيبكا السعودية', U&'\009A' UESCAPE '\')>0
    THEN
    ''
        WHEN TD_SYSFNLIB.InSTR('شركة جيبكا السعودية', U&'\008C' UESCAPE '\') >0
        THEN
        ''
        WHEN TD_SYSFNLIB.InSTR('شركة جيبكا السعودية', U&'\009C' UESCAPE '\')>0 
        THEN
        ''
        WHEN TD_SYSFNLIB.InSTR('شركة جيبكا السعودية', U&'\009F' UESCAPE '\')>0
        THEN
        ''
        END
ELSE
'String' END);

Teradata Verion is 16.20.53.31

How can we replace such strings with null? I donot care about data loss as currently I delete such records from the file and resume my mload.

What exactly are you trying to achieve? Both example strings are valid Unicode. But `LATIN_TO_UNICODE` forces a conversation of the literal to LATIN *before* translating it to Unicode again, of course, this fails for arabic characters. — dnoeth, May 30 '21 at 18:04
But Teradata refuses to accept these strings as valid Unicode. What, I am trying to acheive is that if Teradata fails to accept such strings as valid unicode characters then they can be replaced by a null. Thank you — Burhan Khalid Butt, May 30 '21 at 18:34
`select 'شركة جيبكا السعودية'` does not fail. Did you set the character set in the MLoad job to match your input file? Or switch on Unicode Passthrough. — dnoeth, May 30 '21 at 20:38

6706: Untranslatable Characters in Teradata

0 Answers0