
I was writing files using

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file, 'Rosëttenville');

but I changed this to

l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
utl_file.put_line_nchar(l_file, 'Rosëttenville');

when I found out that extended ASCII characters (codes above 127) were not being written out correctly. However, the second Unicode version also does not write the extended characters correctly. Instead of Rosëttenville I'm getting RosÃ«ttenville. Does anyone know how to fix this?

Superdooperhero

2 Answers

5

You haven't said what your database character set is, and thus whether it's legitimate to have 'extended ascii' (probably 8859-1, with chr(235) in this case) in a string, or if this is just a demo. Either way, I think, your problem is trying to implicitly convert a non-unicode string.

ë is code point EB, which is C3 AB in UTF-8. You're getting the separate characters Ã (code point C3) and « (code point AB). So it can't do a direct translation from chr(235), which is 0x00EB, to U+00EB; it seems to be going via the UTF-8 bytes C3 AB as two separate characters. I'm not going to try to understand exactly why...
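You can reproduce that mojibake outside Oracle. As a sketch in Python (not part of the original answer, just illustrating the byte mechanics), encoding the string as UTF-8 and then misreading those bytes as ISO-8859-1 gives exactly the broken output described:

```python
# 'ë' (U+00EB) becomes the two bytes C3 AB in UTF-8.
utf8_bytes = 'Rosëttenville'.encode('utf-8')
assert utf8_bytes == b'Ros\xc3\xabttenville'

# Misinterpreting those UTF-8 bytes as ISO-8859-1 splits them into
# the separate characters 'Ã' (C3) and '«' (AB).
mojibake = utf8_bytes.decode('iso-8859-1')
print(mojibake)  # RosÃ«ttenville
```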

You can either use the convert function:

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file,
  convert('Rosëttenville', 'WE8ISO8859P1', 'UTF8'));

... or, as use of that is discouraged by Oracle, the utl_raw.convert function:

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file,
  utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw('Rosëttenville'),
    'ENGLISH_UNITED KINGDOM.WE8ISO8859P1', 'ENGLISH_UNITED KINGDOM.UTF8')));

Both give me the value you want, and your original gave me the same value you saw (where my DB character set is AL32UTF8 in 11gR2 on Linux). If your DB character set is not Unicode, your national character set certainly appears to be (it isn't clear in the question if you got the same output with both attempts), so the nchar version should work instead:

l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
utl_file.put_line_nchar(l_file,
  utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw('Rosëttenville'),
    'ENGLISH_UNITED KINGDOM.WE8ISO8859P1', 'ENGLISH_UNITED KINGDOM.UTF8')));
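At the byte level, what the utl_raw.convert call is doing here is re-encoding the UTF-8 byte sequence into ISO-8859-1. A rough Python equivalent (illustrative only, using Python codec names rather than Oracle NLS names):

```python
# UTF-8 bytes, as utl_raw.cast_to_raw would see them in a UTF-8 database.
utf8_bytes = 'Rosëttenville'.encode('utf-8')   # b'Ros\xc3\xabttenville'

# Equivalent of utl_raw.convert(..., to WE8ISO8859P1, from UTF8):
# the two UTF-8 bytes C3 AB collapse back to the single Latin-1 byte EB.
latin1_bytes = utf8_bytes.decode('utf-8').encode('iso-8859-1')
print(latin1_bytes)  # b'Ros\xebttenville'
```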

It would probably be better to be working with Unicode values in the first place, particularly if you currently have a mix of 'extended ascii' and other string types in a table; applying the conversion to everything in that case might give some odd results...

Alex Poole
  • Our database is on AMERICAN_AMERICA.AL32UTF8 – Superdooperhero Jun 11 '13 at 11:15
  • Not sure what I should use for the from and to in utl_raw.convert. – Superdooperhero Jun 11 '13 at 11:20
  • I'd preferably like to output a plain ascii file, with utf8 characters that can be converted coming through as converted and others just defaulting to a space or something. So ë would be converted just fine in ascii. – Superdooperhero Jun 11 '13 at 11:22
  • @Superdooperhero - you can use `AMERICAN_AMERICA.WE8ISO8859P1` and `AMERICAN_AMERICA.UTF8` for the from and to then. `ë` is not an ASCII character, it's in the 'extended' sets like 8859-1. If you have it in a `varchar2` column then it's already in UTF-8, and working with a fixed string like this might not be entirely helpful. – Alex Poole Jun 11 '13 at 11:30
  • @Superdooperhero - I don't think you can stop Oracle creating the file as UTF-8. You can [convert it afterwards](http://stackoverflow.com/q/64860/266304) though, e.g. on Linux: `iconv -f UTF8 -t ISO-8859-1 a.txt > b.txt`. – Alex Poole Jun 11 '13 at 11:38
  • @AlexPoole Having had read as to why convert() is discouraged, I think that cast_to_varchar2() has exactly the same discouraged behavior. I guess utl_file.put_raw() would be the way to go. – kubanczyk Dec 13 '13 at 09:15
  • @kubanczyk - you might be right, if `put_raw()` gives you something usable in the file, but either seemed to work OK in this instance. I just pointed to the `utl_raw` version because of the doc comment. If you're dealing with the native DB character sets anyway - as the OP seems to be - it doesn't seem to make any difference really, and the extra `utl_raw` conversion just makes it a bit more explicit what's going on, and that care needs to be taken. If you're using a different character set in the `utl_raw.convert` call then the `cast_to_varchar2` might give you problems though, yes. – Alex Poole Dec 13 '13 at 09:52
0

UTL_FILE.PUT_LINE does not convert the data; it writes it out in the database's default character set.

So you need to do the conversion yourself on write:

UTL_FILE.PUT_LINE(file,CONVERT(text,'WE8ISO8859P1'),FALSE);

You must also set the client environment:

LANG=GERMAN_AUSTRIA.WE8ISO8859P1;export LANG
LC_CTYPE=ISO-8859-1;export LC_CTYPE
NLS_LANG=GERMAN_AUSTRIA.WE8ISO8859P1;export NLS_LANG
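The effect of that CONVERT call, writing the file in ISO-8859-1 rather than UTF-8, can be sketched in Python (a hypothetical stand-in for the PL/SQL above; the file path is made up for the demo):

```python
import os
import tempfile

text = 'Rosëttenville'
path = os.path.join(tempfile.gettempdir(), 'a.txt')

# Write in ISO-8859-1, as CONVERT(text, 'WE8ISO8859P1') would produce.
with open(path, 'w', encoding='iso-8859-1') as f:
    f.write(text)

# On disk, 'ë' is now the single byte EB instead of the UTF-8 pair C3 AB.
with open(path, 'rb') as f:
    raw = f.read()
print(raw)  # b'Ros\xebttenville'
```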
DejanR