
I'm trying to import a file using UTL_FILE in Oracle 11g. Is there a way to know what character encoding a file is in?

What I'm trying to do is raise an error when the file's character set is not UTF-8, in order to avoid errors on insert.

Avhelsing

1 Answer


Probably not (at least not easily).

If you're really lucky, the file will begin with a byte-order mark (BOM) that lets you conclude, with a reasonable degree of confidence, that the file is UTF-8 encoded. But a BOM is an entirely optional feature of a file (and one that your code would have to discard before reading the real data).
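
For what it's worth, here's a minimal sketch of that BOM check, assuming a directory object named `DATA_DIR` and a file named `input.txt` (both placeholders): open the file in binary mode, read the first three bytes, and compare them to the UTF-8 BOM `EF BB BF`.

```sql
-- Sketch: peek at the first three bytes and compare them to the UTF-8 BOM.
-- Directory object and file name are placeholders for your own objects.
DECLARE
  l_file  UTL_FILE.FILE_TYPE;
  l_bytes RAW(3);
BEGIN
  l_file := UTL_FILE.FOPEN('DATA_DIR', 'input.txt', 'rb');  -- binary mode
  UTL_FILE.GET_RAW(l_file, l_bytes, 3);                     -- first 3 bytes
  UTL_FILE.FCLOSE(l_file);

  IF UTL_RAW.COMPARE(l_bytes, HEXTORAW('EFBBBF')) = 0 THEN
    DBMS_OUTPUT.PUT_LINE('UTF-8 BOM found - file is very likely UTF-8');
  ELSE
    DBMS_OUTPUT.PUT_LINE('No UTF-8 BOM - encoding still unknown');
  END IF;
END;
/
```

(Note that `GET_RAW` raises `NO_DATA_FOUND` on an empty file, so you'd want an exception handler around it in real code.)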

Beyond that, you're into the realm of inspecting the data and trying to determine the most probable character set. In general that is hard, particularly for a mostly English file, where the first few hundred or thousand bytes may be simultaneously valid 7-bit ASCII and valid UTF-8. You can read the file and look for byte sequences that are not valid UTF-8. Finding none doesn't definitively prove the file is UTF-8, but it is probably close enough to act as if it is.
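
If you want to go that route, here is a rough sketch of such a check, again with placeholder directory and file names. It only validates the structural lead-byte/continuation-byte pattern of UTF-8 (it doesn't reject overlong encodings or surrogate ranges), which is usually enough to tell UTF-8 apart from single-byte encodings like ISO-8859-1.

```sql
-- Sketch: read the file in binary chunks and verify that every multi-byte
-- sequence has the right number of continuation bytes.
CREATE OR REPLACE FUNCTION is_probably_utf8(
  p_dir  IN VARCHAR2,
  p_file IN VARCHAR2
) RETURN BOOLEAN IS
  l_file   UTL_FILE.FILE_TYPE;
  l_buf    RAW(32767);
  l_byte   PLS_INTEGER;
  l_follow PLS_INTEGER := 0;   -- continuation bytes still expected
BEGIN
  l_file := UTL_FILE.FOPEN(p_dir, p_file, 'rb', 32767);
  LOOP
    BEGIN
      UTL_FILE.GET_RAW(l_file, l_buf, 32767);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN EXIT;        -- end of file
    END;

    FOR i IN 1 .. UTL_RAW.LENGTH(l_buf) LOOP
      l_byte := TO_NUMBER(RAWTOHEX(UTL_RAW.SUBSTR(l_buf, i, 1)), 'XX');

      IF l_follow > 0 THEN
        -- expecting a continuation byte: 10xxxxxx (0x80 .. 0xBF)
        IF l_byte BETWEEN 128 AND 191 THEN
          l_follow := l_follow - 1;
        ELSE
          UTL_FILE.FCLOSE(l_file);
          RETURN FALSE;
        END IF;
      ELSIF l_byte < 128 THEN
        NULL;                               -- plain 7-bit ASCII
      ELSIF l_byte BETWEEN 194 AND 223 THEN
        l_follow := 1;                      -- 2-byte sequence
      ELSIF l_byte BETWEEN 224 AND 239 THEN
        l_follow := 2;                      -- 3-byte sequence
      ELSIF l_byte BETWEEN 240 AND 244 THEN
        l_follow := 3;                      -- 4-byte sequence
      ELSE
        UTL_FILE.FCLOSE(l_file);
        RETURN FALSE;                       -- can never start a sequence
      END IF;
    END LOOP;
  END LOOP;

  UTL_FILE.FCLOSE(l_file);
  RETURN l_follow = 0;   -- a sequence cut off at end-of-file is also invalid
END is_probably_utf8;
/
```

You could then call something like `IF NOT is_probably_utf8('DATA_DIR', 'input.txt') THEN RAISE_APPLICATION_ERROR(-20001, 'File is not UTF-8'); END IF;` before attempting the load.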

Justin Cave