
I am using Scala and trying to store some integers that arrive as strings as real integers (`Int`). However, some of these strings have `ÿþ` in front of the number.

How do I clean this up? Why does this happen?


Rephrased question:

How do I find all characters like `ÿþ` and delete them so I can safely convert the strings to integers? I don't know whether this appears only on the first line. The file has 16,000 lines, and although so far I have only seen it on the first line, I can't be sure.
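As a quick workaround for the "delete them" part, a minimal sketch that keeps only digits and a minus sign before parsing. It assumes each line holds exactly one integer (so stray characters can simply be dropped). `CleanInts` and `cleanToInt` are hypothetical names. If the real cause is a UTF-16 file read with the wrong charset, each digit is also followed by an invisible NUL character; this filter drops those too, but fixing the encoding is the cleaner solution.

```scala
import scala.util.Try

object CleanInts {
  // Hypothetical helper: drop everything except digits and '-', then parse.
  // Removes a mis-decoded BOM ("ÿþ") and any NUL characters along with it.
  // Assumption: one integer per string, so removed characters never separate two numbers.
  def cleanToInt(raw: String): Option[Int] = {
    val kept = raw.filter(c => c.isDigit || c == '-')
    Try(kept.toInt).toOption // None if nothing parseable is left
  }

  def main(args: Array[String]): Unit = {
    println(cleanToInt("\u00FF\u00FE123123123")) // Some(123123123)
    println(cleanToInt("abc"))                   // None
  }
}
```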

windweller
  • "I don't know if this appears only on the first line or not. The file has 16,000 lines and although I only see it at the first line so far, I can't be sure." - It's only 16000 lines. Check! Just load it into Notepad... – The Archetypal Paul Mar 31 '14 at 19:58
  • @Paul OK, it turns out to be an encoding problem. – windweller Mar 31 '14 at 23:26
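Following the suggestion to check rather than guess, a minimal sketch that reports every line containing anything other than ASCII digits, printing the offending characters as code points (a NUL shows up as `U+0000`, which a text editor may not display). `FindBadLines` and `badLines` are hypothetical names; in practice the lines would come from the 16,000-line file rather than the in-memory sample used here.

```scala
object FindBadLines {
  // Return (1-based line number, offending characters) for every line
  // that contains something other than the ASCII digits '0' to '9'.
  def badLines(lines: Seq[String]): Seq[(Int, String)] =
    lines.zipWithIndex.collect {
      case (line, i) if line.exists(c => c < '0' || c > '9') =>
        (i + 1, line.filter(c => c < '0' || c > '9'))
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq("\u00FF\u00FE123", "456", "7\u00008")
    badLines(sample).foreach { case (n, chars) =>
      // Show code points, since characters like NUL are invisible when printed.
      println(s"line $n: " + chars.map(c => f"U+${c.toInt}%04X").mkString(" "))
    }
  }
}
```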

1 Answer


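These two characters are the byte order mark (BOM) of UTF-16. The bytes `FF FE` at the start of a file mark little-endian UTF-16; when the file is decoded with a single-byte charset such as ISO-8859-1 or Windows-1252, those two bytes show up as `ÿþ`.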
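To illustrate, a small sketch that builds the bytes of a UTF-16LE file containing `123` (with its `FF FE` BOM) and decodes them two ways using only JDK classes. With ISO-8859-1 the BOM becomes `ÿþ` and every digit is followed by a NUL character; with Java's `UTF-16` charset the BOM is read to pick the byte order and then dropped. `BomDemo` is a hypothetical name.

```scala
import java.nio.charset.StandardCharsets

object BomDemo {
  // "123" encoded as UTF-16LE, preceded by the little-endian BOM bytes FF FE.
  val bytes: Array[Byte] =
    Array(0xFF, 0xFE, 0x31, 0x00, 0x32, 0x00, 0x33, 0x00).map(_.toByte)

  // Wrong charset: FF FE become "ÿþ" and each 0x00 byte becomes a NUL char.
  val asLatin1: String = new String(bytes, StandardCharsets.ISO_8859_1)

  // Right charset: the UTF-16 decoder uses the BOM to choose byte order and removes it.
  val asUtf16: String = new String(bytes, StandardCharsets.UTF_16)

  def main(args: Array[String]): Unit = {
    println(asLatin1.map(c => f"U+${c.toInt}%04X").mkString(" "))
    println(asUtf16.toInt) // 123
  }
}
```

This is also why only the first line seems affected: the BOM appears once, at the very start of the file, while the NUL characters on every line are simply invisible.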

You could use the tools from Apache Commons IO, for example `BOMInputStream`, which can detect and skip a BOM at the start of a stream.
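If the file is known to be UTF-16, the JDK alone is also enough (this is an alternative to Commons IO, not its API): Java's `UTF-16` charset reads the BOM, picks the byte order from it, and drops it. A minimal sketch, assuming one integer per line; `ReadUtf16Ints` and `readInts` are hypothetical names, and the `main` method writes its own temporary file so it can run standalone.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

object ReadUtf16Ints {
  // Read a UTF-16 file (BOM-aware) and parse each non-empty line as an Int.
  def readInts(path: Path): List[Int] = {
    val src = Source.fromFile(path.toFile, "UTF-16")
    try src.getLines().map(_.trim).filter(_.nonEmpty).map(_.toInt).toList
    finally src.close()
  }

  def main(args: Array[String]): Unit = {
    val tmp = Files.createTempFile("ints", ".txt")
    // Encoding with StandardCharsets.UTF_16 writes a big-endian BOM (FE FF) first.
    Files.write(tmp, "123123123\n42\n".getBytes(StandardCharsets.UTF_16))
    println(readInts(tmp)) // List(123123123, 42)
    Files.delete(tmp)
  }
}
```

For a Windows file starting with `FF FE`, the same `"UTF-16"` name works, because the decoder switches to little-endian when it sees that BOM.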

  • I mean strings like `ÿþ123123123`. – windweller Mar 31 '14 at 13:52
  • It's an integer, but since it's read from a file, it comes in as a String. – windweller Mar 31 '14 at 13:52
  • Does only the first line of such a file contain the BOM? –  Mar 31 '14 at 13:52
  • That's the question; I don't know. The file has 16,000 lines. – windweller Mar 31 '14 at 14:03
  • @WindDweller rephrased question: *is only the first integer that you read prefixed with such characters?* Rephrased top answer: *It's a special character used to signal a particular data storage scheme (UTF-16). There are libraries that let you read a file with that scheme specified instead of relying on the default one, so the character is swallowed and the data is read properly.* – om-nom-nom Mar 31 '14 at 14:18
  • Very nice rephrasing, indeed. –  Mar 31 '14 at 14:24
  • @om-nom-nom I was trying the `ISO` and `Windows` encodings, which don't work at all. – windweller Mar 31 '14 at 16:44