-4

I have a Unicode file (UTF-16 little-endian, with an FF FE BOM) which contains rows of tab-separated fields.

Having read "Splitting unicode (I think) using .split in ruby", I am going to use Ruby's split (file into lines, then each line into fields).

BTW, what's the Unicode char for:

  • LF
  • CR
  • Tab

Thanks!

ohho
  • 1
    Is that really your question, what are the codepoints for those three characters in Unicode? – Michael Petrotta Mar 15 '10 at 05:03
  • 2
    I agree, is that really the question? This could have been answered with a quick check on the internets but for future reference: http://www.unicode.org/charts/#symbols and in particular http://www.unicode.org/charts/PDF/U0000.pdf and http://en.wikipedia.org/wiki/Basic_Latin_Unicode_block – the Tin Man Mar 15 '10 at 05:32
  • I am asking for both: the Unicode characters and how to write them in Ruby syntax. Assume blob (blob = Record.first.file_attached) holds the raw UTF-16 data. Then rows = blob.split("\u000D") gives a rows.size of 1, but u8rows = Iconv.conv("utf-8", "utf-16le", blob).split("\n") gives a u8rows.size of 232. My question is: what are the Unicode CR/LF characters for splitting a UTF-16 FFFE blob in Ruby? – ohho Mar 15 '10 at 06:42

2 Answers

8
LF:  U+000A  
CR:  U+000D  
Tab: U+0009  

http://en.wikipedia.org/wiki/List_of_Unicode_characters
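
Since the question also asks for Ruby syntax, here is a minimal sketch of how these code points can be written as string escapes. It assumes Ruby 1.9 or later, where "\u" escapes are recognized; the method name control_chars is only an illustration.

```ruby
# Ruby 1.9+ string escapes for the three code points listed above.
# control_chars is a hypothetical helper name, not a library method.
def control_chars
  {
    lf:  "\u000A", # line feed, the same character as "\n"
    cr:  "\u000D", # carriage return, the same character as "\r"
    tab: "\u0009"  # horizontal tab, the same character as "\t"
  }
end

# Print each name with its code point in U+XXXX notation.
control_chars.each do |name, ch|
  puts format("%s: U+%04X", name, ch.ord)
end
```

The short escapes "\n", "\r" and "\t" denote the same characters, so either spelling can be used.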

Michael Petrotta
4

Unicode TAB is U+0009, LF is U+000A, and CR is U+000D.

Same as ASCII actually.

Ayman
  • 3
    Simply because the first 256 code points of Unicode are the same as in Latin-1, which in turn uses ASCII for the first 128. – Joey Mar 31 '10 at 10:47
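
The code points match ASCII, but in UTF-16LE each of them is stored as two bytes (CR is 0D 00, for example), so splitting the raw bytes on a one-byte delimiter is unreliable. One way to handle the blob from the question's comments is to decode it first and split afterwards. The following is a sketch only, assuming Ruby 1.9 or later and a blob that really is UTF-16LE; parse_utf16le_tsv is a hypothetical helper name.

```ruby
# Hypothetical helper: turn a raw UTF-16LE tab-separated blob into rows of fields.
# Assumes the blob is valid UTF-16LE; String#encode raises otherwise.
def parse_utf16le_tsv(blob)
  # Label the raw bytes as UTF-16LE, then transcode to UTF-8.
  text = blob.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  # Drop the byte order mark (U+FEFF) if the data starts with one.
  text = text.sub(/\A\uFEFF/, "")
  # Split into lines on CRLF, CR or LF, then each line into tab-separated fields.
  # The -1 limit keeps empty trailing fields.
  text.split(/\r\n|\r|\n/).map { |line| line.split("\t", -1) }
end

# Made-up sample data standing in for the real blob: BOM, two rows, CRLF line ends.
sample = "\uFEFFa\tb\r\nc\td\r\n".encode(Encoding::UTF_16LE).b
rows = parse_utf16le_tsv(sample)
p rows  # prints [["a", "b"], ["c", "d"]]
```

This follows the same idea as the Iconv.conv call in the comment above; Iconv was removed from the standard library in Ruby 2.0, and String#encode covers the same conversion.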