Tab / LF / CR Unicode characters

Question

I have a Unicode file (UTF-16 FFFE little-endian BOM) which contains rows of tab-separated fields.

Having read "Splitting unicode (I think) using .split in ruby", I am going to use Ruby's split (file to lines, then each line to fields).

BTW, what are the Unicode characters for tab, LF, and CR?

Thanks!

Is that really your question, what are the codepoints for those three characters in Unicode? — Michael Petrotta, Mar 15 '10 at 05:03
I agree, is that really the question? This could have been answered with a quick check on the internets but for future reference: http://www.unicode.org/charts/#symbols and in particular http://www.unicode.org/charts/PDF/U0000.pdf and http://en.wikipedia.org/wiki/Basic_Latin_Unicode_block — the Tin Man, Mar 15 '10 at 05:32
I am asking both: the Unicode character, and the Unicode code in Ruby syntax. Assume blob (blob = Record.first.file_attached) stores the raw UTF-16 data. Then rows = blob.split("\u000D") gives rows.size of 1, whereas u8rows = Iconv.conv("utf-8", "utf-16le", blob).split("\n") gives u8rows.size of 232. My question is: what is the Unicode CR/LF character for splitting a UTF-16 FFFE blob, in Ruby? – ohho, Mar 15 '10 at 06:42

score 8 · Accepted Answer · answered Mar 15 '10 at 05:02

LF:  U+000A  
CR:  U+000D  
Tab: U+0009
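
As a rough illustration of how these code points could be used for the split described in the question, here is a minimal Ruby sketch. It assumes a Ruby version with String#encode (1.9 or later) in place of the Iconv call quoted in the question's comments, and it uses a made-up two-row sample (`raw`) in place of the real blob. The idea is to decode the UTF-16LE bytes first, drop the BOM, and only then split on the line and tab characters.

```ruby
# Hypothetical sample data: BOM + two tab-separated rows ending in CR LF,
# stored as raw bytes the way a blob column might return them.
raw = "\uFEFFa\tb\r\nc\td\r\n".encode("UTF-16LE").b

# Tell Ruby the bytes are UTF-16LE, then transcode to UTF-8.
text = raw.dup.force_encoding("UTF-16LE").encode("UTF-8")

# The BOM survives transcoding as U+FEFF; strip it if present.
text = text.sub(/\A\uFEFF/, "")

# Split into lines on CR LF, CR, or LF, then each line into fields on TAB.
rows = text.split(/\r\n|\r|\n/).map { |line| line.split("\t") }
```

Splitting after decoding avoids having to match the two-byte UTF-16 forms of CR, LF, and TAB against raw bytes.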

Michael Petrotta

score 4 · Answer 2 · answered Mar 15 '10 at 05:03

Unicode TAB is U+0009, LF is U+000A, and CR is U+000D.

Same as ASCII actually.

Ayman

Simply because the first 256 code points of Unicode are the same as in Latin-1, which in turn uses ASCII for the first 128. – Joey Mar 31 '10 at 10:47
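
The point about ASCII can be checked directly in Ruby. The sketch below (variable names are illustrative only, and Hash#transform_values assumes Ruby 2.4 or later) compares the code points of the three characters with their ASCII bytes, and shows that only the byte layout changes in UTF-16LE, where each character takes two bytes with the low byte first.

```ruby
chars = { "TAB" => "\t", "LF" => "\n", "CR" => "\r" }

# Unicode code points: 9, 10, and 13, i.e. U+0009, U+000A, U+000D.
codepoints = chars.transform_values(&:ord)

# The same numbers appear as single bytes in US-ASCII.
ascii_bytes = chars.transform_values { |c| c.encode("US-ASCII").bytes }

# In UTF-16LE each character is two bytes, low byte first.
utf16le_bytes = chars.transform_values { |c| c.encode("UTF-16LE").bytes }

# The \u escape and the classic escape denote the same string.
same_escape = ("\u0009" == "\t")
```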
