
File is bidd.nus.edu.sg/group/TTD/filedownload.asp?file=flatfiles/drug-disease_TTD2013.txt

When I run `cat -A drug-disease_TTD2013.txt`, it shows `^M$` at the end of each line. In vim, after `:set list`, it shows only `$` without `^M`.

Running `sed 's/\r//' drug-disease_TTD2013.txt > 1.t` makes the `cat -A` output match what vim shows, but I do not know why. (revised)

Also, the manual of cat says: "-v use ^ and M- notation, except for LFD and TAB". What does that mean?

This is not the same situation as in this other question.

Thank you.

Zhilong Jia
  • Can you provide a little output ? – Thomas Ayoub Jun 03 '14 at 14:35
  • Are you sure `sed 's/\r//' drug-disease_TTD2013.txt` actually modified the file? Perhaps you meant `sed -i 's/\r//' drug-disease_TTD2013.txt`? Also try `dos2unix`. – konsolebox Jun 03 '14 at 14:36
  • `vim` probably recognizes the file as having DOS line endings, and so `$` here represents `\r\n`, not just `\n`. `cat` is showing the literal bytes. – chepner Jun 03 '14 at 14:47
  • @konsolebox, yes, you're right. Actually, I meant `sed 's/\r//' drug-disease_TTD2013.txt > 1.t`. @chepner, does vim show `^M$` in some situations? (Maybe for a file from OS X with `\r`? I'm not sure; I just cannot find such a file at the moment.) I have run into that issue a few times. – Zhilong Jia Jun 04 '14 at 00:23
  • @ZhilongJIA: If you open the file with `ff=unix`, the file is in dos format and list is set, there will be `^M$` in vim. – Jan Hudec Jun 04 '14 at 04:41
  • @JanHudec, @IngoKarkat: Before this post, I did not know about vim's `ff=unix` option. Usually I open a file with vim directly, like `vim 1.t`, and use `:set list` when I want to check the separator. So it must have been that I opened a file, ran `:set list`, and saw `^M$` directly. (Maybe just opening the file already showed `^M`; I forget.) After `echo 'fsl\tfdlj\r\n' > 1.t`, vim shows the `^M` directly. But with `vim drug-disease_TTD2013.txt`, it shows no `^M` without `set list`. How strange! Why? Thank you. – Zhilong Jia Jun 05 '14 at 00:49
  • @ZhilongJIA: `set ff` only affects how the file is _saved_; if the file was already loaded, the `^M`s are no longer in memory, so they won't appear. You have to use the `e ++ff` command to reload the file. In the echo case, `echo` adds one more `^J` itself, so there are two newlines and only the first one is preceded by `^M`. In that case vim will not conclude that the file is in dos format. – Jan Hudec Jun 05 '14 at 16:01
  • @JanHudec , Thank you. `^J` should be `$`. – Zhilong Jia Jun 06 '14 at 02:14
  • @ZhilongJIA: In `^` notation, line feed is `^J`. `$` is a marker for the end of the line, not really a name for the control character. Of course, this being unix, `^J` terminates lines. – Jan Hudec Jun 06 '14 at 04:51

2 Answers

3

In vim, type

:set ff?

I suppose it will respond with

fileformat=dos

That means that the end of line is ␍␊ (^M^J, \r\n) rather than just ␊ (^J, \n). Vim autodetects this when opening the file if all line endings are consistently the same two-byte sequence.

To re-open the file in unix mode, just type:

:e ++ff=unix

Now it will show the ^M characters. It will show them even without the list option, because they are now in the buffer as regular characters.
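To check outside of vim whether every line really ends in CR LF, here is a minimal sketch. It builds its own small sample file; the name `sample_crlf.txt` is hypothetical and stands in for the downloaded file. It uses `od -c` rather than `cat -A`, since `cat -A` is not available on every system.

```shell
# Create a small sample file with DOS (CR LF) line endings.
# The file name sample_crlf.txt is hypothetical.
printf 'a\tb\r\nc\td\r\n' > sample_crlf.txt

# od -c prints every byte, so the \r before each \n is visible.
od -c sample_crlf.txt

# Count the lines that end in a carriage return and compare with the total.
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" sample_crlf.txt)
total_lines=$(wc -l < sample_crlf.txt | tr -d ' ')
echo "lines ending in CR: $crlf_lines of $total_lines"
```

If the two counts are equal, the line endings are consistent, which is the condition under which vim picks `fileformat=dos` on its own.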

Jan Hudec
  • Please see my comment in my question @chepner. Additionally, Also in manual of cat: `-v use ^ and M- notation, except for LFD and TAB` What's the meaning of that? Thank you. – Zhilong Jia Jun 04 '14 at 00:42
  • @ZhilongJIA: `^`-notation represents characters 0 to 31 as `^@` (=0), `^A` to `^Z`, `^[`, `^\`, `^]`, `^^` and `^_`. `M-`-notation represents bytes 128 to 255 as `M-` followed by the character whose value is 128 less. – Jan Hudec Jun 04 '14 at 04:48
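The notation described in the comment above can be tried on a few hand-made bytes. This is a small sketch, not taken from the original file; the sample bytes are made up for illustration, and `LC_ALL=C` is set on the assumption that it keeps `cat` from treating the high byte as part of a multibyte character.

```shell
# TAB and LFD (line feed) are passed through unchanged by cat -v,
# while the carriage return (byte 13) is shown as ^M.
printf 'a\tb\r\n' | cat -v

# The letter after ^ is the character whose code is 64 higher:
# CR is 13, and 13 + 64 = 77, which is the code of 'M'.
printf '%d\n' "'M"    # prints 77

# A byte in the range 128 to 255 gets the M- prefix:
# octal 351 is 233, and 233 - 128 = 105, the code of 'i'.
printf '\351\n' | LC_ALL=C cat -v    # prints M-i
```

So `^M$` in `cat -A` output is a carriage return (`^M`) followed by the end-of-line marker (`$`) that `-A` adds before each line feed.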
1

cat is a Unix tool, and as such expects the platform's line endings, LF (^J).

Vim is multi-platform and detects the (consistent) use of different line endings. Your file apparently has Windows-style CR-LF line endings, so Vim treats each CR-LF pair as the line ending and just shows the $ sigil.

To change that, you can explicitly specify the fileformat on opening:

$ vim -c 'set list' -c 'edit ++fileformat=unix drug-disease_TTD2013.txt'

If you only work on Linux / Unix systems, it's probably easiest to convert the source file to Unix-style line endings, using either sed, dos2unix, or Vim.
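As one more way to do the conversion, here is a minimal sketch using `tr`, which is swapped in for the tools named above because `\r` in a `sed` pattern is not understood by every `sed`. The file names `sample_crlf.txt` and `sample_lf.txt` are hypothetical. Note that `tr -d '\r'` removes every carriage return, not only those at the end of a line.

```shell
# Build a small sample with CR LF line endings (hypothetical file name).
printf 'a\tb\r\nc\td\r\n' > sample_crlf.txt

# Delete all carriage returns and write the result to a new file.
tr -d '\r' < sample_crlf.txt > sample_lf.txt

# Inspect the bytes: only \n should remain at the end of each line.
od -c sample_lf.txt
```

Writing to a new file and checking it before replacing the original keeps the downloaded data intact if something goes wrong.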

Ingo Karkat
  • Thank you. `sed`, `dos2unix` and `vim` can handle it. Another matter, please see my comment in my question @chepner. – Zhilong Jia Jun 04 '14 at 00:45