How to read a tsv file in R where some elements contain \t?

Asked Oct 20 '13 at 18:25

Active Oct 20 '13 at 21:35

A question close to mine was asked and answered here.

My problem is fairly simple: I need to import a .tsv file into R, but I cannot, because some elements contain a \t, so I get an error like:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line 34 did not have 6 elements

One way to proceed would be to use gsub to replace the \t characters inside the fields. But the file is large (around 11 GB), and this kind of pre-processing would probably be too much for my machine. Is there a shortcut?
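Before rewriting 11 GB of data, it may be cheaper to first find out which lines R actually considers malformed. A minimal sketch using base R's `count.fields()`, which counts the fields on each line without building a data frame; `find_bad_lines` is a hypothetical helper name, and `expected` is the number of columns you believe the file should have (6, judging by the error above):

```r
# Hypothetical helper: return the line numbers whose field count differs
# from `expected`. quote = "" and comment.char = "" make quote and '#'
# characters ordinary text, so each line is split on tabs only.
# blank.lines.skip = FALSE keeps result index i aligned with file line i.
find_bad_lines <- function(path, expected, quote = "") {
  n <- count.fields(path, sep = "\t", quote = quote,
                    comment.char = "", blank.lines.skip = FALSE)
  # Treat any NA count as bad as well.
  which(is.na(n) | n != expected)
}
```

For example, `find_bad_lines("big.tsv", expected = 6)` (file name hypothetical) would list the offending line numbers, which can then be inspected with `readLines()`. Comparing the result with a call that passes `quote = "\"'"` may help show whether quote characters, rather than tabs, are what changes the field count.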

Some context: in the end I need to import the whole dataset into a SQL database; I could skip this conversion and load the file directly, but I would run into the same problem there.
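Since the data eventually goes into a SQL database anyway, one option is to never hold the whole file in memory: read it in blocks of lines and hand each block to a callback. A minimal sketch in base R, assuming the first line is a tab-separated header and that turning quote handling off (`quote = ""`) is acceptable for this data; `process_tsv_in_chunks` and `handle_chunk` are hypothetical names:

```r
# Hypothetical helper: stream a large TSV in fixed-size blocks of lines so
# the whole file never has to fit in memory. quote = "" disables quote
# processing, so stray ' or " characters cannot swallow tabs or newlines.
process_tsv_in_chunks <- function(path, chunk_lines = 100000L, handle_chunk) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Assumption: the first line is a header of tab-separated column names.
  header <- strsplit(readLines(con, n = 1L), "\t", fixed = TRUE)[[1]]
  n_chunks <- 0L
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break  # end of file reached
    chunk <- read.delim(text = lines, header = FALSE, col.names = header,
                        quote = "", comment.char = "",
                        colClasses = "character", stringsAsFactors = FALSE)
    n_chunks <- n_chunks + 1L
    handle_chunk(chunk)  # e.g. append this block to a database table
  }
  invisible(n_chunks)
}
```

`handle_chunk` is where each block would be written to the database, for example with DBI's `dbWriteTable(con, name, chunk, append = TRUE)` if the DBI package and a suitable driver are available. Reading every column as character (`colClasses = "character"`) keeps column types consistent across blocks; type conversion is left to the database or a later step.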

edited May 23 '17 at 10:25 by Community

asked by Edgar Derby

Use a tool that is designed for this purpose, such as `awk` or `sed`. Search for either `[r] preprocess sed` or `[r] preprocess awk`. I cannot tell whether you already understand that `\t` is a tab in R but `/t` is not. – IRTFM Oct 20 '13 at 18:46
@DWin I just fixed those /t, my bad. – Edgar Derby Oct 20 '13 at 18:51
So perhaps the problem is that you are using an incorrect `read.delim` call. If your file has `\t`'s, then it IS tab-separated, as its extension suggests, and such files can be read with `sep="\t"` and perhaps some combination of `comment.char=""`, `quote=""` and `fill=TRUE`. – IRTFM Oct 20 '13 at 18:53
It was the quote, thank you very much! – Edgar Derby Oct 20 '13 at 21:08
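A small example of the fix the asker confirmed, on made-up data. `read.delim()` already defaults to `sep = "\t"`, `fill = TRUE` and `comment.char = ""`, but its default `quote = "\""` means a stray double quote inside a field can be taken as the start of a quoted string; `read.table()` additionally treats `'` as a quote and `#` as a comment by default. Setting `quote = ""` switches quote handling off:

```r
# Made-up file: row 1 contains an unmatched double quote, row 2 an apostrophe.
f <- tempfile(fileext = ".tsv")
writeLines(c("id\tdesc\tqty",
             "1\ta 5\" screw\t10",
             "2\tO'Brien\t20",
             "3\tplain\t30"), f)

# quote = "" turns quoting off entirely, so " and ' are ordinary characters;
# comment.char = "" (already read.delim's default) stops '#' from starting a comment.
d <- read.delim(f, quote = "", comment.char = "", stringsAsFactors = FALSE)
str(d)
```

The trade-off is that genuinely quoted fields (for example, ones that wrap an embedded tab or newline in quotes) will no longer be parsed as single values, so `quote = ""` is only appropriate when the file does not rely on quoting.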
