
A question similar to mine was asked and answered here.

My problem is fairly simple: I need to import a .tsv file into R, but I cannot, because some elements contain a \t, so I get an error like:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line 34 did not have 6 elements

One way to proceed would be to use gsub to replace the \ts. But the file is large (around 11 GB), and this pre-processing would probably be too much for my machine. Is there a possible shortcut here?
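Before rewriting the whole file, it may be worth checking which lines actually have the wrong number of fields. A minimal sketch using base R's `count.fields()`, assuming the file really is tab-separated with a fixed number of columns; the sample data, the temp file and the `expected` value are hypothetical stand-ins for the real 11 GB file (which would still be scanned once, but only one integer per line is kept). If every line has the expected count when quote processing is disabled, the tabs are genuine separators and the parse failure more likely comes from quote characters than from stray tabs.

```r
# Hypothetical sample file: 3 tab-separated columns, where one field
# starts with a stray double quote that is never closed.
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tname\tnote",
             "1\talpha\tok",
             "2\t\"beta\tstray quote",
             "3\tgamma\tok"), path)

expected <- 3L  # hypothetical: number of columns every line should have

# Count tab-separated fields per line, with quoting and comments disabled
n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")

# Line numbers whose field count differs from the expected one
bad_lines <- which(n_fields != expected)
print(n_fields)
print(bad_lines)
```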

Some context: in the end I need to import the whole dataset into a SQL database. I could skip the import into R, but I would then run into the same problem there.

Edgar Derby
  • Use a tool that is designed for this purpose, such as `awk` or `sed`. Search with either `[r] preprocess sed` or `[r] preprocess awk`. I cannot tell whether you already understand that `\t` is a tab in R but `/t` is not. – IRTFM Oct 20 '13 at 18:46
  • @DWin I just fixed those /t typos, my bad. – Edgar Derby Oct 20 '13 at 18:51
  • So perhaps the problem is that you are using an incorrect `read.delim` call. If your file has `\t`s, then it IS tab-separated, as its extension suggests, and such files can be read with `sep="\t"`, perhaps combined with `comment.char=""`, `quote=""` and `fill=TRUE`. – IRTFM Oct 20 '13 at 18:53
  • It was the `quote` argument, thank you very much! – Edgar Derby Oct 20 '13 at 21:08
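A minimal sketch of the fix suggested in the comments, assuming the file is genuinely tab-separated and the failures come from stray quote characters; the sample data and temp file are hypothetical. With `quote = ""`, a field such as `"beta` is read literally instead of opening a quoted string that swallows the following tabs and lines.

```r
# Hypothetical sample file: one field starts with an unmatched double quote
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tname\tnote",
             "1\talpha\tok",
             "2\t\"beta\tstray quote",
             "3\tgamma\tok"), path)

df <- read.table(path, header = TRUE, sep = "\t",
                 quote = "",          # do not treat " or ' as quoting characters
                 comment.char = "",   # do not treat # as the start of a comment
                 fill = TRUE,         # pad short lines with NA instead of failing
                 stringsAsFactors = FALSE)
str(df)
```

Since `read.delim` already defaults to `sep = "\t"`, `fill = TRUE` and `comment.char = ""`, `read.delim(path, quote = "")` should behave the same way; `read.table` is used here only to spell every argument out.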
