
I have a huge file with a bunch of entries in it. I would like to read only the entries that contain certain values, such as "db_call".

I have tried this:

df<-read.table(text=readLines("H:/ap.log")[grepl("db_call")])

I get this error:

argument "x" is missing, with no default

Any ideas?

user1471980

1 Answer


Create a test file:

writeLines(c("aaa 11","aaa 22","bbb 33"),con="test.txt")

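The brute-force method (essentially the approach you tried above) is to read in the entire thing and then take only the pieces you want. Your attempt failed because grepl() was given a pattern but no vector to search, which is why R complains that argument "x" is missing: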

xx <- readLines("test.txt")
xx <- xx[grepl("aaa",xx,fixed=TRUE)]
##  (fixed=TRUE is slightly faster if you don't need regular expressions)
read.table(text=xx)
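
Applied to the "db_call" case from the question, the same idea might look like the sketch below. The temporary file and its three lines are made-up stand-ins for the real log (they are not taken from H:/ap.log), and the variable names are only illustrative.

```r
# Hypothetical sample data standing in for the real log file
path <- tempfile(fileext = ".log")
writeLines(c("db_call 11", "other 22", "db_call 33"), con = path)

# Read everything, then keep only the lines containing the target string
lines <- readLines(path)
keep <- grepl("db_call", lines, fixed = TRUE)  # pattern first, then the vector to search

# Parse just the matching lines into a data frame
df <- read.table(text = lines[keep])
```

The whole file still has to fit in memory here, so this is only practical for files of moderate size.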

If you have a big file, I would recommend using grep at the system level (install Cygwin if necessary, as you seem to be using Windows) and reading its output through pipe, e.g.

Test -- see how many lines contain the target string:

system('grep "aaa" <test.txt | wc')

Read only lines containing aaa:

read.table(pipe('grep "aaa" <test.txt'))

This will be much more efficient than reading the whole thing into R and then selecting the parts you want, since only the matching lines are ever passed to R.
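
If no system-level grep is available, one possible fallback is to read the file in chunks from an open connection and filter each chunk in R. This is only a sketch: read_matching_lines and its chunk_size argument are hypothetical names, not an existing R function, and a temporary copy of the test file is used. It keeps memory use bounded but should not be expected to match the speed of grep.

```r
# Hypothetical helper: read a file chunk by chunk, keeping only matching lines
read_matching_lines <- function(path, pattern, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))          # always release the connection
  kept <- list()
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break   # end of file reached
    kept[[length(kept) + 1L]] <- chunk[grepl(pattern, chunk, fixed = TRUE)]
  }
  unlist(kept)                 # NULL if nothing matched
}

# Same contents as the test file above, written to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("aaa 11", "aaa 22", "bbb 33"), con = path)

matched <- read_matching_lines(path, "aaa", chunk_size = 2L)
df <- read.table(text = matched)
```

Only one chunk plus the matching lines are held in memory at a time; check that at least one line matched before passing the result to read.table.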

Ben Bolker
  • Just curious...how can you tell from the question that the OP is using Windows? – Rich Scriven May 12 '14 at 16:29
  • their file path is `H:/ap.log`, which looks pretty windows-y to me. – Ben Bolker May 12 '14 at 16:30
  • Gotcha. Yeah, I just switched to a linux-based so I'm still learning the differences. – Rich Scriven May 12 '14 at 16:30
  • @Ben Bolker, I get this error: no lines available in input – user1471980 May 12 '14 at 16:39
  • @user1471980 do you get any other errors? Like not finding "grep"? Did you install cygwin and configure it properly? – Spacedman May 12 '14 at 16:51
  • does the brute-force method work? can you use `grep "pattern" – Ben Bolker May 12 '14 at 17:08
  • @Ben Bolker, is there any other way to do this? I cannot install cygwin due to security issues? – user1471980 May 12 '14 at 17:47
  • if you can't install cygwin, or any other `grep` program for Windows (e.g. http://gnuwin32.sourceforge.net/packages/grep.htm), and you want to do this *fast*, you will probably have to write your own C++ code (e.g. via Rcpp) to re-implement `grep` for yourself. I don't know of any built-in or packaged functions for R that implement row filtering, in part because `grep` exists already. – Ben Bolker May 12 '14 at 17:49
  • @Ben Bolker, I used another system which had cygwin and it worked and it was really fast. I have another quick question about tailing a file but will post it as another question. – user1471980 May 12 '14 at 18:33
  • @Ben Bolker, I got the grep working but for some files, I get this error: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 43 did not have 16 elements, how do you get around this error? – user1471980 May 13 '14 at 14:59
  • Can't possibly say without a reproducible example -- sorry. – Ben Bolker May 13 '14 at 15:09
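
The cause of the "line 43 did not have 16 elements" error cannot be determined from the thread, but one common cause of that message is that some lines contain a different number of whitespace-separated fields than the others. The sketch below shows one way to check for this with count.fields; the three sample lines are made up for illustration and the expected count of 3 is an assumption about that sample only.

```r
# Hypothetical filtered lines; the second one is missing a field
xx <- c("aaa 11 x", "aaa 22", "aaa 33 z")

# Count the whitespace-separated fields on each line
con <- textConnection(xx)
n_fields <- count.fields(con)
close(con)

table(n_fields)                # distribution of field counts
bad <- which(n_fields != 3)    # lines deviating from the expected count

# fill = TRUE pads short rows with blank fields instead of stopping
df <- read.table(text = xx, fill = TRUE)
```

By default count.fields and read.table treat quote characters and "#" specially, so stray apostrophes or hash signs in a log can also change the field count.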