I have a very large data set, and the next step is to delete certain strings (i.e. the associated rows) based on patterns. I need to use regex for that. For example, imagine column A as:

A-929.XZT-93002-B-DKE
A-938-XZT-29849-B-DKE
A-938-AXZ-93923-B-DKE
...
...

There are many more columns besides A. Now I want to completely delete all rows that contain the phrase "XZT" preceded by anything other than a letter. In this case that would be row 1 and row 2.
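
One way to read this rule as a regex, offered only as a sketch: "XZT" directly preceded by a non-letter character. The pattern `[^A-Za-z]XZT` is an assumption about the intended rule, and the fourth value below is hypothetical, added only to show a case where a letter precedes "XZT".

```r
# Sample values from the question, plus one hypothetical value ("AXZT")
# added only to illustrate a letter directly before "XZT".
a <- c("A-929.XZT-93002-B-DKE",
       "A-938-XZT-29849-B-DKE",
       "A-938-AXZ-93923-B-DKE",
       "A-938-AXZT-93923-B-DKE")

# Assumed reading of the rule: "XZT" directly preceded by a non-letter.
hit <- grepl("[^A-Za-z]XZT", a)

# TRUE marks the values whose rows would be deleted.
print(hit)
# → TRUE TRUE FALSE FALSE
```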

My question is as follows:

Can this be done in R as efficiently as, for example, in VBA? Which package would you recommend for this, or can it be done just as efficiently with the base functions?

I am asking because there are different ways to apply regex in R and I have to do it for 20,000+ rows numerous times, so I want to do it as fast as possible.

Thanks

EDC
  • 2
    maybe `mydata[rowSums(grepl("(?<![A-Z])XZT", mydata, perl=T)) > 0, ]` – Pierre L Sep 22 '15 at 14:40
  • @akrun thanks but in this case I wasn't asking for a solution, which I could create myself I hope :) I was asking for the right method. I hope that's allowed on this forum. – EDC Sep 22 '15 at 14:41
  • I have no idea about the speed but maybe piping it through a program like `sed` before reading it into R would be faster. Looks like you might be on windows though? – ekstroem Sep 22 '15 at 14:49
  • @ekstroem yeah I am on Windows – EDC Sep 22 '15 at 14:50
  • 1
    I doubt that there are many things that VBA can do faster than R. But I haven't used VBA for several years. – Roland Sep 22 '15 at 14:53
  • if i'm not mistaken `stringi` is a package designed for fast regex – MichaelChirico Sep 22 '15 at 14:53
  • 1
    also, 20k rows is not that much... perhaps look into investing in processors & ram if you'll be doing this a lot – MichaelChirico Sep 22 '15 at 14:54
  • Can I do everything in R that I can do with `VBScript Regular Expressions`? I didn't find any source like this: http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops. Also I will have a look at `stringi` – EDC Sep 22 '15 at 15:09
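
Picking up the suggestions in the comments, here is a minimal base-R sketch of dropping the matching rows from a data frame of about 20,000 rows. The data frame `mydata`, its column `B`, and the repeated sample values are hypothetical stand-ins for the real data. `grepl()` works on a character vector, so the sketch tests column `A` only rather than the whole data frame, and negates the result so that matching rows are removed rather than kept.

```r
# Hypothetical data: 20,000 rows built by repeating the sample values;
# column B stands in for the "many more columns" of the real data.
vals <- c("A-929.XZT-93002-B-DKE",
          "A-938-XZT-29849-B-DKE",
          "A-938-AXZ-93923-B-DKE")
mydata <- data.frame(A = rep(vals, length.out = 20000),
                     B = seq_len(20000),
                     stringsAsFactors = FALSE)

# Lookbehind pattern from the comments:
# "XZT" not preceded by an uppercase letter (needs perl = TRUE).
pattern <- "(?<![A-Z])XZT"

# Test column A only, then keep the rows that do NOT match.
has_xzt <- grepl(pattern, mydata$A, perl = TRUE)
cleaned <- mydata[!has_xzt, , drop = FALSE]

print(nrow(cleaned))

# Rough timing of the regex step on this machine.
print(system.time(grepl(pattern, mydata$A, perl = TRUE)))
```

If the base functions turn out to be too slow for the real data, the `stringi` package mentioned above would be the next thing to benchmark against this baseline.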

0 Answers