Deleting ranges of values based on character string in R

Question

I have a pretty gigantic dataframe that looks like this

I want to delete all NUTS2 values for certain countries (let's say Belgium here) and have no clue how to proceed. So far, the only thing that works has been this:

    alldata<-alldata[!(alldata$nutscode=="be21" & alldata$nutslevel=="nuts2"),]

but I would have to keep writing this same line hundreds of times for all possible countries. I want to exclude all values from the dataset where the nutscode variable has the character string "be" in the values AND the nutslevel equals 2.

I've tried using

    alldata[!grepl("be", alldata$nutscode, alldata$nutslevel=="nuts2"),]

or

    alldata[!grepl("be", alldata$nutscode) & alldata$nutslevel=="nuts2",]

since I've seen this posted in a similar thread here, but I am clearly writing something wrong, it doesn't work, it just prints out values. I've also tried many many other alternatives, but nothing worked.

Is there a simpler way of removing the rows containing those specific strings from my dataframe, without writing the same line hundreds of times? Also please please if you reply, do provide a complete answer, I am a total noob at this and if I had known how to write a fancy loop or function to do this for me, I would have done it by now. :/

Thank you very much in advance!

Also for clarification: NUTS codes are used to classify regions and increase in complexity the deeper one goes on a regional level. E.g. AT0 is Austria as a whole, AT2 and AT3 are regions on NUTS1 level and AT21 or AT34 are even smaller regions on NUTS2 level. Each country has its own NUTS codes following the same structure (e.g. BE, BE1 and BE34 are examples of NUTS level 0, 1 and 2 regions in Belgium).

score 1 · Answer 1 · answered Jul 28 '17 at 21:25

I think you're very close with grepl. Why did you abandon the & construct from your first example? This works fine for me...

    nutslevel <- c('nuts1', 'nuts1', 'nuts2', 'nuts2')
    nutscode <- c('be2', 'o2', 'be2', 'o2')

    dat <- data.frame(nutslevel, nutscode)
    dat[!(grepl('be', dat$nutscode) & dat$nutslevel=='nuts2'), ]

The last line returns

      nutslevel nutscode
    1     nuts1      be2
    2     nuts1       o2
    4     nuts2       o2

which excludes the third row, as desired.

Also, perhaps subset offers a slightly cleaner way to achieve this:

    subset(dat, !(grepl('be', nutscode) & nutslevel=='nuts2'))
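
If it helps, here is a minimal sketch of how the same idea could be stretched to several countries at once, so the line does not have to be repeated per country. The sample rows, the `drop_countries` vector and the name `dat_clean` are my own assumptions, not from the question, and `abe1` is a made-up code included only to show what the `^` anchor does. A plain `'be'` matches anywhere in the code, so anchoring to the start is probably safer, and `ignore.case = TRUE` covers upper-case codes such as `BE34`.

```r
# Hypothetical sample data using the column names from the question
dat <- data.frame(
  nutscode  = c("be", "be2", "be21", "BE34", "at21", "at3", "fr10", "abe1"),
  nutslevel = c("nuts0", "nuts1", "nuts2", "nuts2", "nuts2", "nuts1", "nuts2", "nuts2"),
  stringsAsFactors = FALSE
)

# Country prefixes whose NUTS2 rows should be removed (assumed list)
drop_countries <- c("be", "at")

# Builds the pattern "^(be|at)"; ^ anchors the match to the start of the code
pattern <- paste0("^(", paste(drop_countries, collapse = "|"), ")")

# TRUE for rows that belong to a listed country AND are on NUTS2 level
drop <- grepl(pattern, dat$nutscode, ignore.case = TRUE) &
  dat$nutslevel == "nuts2"

# Keep everything else; assign the result so it is not just printed
dat_clean <- dat[!drop, ]
dat_clean
```

Adding another country then only means adding its prefix to `drop_countries`.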

HarlandMason

Interesting. Try running my sample code and seeing if it works for you or if you also have an error. Beyond that, I wonder if perhaps the issue is with your underlying data? How is it stored and how are you reading it in? Are you sure they are strings and not factors or something? – HarlandMason Jul 28 '17 at 21:40
Okay I realise now your code was working all along, I just never saved the thing as a new dataframe, and without doing that, it just prints out the values. – Marie Jul 28 '17 at 21:46
Thank you very much for your help @HarlandMason! – Marie Jul 28 '17 at 21:48
Instead of storing the subset in `newdata` can you not just store it in `alldata` like so: `alldata <-subset(alldata, !(grepl('be', alldata$nutscode) & alldata$nutslevel=='nuts2'))`? Or are you specifically asking for a way to mutate the data in place? Also, sorry for not being explicit in the first place and including the assignment in my answer! – HarlandMason Jul 28 '17 at 21:53
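
To make the point from these comments concrete, here is a small sketch (reusing the toy data from the answer, with `n_before` as an assumed helper name) of the difference between filtering and filtering plus assignment. It also builds the columns as factors on purpose, since `grepl` and `==` should still behave the same way in that case.

```r
# Toy data from the answer, stored as factors on purpose
dat <- data.frame(
  nutslevel = c("nuts1", "nuts1", "nuts2", "nuts2"),
  nutscode  = c("be2", "o2", "be2", "o2"),
  stringsAsFactors = TRUE
)

# Without assignment the filtered rows are only printed; dat is unchanged
dat[!(grepl("be", dat$nutscode) & dat$nutslevel == "nuts2"), ]
n_before <- nrow(dat)

# With assignment the filtered result replaces the original data frame
dat <- dat[!(grepl("be", dat$nutscode) & dat$nutslevel == "nuts2"), ]
nrow(dat)
```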

score 0 · Answer 2 · answered Jul 28 '17 at 21:24

Just for clarification: do the different countries have different nutscode values? What is the pattern of the nutscode? As far as I can tell from the explanation above, your code already excludes all rows where the nutscode variable contains the character string "be" AND the nutslevel equals 2. Someone can only answer your question if the nutscode differs from country to country, because one has to see the pattern. So if possible, give the nutscode for at least four countries. I assume nutslevel is 2 for all of those countries. Thank you.

Onyambu

For clarification: NUTS codes are used to classify regions and increase in complexity the deeper one goes on a regional level. E.g. AT0 is Austria as a whole, AT2 and AT3 are regions on NUTS1 level and AT21 or AT34 are even smaller regions on NUTS2 level. Each country has their own NUTS code following the same structure (e.g. BE, BE1 and BE34 are examples for NUTS levels 0, 1 and 2 regions in Belgium) – Marie Jul 28 '17 at 21:30
