Using R to analyse an entry and remove the suffix if it is a number

Question

I'm a beginner in the R programming language, and I'm using RStudio for a project. My dataframe has a column for the zone of the mall, but some zones are actually subzones of a bigger zone, so they are called something like: Ikea 1, Ikea 2, Ikea 3, etc. I want to create a new column with the bigger zone for each entry.

The dataframe looks like this:

ID    ENTRY      ZONE                       
1     13:39:40   Casual Dinnerware
2     15:28:43   Van Thiel 3   
3     10:41:05   Caracole 7
4     16:37:31   Entrance

I want to add a new column that holds the "mother" zone whenever the zone is a subzone. For the given example, I want something like:

ID    ENTRY      ZONE                NEW ZONE        
1     13:39:40   Casual Dinnerware   Casual Dinnerware
2     15:28:43   Van Thiel 3         Van Thiel
3     10:41:05   Caracole 7          Caracole
4     16:37:31   Entrance            Entrance

Note that not every zone is a subzone!

My idea was to analyse each entry and, if the zone ended with a number, remove the number and write the rest in the new column. I already read a few questions that I thought would help, related to regular expressions and all (like this one), but I couldn't get this to work.

Thank you for your time, if you have any questions, let me know!

Will all "subzones" end in a number? Or can you have stuff like "Ikea A", "Ikea B" and "Ikea C"? If they do all end in a number, `df$NEW_ZONE = gsub("\\s\\d+$", "", df$ZONE")` will do the trick for you. `\\s` is a space, `\\d` is a number, and `$` indicates the end of the string, which is important to ensure that numbers which are part of the bigger zone aren't included. — tblznbits, Apr 08 '16 at 17:35
That solved it! I had to remove the typo in the last " after df$ZONE, but it's done. Thanks a lot! Do I have to mark this as solved or something? — Miguel Pinto, Apr 08 '16 at 17:47
@MiguelPinto you can add an answer and accept it or brittenb can add an answer. — Pierre L, Apr 08 '16 at 17:49

score 2 · Accepted Answer · answered Apr 08 '16 at 17:57

As brittenb said:

`df$NEW_ZONE = gsub("\\s\\d+$", "", df$ZONE)` will do the trick for you. `\\s` matches a whitespace character, `\\d+` matches one or more digits, and `$` indicates the end of the string, which is important to ensure that numbers which are part of the bigger zone's name aren't removed.

This solved my problem, thank you.
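
For anyone who wants to try it, here is a minimal sketch that rebuilds the sample data from the question and applies the same pattern. The `data.frame` construction and the extra zone name "Zone 51 Cafe 2" are assumptions added for illustration (the latter is hypothetical and not in the original data); only the `gsub` call comes from the comment above.

```r
# Sample data reconstructed from the question (column names as shown there)
df <- data.frame(
  ID = 1:4,
  ENTRY = c("13:39:40", "15:28:43", "10:41:05", "16:37:31"),
  ZONE = c("Casual Dinnerware", "Van Thiel 3", "Caracole 7", "Entrance"),
  stringsAsFactors = FALSE
)

# Remove a whitespace character followed by digits, but only at the end of
# the string; zones without a numeric suffix are returned unchanged
df$NEW_ZONE <- gsub("\\s\\d+$", "", df$ZONE)

print(df)

# Hypothetical zone name (not in the original data): the "$" anchor means
# only the trailing number is stripped, not the one inside the name
anchored <- gsub("\\s\\d+$", "", "Zone 51 Cafe 2")
print(anchored)
```

Because the pattern is anchored with `$`, it can match at most once per string, so `sub` would behave the same as `gsub` here.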
