I'm a beginner in R, and I'm using RStudio for a project. My data frame has a column for the zone of the mall, but some zones are actually subzones of a bigger zone, so they have names like Ikea 1, Ikea 2, Ikea 3, etc. I want to create a new column with the bigger zone for each entry.

The dataframe looks like this:

ID    ENTRY      ZONE                       
1     13:39:40   Casual Dinnerware
2     15:28:43   Van Thiel 3   
3     10:41:05   Caracole 7
4     16:37:31   Entrance

I want to add a new column that holds the "mother" zone whenever the zone is a subzone. For the example above, I want something like:

ID    ENTRY      ZONE                NEW ZONE        
1     13:39:40   Casual Dinnerware   Casual Dinnerware
2     15:28:43   Van Thiel 3         Van Thiel
3     10:41:05   Caracole 7          Caracole
4     16:37:31   Entrance            Entrance

Note that not every zone is a subzone!

My idea was to analyse each entry and, if the zone ends with a number, remove the number and write the rest in the new column. I have already read a few questions that I thought would help, related to regular expressions (like this one), but I couldn't get this to work.

Thank you for your time, if you have any questions, let me know!

  • Will all "subzones" end in a number? Or can you have stuff like "Ikea A", "Ikea B" and "Ikea C"? If they do all end in a number, `df$NEW_ZONE = gsub("\\s\\d+$", "", df$ZONE")` will do the trick for you. `\\s` is a space, `\\d` is a number, and `$` indicates the end of the string, which is important to ensure that numbers which are part of the bigger zone aren't included. – tblznbits Apr 08 '16 at 17:35
  • That solved it! I had to remove the typo in the last " after df$ZONE, but it's done. Thanks a lot! Do I have to mark this as solved or something? – Miguel Pinto Apr 08 '16 at 17:47
  • `df1$new_zone <- sub("^(.*) \\d+$", "\\1", df1$ZONE)` – Pierre L Apr 08 '16 at 17:48
  • @MiguelPinto you can add an answer and accept it or brittenb can add an answer. – Pierre L Apr 08 '16 at 17:49

1 Answer

As brittenb said:

`df$NEW_ZONE = gsub("\\s\\d+$", "", df$ZONE)` will do the trick for you. `\\s` matches a whitespace character, `\\d+` matches one or more digits, and `$` anchors the match at the end of the string, which is important to ensure that numbers which are part of the bigger zone's name aren't removed.

This solved my problem, thank you.
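
For anyone who wants to try it, here is a minimal sketch that rebuilds the sample data from the question and applies the same pattern. The data frame is typed in by hand from the example above, and the zone name "Level 2 Shop 4" in the last line is a made-up value used only to show what the `$` anchor does. Since the pattern is anchored at the end of the string it can match at most once, so `sub` and `gsub` should behave the same here.

```r
# Sample data copied from the question (typed in by hand for illustration).
df <- data.frame(
  ID    = 1:4,
  ENTRY = c("13:39:40", "15:28:43", "10:41:05", "16:37:31"),
  ZONE  = c("Casual Dinnerware", "Van Thiel 3", "Caracole 7", "Entrance"),
  stringsAsFactors = FALSE
)

# Remove a trailing "<whitespace><digits>" suffix.
# Zones without such a suffix are left unchanged.
df$NEW_ZONE <- sub("\\s\\d+$", "", df$ZONE)

print(df)

# Hypothetical zone name: only the trailing number is removed,
# the "2" inside the name is kept because of the `$` anchor.
anchored <- sub("\\s\\d+$", "", "Level 2 Shop 4")
print(anchored)
```

The `NEW_ZONE` column should then contain Casual Dinnerware, Van Thiel, Caracole and Entrance. If the subzones could also end in a letter (the "Ikea A" case asked about in the comments), the pattern would need to be adapted.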