Adding a column whose values depend on which of four vectors the value in another column matches

Question

I have data as follows:

library(stringi)

datfake <- as.data.frame(runif(100, 0, 3000))
names(datfake)[1] <- "Inc"
datfake$type <- sample(LETTERS, 100, replace = TRUE)
datfake$province <- stri_rand_strings(100, 1, "[A-P]")

region_south <- c("A", "B", "C", "D")
region_north <- c("E", "F", "G", "H", "I")
region_east <- c("J", "K", "L")
region_west <- c("M", "N", "O", "P")

EDIT:

In my actual data the regions are as follows:

region_north <- c("Drenthe", "Friesland", "Groningen")
region_east <- c("Flevoland", "Gelderland", "Overijssel")
region_west <- c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland")
region_south <- c("Limburg", "Noord-Brabant")

I would like to add a column that tells me which region each province is in. All the solutions I have come up with are a bit clunky (for example, turning the vector region_south into a two-column data frame whose second column says "south", and then merging). What would be the easiest way to do this?

Desired output:

        Inc      type province region
1  297.7387         C        J   east
2 2429.0961         E        D  south

Sotos · Accepted Answer · 2022-04-06T06:41:12.007

One idea is to use mget to collect the region vectors and unlist them into a single named vector. You can then match province against the values of that vector and return the corresponding names, i.e.

v1 <- unlist(mget(ls(.GlobalEnv, pattern = 'region_')))
res <- names(v1)[match(datfake$province, v1)]
gsub('region_(.+)[0-9]+','\\1' ,res)


  [1] "north" "east"  "north" "north" "south" "south" "south" "west"  "west"  "east"  "south" "south" "west"  "north" "north" "south" "east"  "north" "south" "east"  "north" "west" 
 [23] "south" "west"  "north" "west"  "east"  "north" "east"  "south" "south" "east"  "south" "west"  "north" "east"  "west"  "south" "south" "east"  "north" "west"  "west"  "south"
 [45] "north" "east"  "south" "west"  "north" "south" "east"  "west"  "north" "north" "north" "south" "north" "south" "north" "north" "west"  "north" "north" "south" "west"  "north"
 [67] "east"  "south" "north" "west"  "south" "west"  "north" "north" "north" "south" "north" "east"  "west"  "south" "west"  "north" "west"  "east"  "north" "west"  "south" "east" 
 [89] "north" "west"  "north" "north" "west"  "south" "west"  "north" "west"  "west"  "south" "west"
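
To see why the `gsub` step is needed, it may help to look at the names that `unlist` produces. The sketch below uses small toy vectors; `region_solo` and the `province` vector are hypothetical additions for illustration and are not part of the question. It names the vectors explicitly in `mget` rather than searching with `ls`, and it assumes a slightly more permissive pattern so that a one-element region, which `unlist` leaves without a numeric suffix, is still handled.

```r
# Toy region vectors (region_solo is a hypothetical one-element region)
region_south <- c("A", "B")
region_east  <- c("J", "K", "L")
region_solo  <- "Z"

# unlist() names the elements region_south1, region_south2, region_east1, ...
# A one-element vector keeps its plain name: region_solo
v1 <- unlist(mget(c("region_south", "region_east", "region_solo")))

# Hypothetical provinces to look up
province <- c("J", "B", "Z")
res <- names(v1)[match(province, v1)]

# Strip the "region_" prefix and the optional numeric suffix
region <- sub("^region_(.*?)[0-9]*$", "\\1", res, perl = TRUE)
```

With the original pattern `'region_(.+)[0-9]+'`, a name without a trailing digit would not match and would be returned unchanged, so the optional `[0-9]*` is a small safeguard rather than a required change for the data in the question.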

Thank you very much Sotos. I have a small question: is there a way to do this without using `.GlobalEnv`? I am having a problem with this in combination with R Markdown: https://stackoverflow.com/questions/72097272/listing-to-knitr-environment-r-markdown-error-when-using-list2env-in-r-chunks – Tom, May 06 '22 at 08:57
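
One possible way to avoid `.GlobalEnv` altogether, offered as a sketch rather than as the answerer's method, is to keep the regions in a named list and build a lookup vector from it. The names `regions`, `lookup` and `dat` are hypothetical; the province names come from the EDIT in the question.

```r
# Assumption: the regions are collected by hand in a named list
regions <- list(
  north = c("Drenthe", "Friesland", "Groningen"),
  east  = c("Flevoland", "Gelderland", "Overijssel"),
  west  = c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland"),
  south = c("Limburg", "Noord-Brabant")
)

# Lookup vector: names are provinces, values are region labels
lookup <- setNames(
  rep(names(regions), lengths(regions)),
  unlist(regions, use.names = FALSE)
)

# Small illustrative data frame standing in for the real data
dat <- data.frame(
  province = c("Utrecht", "Limburg", "Groningen", "Gelderland"),
  stringsAsFactors = FALSE
)

# Index the lookup by province; provinces not in any region give NA
dat$region <- unname(lookup[dat$province])
```

Because nothing here depends on which environment the code runs in, this should behave the same inside a knitr chunk as in an interactive session.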

score 1 · Answer 2 · edited Apr 06 '22 at 06:28

We can use case_when along with grepl here:

library(dplyr)
# Each pattern is a character class such as "^[EFGHI]$",
# so this only works while every province is a single letter.
datfake$region <- case_when(
    grepl(paste0("^[", paste(region_north, collapse=""), "]$"), datfake$province) ~ "north",
    grepl(paste0("^[", paste(region_south, collapse=""), "]$"), datfake$province) ~ "south",
    grepl(paste0("^[", paste(region_east, collapse=""), "]$"), datfake$province) ~ "east",
    grepl(paste0("^[", paste(region_west, collapse=""), "]$"), datfake$province) ~ "west"
)

answered Apr 06 '22 at 06:11 by Tim Biegeleisen · edited Apr 06 '22 at 06:28 by Tom

Thank you for your answer Tim. When applying your code to my actual data (see EDIT) I ran into some issues. I tried to adapt your code, but was not successful. Do you know how I should adapt my code to the actual data? – Tom May 06 '22 at 09:00
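
A likely reason the `grepl` version does not carry over to the actual data: pasting a region vector together inside `[...]` builds a regex character class, which matches a single character, so it cannot match a whole name such as "Friesland". A possible adaptation, assuming exact matches are what is wanted, swaps `grepl` for `%in%`. The data frame `dat` below is a hypothetical stand-in for the real data.

```r
library(dplyr)

# Region vectors from the EDIT in the question
region_north <- c("Drenthe", "Friesland", "Groningen")
region_east  <- c("Flevoland", "Gelderland", "Overijssel")
region_west  <- c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland")
region_south <- c("Limburg", "Noord-Brabant")

# Hypothetical stand-in for the real data
dat <- data.frame(
  province = c("Friesland", "Utrecht", "Limburg", "Overijssel"),
  stringsAsFactors = FALSE
)

# %in% tests exact membership, so multi-character names are fine;
# provinces that match no region get NA
dat$region <- case_when(
  dat$province %in% region_north ~ "north",
  dat$province %in% region_south ~ "south",
  dat$province %in% region_east  ~ "east",
  dat$province %in% region_west  ~ "west"
)
```

The same `%in%` conditions would also work for the single-letter toy data, since membership testing does not depend on the length of the province names.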
