
I have data as follows:

library(stringi)

datfake <- as.data.frame(runif(100, 0, 3000))
names(datfake)[1] <- "Inc"
datfake$type <- sample(LETTERS, 100, replace = TRUE)
datfake$province <- stri_rand_strings(100, 1, "[A-P]")

region_south <- c("A", "B", "C", "D")
region_north <- c("E", "F", "G", "H", "I")
region_east <- c("J", "K", "L")
region_west <- c("M", "N", "O", "P")

EDIT:

In my actual data the regions are as follows:

region_north <- c("Drenthe", "Friesland", "Groningen")
region_east <- c("Flevoland", "Gelderland", "Overijssel")
region_west <- c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland")
region_south <- c("Limburg", "Noord-Brabant")

I would like to add a column that tells me which region each province is in. All the solutions I have come up with are a bit clunky (for example, turning the vector region_south into a two-column data frame whose second column says "south", and then merging). What would be the easiest way to do this?

Desired output:

        Inc      type province region
1  297.7387         C        J   east
2 2429.0961         E        D  south
Tom

2 Answers


One idea is to use mget to collect the region vectors, unlist them into a single named vector, match its values against province, and return the corresponding names, i.e.

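# Collect every region_* vector into one named vector (names look like "region_east1")
v1 <- unlist(mget(ls(.GlobalEnv, pattern = '^region_')))
# For each province, take the name of the matching element
res <- names(v1)[match(datfake$province, v1)]
# Strip the "region_" prefix and the index that unlist() appends to each name
sub('^region_([^0-9]+)[0-9]*$', '\\1', res)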


  [1] "north" "east"  "north" "north" "south" "south" "south" "west"  "west"  "east"  "south" "south" "west"  "north" "north" "south" "east"  "north" "south" "east"  "north" "west" 
 [23] "south" "west"  "north" "west"  "east"  "north" "east"  "south" "south" "east"  "south" "west"  "north" "east"  "west"  "south" "south" "east"  "north" "west"  "west"  "south"
 [45] "north" "east"  "south" "west"  "north" "south" "east"  "west"  "north" "north" "north" "south" "north" "south" "north" "north" "west"  "north" "north" "south" "west"  "north"
 [67] "east"  "south" "north" "west"  "south" "west"  "north" "north" "north" "south" "north" "east"  "west"  "south" "west"  "north" "west"  "east"  "north" "west"  "south" "east" 
 [89] "north" "west"  "north" "north" "west"  "south" "west"  "north" "west"  "west"  "south" "west"
Sotos
  • Thank you very much, Sotos. I have a small question: is there a way to do this without using `.GlobalEnv`? I am having a problem with it in combination with R Markdown: https://stackoverflow.com/questions/72097272/listing-to-knitr-environment-r-markdown-error-when-using-list2env-in-r-chunks – Tom May 06 '22 at 08:57
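One possible way to avoid `.GlobalEnv` altogether is to keep the regions in a single named list and build a lookup table from it. The following is only a sketch: the names `regions` and `lookup` are hypothetical, and the two-row `datfake` is a stand-in for the real data so that the example runs on its own.

```r
# Hypothetical named list; the names are the region labels we want to assign.
regions <- list(
  south = c("A", "B", "C", "D"),
  north = c("E", "F", "G", "H", "I"),
  east  = c("J", "K", "L"),
  west  = c("M", "N", "O", "P")
)

# stack() turns the named list into a two-column data frame:
# `values` holds the provinces, `ind` holds the region name (a factor).
lookup <- stack(regions)

# Small stand-in for datfake (assumption: same column names as in the question).
datfake <- data.frame(
  Inc      = c(297.7387, 2429.0961),
  type     = c("C", "E"),
  province = c("J", "D")
)

# match() finds each province's row in the lookup; unmatched provinces give NA.
datfake$region <- as.character(lookup$ind[match(datfake$province, lookup$values)])
datfake
```

Because nothing is looked up by variable name, this should behave the same inside a knitr chunk as in an interactive session.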

We can use case_when along with grepl here:

library(dplyr)
# Each pattern is a character class such as "^[EFGHI]$",
# so this only works when every province is a single character.
datfake$region <- case_when(
    grepl(paste0("^[", paste(region_north, collapse=""), "]$"), datfake$province) ~ "north",
    grepl(paste0("^[", paste(region_south, collapse=""), "]$"), datfake$province) ~ "south",
    grepl(paste0("^[", paste(region_east, collapse=""), "]$"), datfake$province) ~ "east",
    grepl(paste0("^[", paste(region_west, collapse=""), "]$"), datfake$province) ~ "west"
)
Tim Biegeleisen
  • Thank you for your answer Tim. When applying your code to my actual data (see EDIT) I ran into some issues. I tried to adapt your code, but was not successful. Do you know how I should adapt my code to the actual data? – Tom May 06 '22 at 09:00
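For the full province names in the EDIT, the bracket-based pattern cannot work as written: a character class matches exactly one character, and a hyphen inside the brackets (as in "Noord-Holland") is read as a range. A minimal sketch of one alternative keeps `case_when` but tests membership with `%in%` instead of `grepl`. The data frame `dat` and its rows are hypothetical, added only so the example is self-contained.

```r
library(dplyr)

region_north <- c("Drenthe", "Friesland", "Groningen")
region_east  <- c("Flevoland", "Gelderland", "Overijssel")
region_west  <- c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland")
region_south <- c("Limburg", "Noord-Brabant")

# Hypothetical sample rows standing in for the actual data.
dat <- data.frame(
  province = c("Utrecht", "Limburg", "Groningen", "Gelderland", "Unknown")
)

# %in% compares whole strings, so multi-character names and hyphens are fine.
# Provinces that appear in none of the vectors end up as NA.
dat$region <- case_when(
  dat$province %in% region_north ~ "north",
  dat$province %in% region_south ~ "south",
  dat$province %in% region_east  ~ "east",
  dat$province %in% region_west  ~ "west"
)
dat
```

The same conditions also work with the single-letter provinces from the original example, since `%in%` does not depend on the length of the strings.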