I'm trying to up my R game, and I clearly need some guidance. I want to create a lot of variables (93, to be exact), and I want to do it the smart way, but I'm stuck.
My problem: I have a dataframe (df) with several variables, including the "main" one, Description, which holds the stemmed text of each description. A second dataframe (reference) works as a lookup table with two columns: the category name and the regex needed to identify it. I kept only 3 entries here, but the original has 93.
The code:
library(tidyverse)
df <- tibble("FlawType" = c(rep("Medium", 5), rep("Major", 5)),
             "Description" = c("utilizaca indev equip final divers daquel justific aquisica",
                               "utilizaca modal indev licitac aquisica mater previst plan trabalh conveni nomd",
                               "aquisica indev lanch gener alimentici secret municip educaca mont r",
                               "uso indev recurs bloc atenca basic aquisica medic realizaca trat intim prefeit decisa judic",
                               "indici irregular favorec process licitato no aquisica medic farmac basic raza concentraca indevid empr certam",
                               "localizaca bem vist realiz equip fiscalizaca cgu escol municip abril municipi palestin par",
                               "telecentr inat ausenc equip local instalaca equip defeit",
                               "equip local",
                               "equip mater permanent adquir implantaca banc aliment send utiliz outr local simples encontr in loc realiz equip",
                               "mater equip gener alimentici adquir recurs cra por entreg local atend"))
reference <- tibble(var = c("Aquisição indevida", "Equipamentos não localizados", "Despesa irregular"),
                    regex = c("(aquisica.*indev|indev.*aquisica)", "(equip.*local|local.*equip)", "(desp.*irregul|irregul.*desp)"))
I can sort of create the three new variables in my sample df, but the result is a list, and I have to extract the columns from it. I thought that wouldn't be a problem, but when I run it on my original df (60k+ rows), it gets stuck...
The idea is to use each reference$var as the name of a new variable, and its associated regex (reference$regex) to fill that variable with a dummy (1 if the description matches, 0 otherwise), one for every entry in reference.
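A minimal sketch of that idea, assuming the goal is simply one 0/1 column per row of reference. It loops over the rows of reference and writes each dummy straight into df, so no intermediate list of data frames has to be built or unpacked. The three-row df and two-row reference below are trimmed stand-ins for the tables above, not the real data.

```r
library(tibble)
library(stringr)

# Trimmed stand-ins for the df and reference tables above
df <- tibble(Description = c("aquisica indev lanch",
                             "equip local",
                             "telecentr inat"))
reference <- tibble(var   = c("Aquisição indevida", "Equipamentos não localizados"),
                    regex = c("(aquisica.*indev|indev.*aquisica)", "(equip.*local|local.*equip)"))

# One pass per reference row: the category name becomes the column name,
# and its own regex fills that column with a 0/1 dummy
for (i in seq_len(nrow(reference))) {
  df[[reference$var[i]]] <- as.integer(str_detect(df$Description, reference$regex[i]))
}

df
```

Because each iteration adds a single integer column instead of copying the whole data frame, the work grows with rows × patterns and nothing has to be extracted afterwards.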
Code that works on the sample but not on the original df, just for reference:
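fd <- list()  # container: one data frame per new variable
varnames <- unique(reference$var)
for(varname in varnames){
  # use only the regex that belongs to this category
  pattern <- reference$regex[reference$var == varname]
  fd[[varname]] <- df %>%
    mutate(!!varname := ifelse(str_detect(Description, pattern), 1, 0))
}
# column 3 of each stored data frame is the new dummy
df <- bind_cols(df, map_df(fd, 3))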
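A sketch of a functional variant that keeps the bind_cols step but skips building a full copy of df for every category. It assumes the dummies only depend on Description: map() returns one integer vector per regex, set_names() labels them with reference$var, and as_tibble() turns that named list into the block of new columns. The two-row df is a trimmed stand-in for the sample data.

```r
library(tibble)
library(stringr)
library(purrr)
library(dplyr)

# Trimmed stand-ins for the sample tables above
df <- tibble(FlawType = c("Medium", "Major"),
             Description = c("utilizaca indev equip final divers daquel justific aquisica",
                             "equip local"))
reference <- tibble(var = c("Aquisição indevida", "Equipamentos não localizados", "Despesa irregular"),
                    regex = c("(aquisica.*indev|indev.*aquisica)", "(equip.*local|local.*equip)", "(desp.*irregul|irregul.*desp)"))

# One integer dummy vector per regex, named after its category,
# collected into a tibble that holds only the new columns
dummies <- map(reference$regex, ~ as.integer(str_detect(df$Description, .x))) %>%
  set_names(reference$var) %>%
  as_tibble()

df <- bind_cols(df, dummies)
df
```

The only objects held in memory are the dummy vectors themselves, so there is no list of data frames to pull column 3 out of.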
Thanks in advance.