
I am using `grep` and `grepl` to search through a character variable and collapse its values into simplified levels.

I have tried to get the results into a data frame. I have also tried using `if` and `else if` statements to assign the new values directly. My code is below; the `for` loop with the `if` statements does not run.

for(i in 1:length(D$ID)){
  if(grepl("Bachelor", D$NDEGREE)[i]){D$NDegree[i] <- "Bachelors"}
  else if(grepl("BS", D$NDEGREE)[i]){D$NDegree[i] <- "Bachelors"}
  else if(grepl("Master", D$NDEGREE)[i]){D$NDegree[i] <- "Masters"}
  else if(grepl("Doctor", D$NDEGREE)[i]){D$NDegree[i] <- "Doctors"}
  else{D$NDegree[i] <- D$NDEGREE[i]}
}

Bachelors <-  D[grep("Bachelor", D$NDEGREE),]
BS <-  D[grep("BS", D$NDEGREE),]
Masters <- D[grep("Master", D$NDEGREE),]
Doctors <- D[grep("Doctor", D$NDEGREE),]

EDIT: I also tried

D$NDEGREE <- gsub("Bachelor", "Bachelors", D$NDEGREE)
D$NDEGREE <- gsub("BS", "Bachelors", D$NDEGREE)
D$NDEGREE <- gsub("Master", "Masters", D$NDEGREE)
D$NDEGREE <- gsub("Doctor", "Doctors", D$NDEGREE)

This runs without errors, but nothing seems to change. The `for` loop with the `if` statements doesn't work either; it just keeps running indefinitely.

NFerrari
Comments:
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jun 10 '19 at 04:00
  • I think a better option is to create a key/val dataset and do a fuzzy join `keyval <- data.frame(NDegree = c("Bachelor", "BS", "Master", "Doctor"), val = c("Bachelors", "Bachelors", "Masters", "Doctors"), stringsAsFactors = FALSE); library(fuzzyjoin);regex_left_join(D, keyval, by = "NDegree")` – akrun Jun 10 '19 at 04:15
  • If you find that you're doing the same thing over and over again, it might be time to write a function that does that and `Map` it to your columns. – NelsonGon Jun 10 '19 at 05:15

2 Answers


You do not need a `for` loop over a column in R. Use vectorized operations instead: functions that operate on a whole vector at once. Here, `gsub` recodes the values and `grep` selects the matching rows.

df <- data.frame(
  NDEGREE =c("Bachelor", "Master", "Doctor", "BS"),
  Value = c(1,1,1,1)
)


df$NDEGREE <- gsub("Bachelor", "Bachelors", df$NDEGREE)
df$NDEGREE <- gsub("BS", "Bachelors", df$NDEGREE)
df$NDEGREE <- gsub("Master", "Masters", df$NDEGREE)
df$NDEGREE <- gsub("Doctor", "Doctors", df$NDEGREE)


Bachelors <- df[grep("Bachelors", df$NDEGREE),]
Doctors <- df[grep("Doctors", df$NDEGREE),]
Masters <- df[grep("Masters", df$NDEGREE),]
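One caveat: `gsub` replaces only the matched substring. A value such as "Bachelor of Science" becomes "Bachelors of Science" rather than "Bachelors", and running `gsub("Bachelor", "Bachelors", ...)` on a value that already reads "Bachelors" produces "Bachelorss". If the goal is to collapse whole values into levels, a minimal vectorized sketch using `grepl` and nested `ifelse` avoids both problems. The sample degree strings below are made up for illustration. This approach also avoids the slowness of the original loop: `grepl(..., D$NDEGREE)[i]` scans the entire column on every iteration, so the loop performs roughly n² pattern matches.

```r
# Hypothetical sample data; real degree strings will differ.
# stringsAsFactors = FALSE keeps NDEGREE as character (use as.character() if yours is a factor).
df <- data.frame(
  NDEGREE = c("Bachelor of Arts", "BS", "Master of Science", "Doctor of Philosophy", "Associate"),
  stringsAsFactors = FALSE
)

# Each grepl() call scans the whole column once; the nested ifelse() picks the
# first matching level and keeps the original value when no pattern matches
df$NDegree <- ifelse(grepl("Bachelor|BS", df$NDEGREE), "Bachelors",
              ifelse(grepl("Master", df$NDEGREE), "Masters",
              ifelse(grepl("Doctor", df$NDEGREE), "Doctors",
                     df$NDEGREE)))

# One data frame per simplified level, instead of separate grep() subsets
by_level <- split(df, df$NDegree)
by_level$Bachelors
```

`split` replaces the separate `Bachelors`/`Masters`/`Doctors` subsets with a single named list, and it also keeps the rows that matched none of the patterns.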
DSGym

An easier option, if there are many values, is to create a key/value dataset and then do a regex-based fuzzy join with the fuzzyjoin package.

library(fuzzyjoin)
# NDEGREE in D holds the strings; NDegree in keyval holds the regex patterns
regex_left_join(D, keyval, by = c("NDEGREE" = "NDegree"))

data

keyval <- data.frame(NDegree = c("Bachelor", "BS", "Master", "Doctor"),
                     val = c("Bachelors", "Bachelors", "Masters", "Doctors"),
                     stringsAsFactors = FALSE)
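If installing fuzzyjoin is not an option, here is a base-R sketch of the same key/value idea. It loops over the short lookup table rather than over the rows, so each pattern is applied to the whole column with a single `grepl` call. The helper name `recode_by_pattern` and the sample `D` are hypothetical. One behavioral difference: a join returns one row per matching pattern, so a value matching both "Bachelor" and "BS" would appear twice. This helper instead keeps the first matching pattern and leaves unmatched values unchanged.

```r
keyval <- data.frame(NDegree = c("Bachelor", "BS", "Master", "Doctor"),
                     val = c("Bachelors", "Bachelors", "Masters", "Doctors"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: recode x using the regex patterns in key$NDegree and the
# replacement labels in key$val. Loops over patterns, not rows.
recode_by_pattern <- function(x, key) {
  x <- as.character(x)           # guard against factor input
  out <- x                       # unmatched values are kept as-is
  done <- rep(FALSE, length(x))  # rows already recoded by an earlier pattern
  for (j in seq_len(nrow(key))) {
    hit <- !done & grepl(key$NDegree[j], x)
    out[hit] <- key$val[j]
    done <- done | hit
  }
  out
}

# Hypothetical sample data standing in for the question's D
D <- data.frame(NDEGREE = c("BS (Bachelor of Science)", "Master of Arts",
                            "Doctor of Medicine", "Associate"),
                stringsAsFactors = FALSE)
D$NDegree <- recode_by_pattern(D$NDEGREE, keyval)
D
```

Because this is a plain function, it can also be applied to several columns at once, as suggested in the comments. For example, `D[cols] <- lapply(D[cols], recode_by_pattern, key = keyval)`, where `cols` is a hypothetical vector of column names.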
akrun