First question post. Please excuse any formatting issues that may be present.

What I'm trying to do is conditionally replace a factor level in a data frame column. The reason is a Unicode mismatch between a right single quotation mark (U+2019) and an apostrophe (U+0027).

All of the columns that need this replacement begin with "INN8", so I'm using

grep("INN8", colnames(demoDf)) -> apostropheFixIndices
for(i in apostropheFixIndices) {
    levels(demoDfFinal[i]) <- c(levels(demoDf[i]), "I definitely wouldn't")
    (insert code here)
}

to get the indices in order to perform the conditional replacement.

I've looked at a number of questions about naming variables on the fly and about how to assign values to dynamic variables. I've also explored the R-FAQ on turning a string into a variable and looked into Ari Friedman's suggestion that named elements in a list are preferred. However, I'm unsure how to execute that suggestion, or why it is considered best practice.

I know I need to do something along the lines of

demoDf$INN8xx[demoDf$INN8xx=="I definitely wouldn’t"] <- "I definitely wouldn't"

but the iterations I've tried so far haven't worked.

Thank you for your time!

1 Answer

If I understand you correctly, you don't want to rename the columns, only replace the character inside their values. In that case this might work:

demoDf <- data.frame(A=rep("I definitely wouldn’t",10) , B=rep("I definitely wouldn’t",10))
newDf  <- apply(demoDf, 2, function(col) { 
  gsub(pattern="’", replacement = "'", x = col) 
})

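It replaces the wrong symbol in every column. Note that apply() coerces the data frame to a character matrix, so newDf is a matrix rather than a data frame.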
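If the result should stay a data frame, a variant along these lines might help. This is only a sketch: the three-row example data is made up for illustration, and it assumes the columns are character vectors (factor columns would come back as character).

```r
# Hypothetical example data containing the right single quotation mark (U+2019)
demoDf <- data.frame(
  A = rep("I definitely wouldn\u2019t", 3),
  B = rep("I definitely wouldn\u2019t", 3),
  stringsAsFactors = FALSE
)

# Assigning into demoDf[] keeps the data.frame structure while
# replacing the contents of every column
demoDf[] <- lapply(demoDf, function(col) {
  gsub(pattern = "\u2019", replacement = "'", x = col, fixed = TRUE)
})
```

Writing the character as the escape "\u2019" avoids depending on how the script file itself is encoded.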

Or if you have a vector containing the column indices you want to check then you could go with

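# Let's say you identified columns 2, 5 and 8
cols <- c(2,5,8)
# Replace the character in those columns only; assigning the lapply() result
# back to demoDf[cols] avoids the global assignment operator (<<-)
demoDf[cols] <- lapply(demoDf[cols], function(col) {
  gsub(pattern="’", replacement = "'", x = col)
})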
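Since the question is about factor levels in columns whose names begin with "INN8", another option might be to rewrite the level labels directly, which keeps the columns as factors. The following is a minimal sketch, not a tested solution for the real data: the column names INN8a, INN8b and other, and the level "I might", are invented for illustration.

```r
# Hypothetical data: two INN8* factor columns and one unrelated column
demoDf <- data.frame(
  INN8a = factor(c("I definitely wouldn\u2019t", "I might")),
  INN8b = factor(c("I might", "I definitely wouldn\u2019t")),
  other = c("x", "y")
)

# Indices of the columns whose names begin with "INN8"
apostropheFixIndices <- grep("^INN8", colnames(demoDf))

for (i in apostropheFixIndices) {
  # Rewrite the level labels only; the observations keep their level codes
  levels(demoDf[[i]]) <- gsub("\u2019", "'", levels(demoDf[[i]]), fixed = TRUE)
}
```

Note the double brackets: demoDf[[i]] extracts the column itself, whereas demoDf[i] returns a one-column data frame, which has no factor levels. If a column already contains both spellings as separate levels, assigning the duplicated label should merge them into one level.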
Martin Schmelzer