I wrote a function for wrangling strings. It converts non-English characters to their English (ASCII) equivalents and performs a few other clean-up operations.
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
library(qdapRegex)
wrangle_string <- function(s) {
# 1 character substitutions
old1 <- "šžþàáâãäåçèéêëìíîïðñòóôõöùúûüýşğçıöüŞĞÇİÖÜ"
new1 <- "szyaaaaaaceeeeiiiidnooooouuuuysgciouSGCIOU"
s1 <- chartr(old1, new1, s)
# 2 character substitutions
old2 <- c("œ", "ß", "æ", "ø")
new2 <- c("oe", "ss", "ae", "oe")
s2 <- s1
for(i in seq_along(old2)) s2 <- gsub(old2[i], new2[i], s2, fixed = TRUE)
# other transformations: collapse punctuation and spaces, lower-case, trim
s2 <- gsub('[[:punct:] ]+', ' ', s2)
s2 <- tolower(s2)
s2 <- trim(s2)
s2 <- rm_white(s2)
return(s2)
}
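Each step in the function body (chartr, gsub, tolower) is vectorized over its input, so a function of this kind can in principle take a whole character vector at once. The following is a minimal sketch of that idea, not the function above: simple_clean is a hypothetical, base-R-only stand-in that skips the character substitutions, and df is made-up ASCII sample data.

```r
# Hypothetical simplified stand-in for wrangle_string (base R only, no substitutions)
simple_clean <- function(s) {
  s <- gsub("[[:punct:] ]+", " ", s)  # collapse runs of punctuation/spaces into one space
  s <- tolower(s)                     # lower-case every element
  trimws(s)                           # strip leading and trailing whitespace
}

# Made-up sample data with a factor column, similar in shape to the data below
df <- data.frame(target = factor(c("Suat", "  x Yayincilik, Reklam ")))

# Convert the factor to character first, then clean the whole column in one call
df$target <- simple_clean(as.character(df$target))
print(df$target)
```

If the same holds for wrangle_string, a call such as wrangle_string(as.character(outgoing$source)) might be a simpler alternative to looping over the factor with sapply.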
Here is my minimal data for reproduction:
outgoing=structure(list(source = structure(c(1L, 1L, 1L), .Label = "YÖNETIM KURULU BASKANLIGI", class = "factor"),
target = structure(c(2L, 1L, 3L), .Label = c("x Yayincilik Reklam ve Organizasyon Hizmetleri",
"Suat", "Yavuz"), class = "factor")), .Names = c("source",
"target"), row.names = c(NA, 3L), class = "data.frame")
When I call the function directly, it works:
wrangle_string("YÖNETİM KURULU BAŞKANLIĞI")
The result is:
"yonetim kurulu baskanligi"
When I apply it to a data frame column with sapply, it seems to work: when I inspect the result with View(outgoing), there is no problem.
outgoing$source=as.vector(sapply(outgoing$source,wrangle_string))
However, when I check the cell with outgoing[1,1], I get this:
"yonetİm kurulu başkanliği"
How can I fix this problem?
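In case it helps with diagnosis, here is a minimal sketch of the checks I can run to compare a typed string with a value taken from a factor column. The names typed, stored and df are hypothetical and only for illustration; the string is written with Unicode escapes so that the example does not depend on the encoding of the script file.

```r
# Hypothetical diagnostic sketch: typed literal vs. value stored in a factor column
typed <- "Y\u00d6NET\u0130M"              # "YÖNETİM" written with Unicode escapes
df <- data.frame(source = factor(typed))  # mimic the factor column from the sample data

stored <- as.character(df$source[1])      # pull the cell back out as a character string

print(class(df$source))   # "factor": the column is not a plain character vector
print(Encoding(typed))    # declared encoding of the typed literal
print(Encoding(stored))   # declared encoding of the value from the data frame

# Code points of the stored value; 304 is U+0130 (capital dotted I)
print(utf8ToInt(enc2utf8(stored)))
```

If the declared encodings of the typed and stored strings differ on my real data, that would suggest the problem is in how the column was read in rather than in the function itself.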