0

Best

Basically, I've a table data and a smaller table vocabulary. What I would like to have is, that the values of the vocabularies well be mapped on the data values. And this within a function, in such a way that it can be used +/- dynamicaly

Given:

dt : data.csv
V1____V2___sex__V4__V5_
abc   abc  jeny abc 123
abc   abc  eric abc 123
abc   abc  bob  abc 123

vocabulary1: sex.csv
old___new
jeny  f
eric  m
bob   m

Wanted Result:

V1____V2___sex__V4__V5_
abc   abc  f    abc 123
abc   abc  m    abc 123
abc   abc  m    abc 123

What I've

replace_by_vocabulary <- function(dt,voc,col_name){
  dt[,col_name] <- tolower(dt[,col_name])

  **** something something ***

  return(dt)
}

How I would like to use it ...

dt <- replace_by_vocabulary(dt,vocabulary1,"sex")
dt <- replace_by_vocabulary(dt,vocabulary2,"date")
dt <- replace_by_vocabulary(dt,vocabulary3,"mood")
Dieter
  • 2,499
  • 1
  • 23
  • 41
  • 3
    Possible duplicate of [VLookup type method in R](http://stackoverflow.com/questions/18645222/vlookup-type-method-in-r) – Richard Lusch Oct 04 '16 at 20:09

4 Answers4

2

An alternative to merge that is more in line with what you had:

replace_by_vocabulary <- function(dt,voc,col_name){
  col <- which(colnames(dt) == col_name)
  dt[,col] <- voc$new[match(tolower(dt[,col]), voc$old)]
  return(dt)
}

You want to locate the column in dt from the col_name string input first. Then, use match to find the row indices of voc$old that matches those of tolower(dt[,col]), and use these to retrieve the replacement values from voc$new. Here, we convert the dt[,col] column to all lower case, as you had in your sample code, dynamically in the function so as to match the lower case data in the vocabulary table. The advantage over merge is that we do not have to rename and remove columns afterwards to get the result you want.

Using your data:

replace_by_vocabulary(dt,vocabulary,"sex")
##   V1  V2 sex  V4  V5
##1 abc abc   f abc 123
##2 abc abc   m abc 123
##3 abc abc   m abc 123
aichao
  • 7,375
  • 3
  • 16
  • 18
1

Have you considered merging, then dropping the unwanted column? Like so.

dt<-merge(x=dt, y=vocabulary1, by.x="sex", by.y="old")
dt<-dt %>% 
   select(-sex) %>% 
   mutate(sex=old)
Cyrus Mohammadian
  • 4,982
  • 6
  • 33
  • 62
Joy
  • 769
  • 6
  • 24
1

This post seems to be a duplicate of the one listed below.

VLookup type method in R

You should be able to work out a function to do what you want to do using the merge function:

string = c("abc", "abc", "abc")
names = c("jeny", "eric", "bob")
sex = c("f", "m", "m")

data = data.frame(cbind(string, string, names, string, c(1, 2, 3)))
vocabulary1 = data.frame(cbind(names, sex))

dt = merge(data, vocabulary1, by.x = "names")
dt
Community
  • 1
  • 1
Richard Lusch
  • 1,050
  • 10
  • 19
1

If I understood you aim correctly you want to merge two data.frames together? You should look at ?merge

For instance:

merge(x = dt, y = vocabulary1, by.x = "sex", by.y = "old")

If you want a dynamic function you could do

replace_by_vocabulary <- function(dt,voc,col_name){
    merged_df <- merge(x = dt, y = voc, by.x = "sex", by.y = col_name)
    return(merged_df)
}
tobiasegli_te
  • 1,413
  • 1
  • 12
  • 18
  • no, the aim is to replace values from one column if they match the "old" column of the vocabulary table. but you cant do something like dt$["sex"] == voc$old <-- voc$new. or something – Dieter Oct 04 '16 at 20:50
  • Then I think @aichao provided you a nice solution that you could accept. – tobiasegli_te Oct 04 '16 at 20:59