The data frame df.freq below contains words and their properties (e.g. frequency and length).

 df.freq
 'data.frame':  221324 obs. of  7 variables:
 $ Word         : Factor w/ 221324 levels "a","aa-class",..: 195399 6167 198867 90289 1 131901 91600 95885 195346 95685 ...
 $ BlogFreqPm   : num  48737 28649 27965 23737 23630 ...
 $ TwitterFreqPm: num  30241 14145 25420 29598 19788 ...
 $ NewsFreqPm   : num  56009 25139 25590 5516 25291 ...
 $ CumFreqPm    : num  134987 67932 78975 58851 68709 ...
 $ LogCumFreq   : num  11.8 11.1 11.3 11 11.1 ...
 $ Length       : int  3 3 2 1 1 2 2 2 4 2 ...

I need to merge the columns LogCumFreq and Length in the dataframe above with the dataframe df.words below.

 df.words
 Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    
 $ target                : chr  "HAT" "DEPART" "MUD" "LUST" ...
 $ prime                 : chr  "hat" "department" "muddy" "luster" ...
 ...

What I need is a merge that adds LogCumFreq and Length from df.freq to each row twice, in separate columns: one pair of columns holding the values for the prime and another pair holding the values for the target.

I've tried to use merge for prime first and then target, but since the two values are always on the same row, they overwrite each other. Does anybody know how to do this?

EDIT: dput examples of the data frames are below.

df.words <-

structure(list(prime = structure(c(2L, 1L, 5L, 4L, 3L), .Label = c("department", 
"hat", "hunter", "luster", "muddy"), class = "factor"), target = structure(c(2L, 
1L, 4L, 3L, 5L), .Label = c("DEPART", "HAT", "LUST", "MUD", 
"SPY"), class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

df.freq <- 
structure(list(word = structure(c(3L, 2L, 8L, 6L, 4L, 1L, 7L, 
5L, 9L), .Label = c("depart", "department", "hat", "hunter", 
"lust", "luster", "mud", "muddy", "spy"), class = "factor"), 
    freq = c(4.3, 5.323, 9.9, 2, 0.56, 4.5, 6.99, 10.88, 7), 
    length = c(3L, 10L, 5L, 6L, 6L, 6L, 3L, 4L, 3L)), row.names = c(NA, 
-9L), class = "data.frame")

The following is an example of the desired output:

df.words.freq <- 

structure(list(prime = structure(c(2L, 1L, 5L, 4L, 3L), .Label = c("department", 
"hat", "hunter", "luster", "muddy"), class = "factor"), target = structure(c(2L, 
1L, 4L, 3L, 5L), .Label = c("DEPART", "HAT", "LUST", "MUD", 
"SPY"), class = "factor"), freq.prime = c(4.3, 5.323, 9.9, 2, 
0.56), freq.target = c(4.3, 4.5, 6.99, 10.88, 7), length.prime = c(3, 
10, 5, 6, 6), length.target = c(3, 6, 3, 4, 3)), row.names = c(NA, 
-5L), class = "data.frame")
Frank
RobertP.

2 Answers

This is just two merges. Most of the work here is getting the column names you want:

result = merge(df.words, setNames(df.freq, nm = paste(names(df.freq), "prime", sep = ".")),
      by.x = "prime", by.y = "word.prime")
result$target = tolower(result$target)
result = merge(result, setNames(df.freq, nm = paste(names(df.freq), "target", sep = ".")),
      by.x = "target", by.y = "word.target")
#   target      prime freq.prime length.prime freq.target length.target
# 1 depart department      5.323           10        4.50             6
# 2    hat        hat      4.300            3        4.30             3
# 3   lust     luster      2.000            6       10.88             4
# 4    mud      muddy      9.900            5        6.99             3
# 5    spy     hunter      0.560            6        7.00             3

You can use toupper to re-convert target to upper case post-hoc, if you want.
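If keeping the original row order and the upper-case target matters, a lookup with match() is one possible alternative to the two merges. The following is only a sketch on the question's example data, rebuilt here with character columns rather than factors (an assumption made for brevity); i.prime and i.target are illustrative names.

```r
# Example data from the question, as character columns (assumption: factors not needed).
df.words <- data.frame(
  prime  = c("hat", "department", "muddy", "luster", "hunter"),
  target = c("HAT", "DEPART", "MUD", "LUST", "SPY"),
  stringsAsFactors = FALSE
)
df.freq <- data.frame(
  word   = c("hat", "department", "muddy", "luster", "hunter",
             "depart", "mud", "lust", "spy"),
  freq   = c(4.3, 5.323, 9.9, 2, 0.56, 4.5, 6.99, 10.88, 7),
  length = c(3L, 10L, 5L, 6L, 6L, 6L, 3L, 4L, 3L),
  stringsAsFactors = FALSE
)

# Row positions in df.freq for each prime and for each lower-cased target.
i.prime  <- match(df.words$prime, df.freq$word)
i.target <- match(tolower(df.words$target), df.freq$word)

# Index df.freq with those positions; words missing from df.freq would give NA.
df.words.freq <- data.frame(
  df.words,
  freq.prime    = df.freq$freq[i.prime],
  freq.target   = df.freq$freq[i.target],
  length.prime  = df.freq$length[i.prime],
  length.target = df.freq$length[i.target],
  stringsAsFactors = FALSE
)
```

Because nothing is sorted or re-keyed, the rows stay in the order of df.words and target is left untouched.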

Gregor Thomas

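You will have to merge in two steps and then rename the columns as required, using names() or colnames(). Because both merges bring in freq and length, the second merge disambiguates them with the default suffixes: .x for the prime values and .y for the target values.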

df1 <- merge(df.words, df.freq, by.x = "prime", by.y = "word", all.x = TRUE)
df1$targetword <- tolower(df1$target)   #to match the keywords

df2 <- merge(df1, df.freq, by.x = "targetword", by.y = "word", all.x = TRUE)
df2$targetword <- NULL
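The renaming step can also be folded into the second merge through its suffixes argument. This is a sketch on the question's example data, rebuilt with character columns (an assumption); target.lower, step1 and step2 are illustrative names.

```r
# Example data from the question, as character columns (assumption: factors not needed).
df.words <- data.frame(
  prime  = c("hat", "department", "muddy", "luster", "hunter"),
  target = c("HAT", "DEPART", "MUD", "LUST", "SPY"),
  stringsAsFactors = FALSE
)
df.freq <- data.frame(
  word   = c("hat", "department", "muddy", "luster", "hunter",
             "depart", "mud", "lust", "spy"),
  freq   = c(4.3, 5.323, 9.9, 2, 0.56, 4.5, 6.99, 10.88, 7),
  length = c(3L, 10L, 5L, 6L, 6L, 6L, 3L, 4L, 3L),
  stringsAsFactors = FALSE
)

# Temporary lower-case key so the targets match df.freq$word.
df.words$target.lower <- tolower(df.words$target)

# First merge: freq and length for the prime (no name clash yet).
step1 <- merge(df.words, df.freq, by.x = "prime", by.y = "word", all.x = TRUE)

# Second merge: freq and length now clash, so suffixes names both sets.
step2 <- merge(step1, df.freq, by.x = "target.lower", by.y = "word",
               all.x = TRUE, suffixes = c(".prime", ".target"))

# Drop the temporary key.
step2$target.lower <- NULL
```

Note that merge() sorts the result by the key column by default, so the rows may come back in a different order than in df.words.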
SmitM