
I'm using the R package fuzzyjoin to join two data sets. Currently I am joining on one column, and I would like to join on two.

  • The first dataset has the name of a location, a column called Config, and a column called TM
  • The second dataset has the name of a location and three attributes (Age, Rank and TM)
  • I would like to join on two columns: Name and TM

I've tried passing the column names I want to join on to the by argument as a vector, but this doesn't work. I get the following error:

  • Error: Each variable must be a 1d atomic vector or list. Problem variables: col.

    library(fuzzyjoin)

    # This works: join on one column
    stringdist_inner_join(Dataset1, Dataset2, by = "Name", distance_col = NULL)
    
    # Joining on two columns (this produces the error above)
    stringdist_inner_join(Dataset1, Dataset2, by = c("Name", "TM"), distance_col = NULL)
    

Dataset1:

 Name           Config     TM
 ALTO D         BB         T
 CONTRA         ST         D
 EIGHT A        DD         D
 OPALAS         BB         T
 SAUSALITO Y    AA         D
 SOLANO J       ST         D

Dataset2:

 Name       Age     Rank    TM
 ALTO D     50      2       T
 ALTO D     20      6       D
 CONTRA     10      10      D
 CONTRA     15      15      T
 EIGHTH     18      21      T
 OPAL       19      4       T
 SAUSALITO  2       12      D
 SOLANO     34      43      D

1 Answer

It took me a while to figure out, but I believe the correct syntax for joining on multiple columns is:

    library(fuzzyjoin)
    stringdist_inner_join(Dataset1, Dataset2,
                          by = list(x = c("Name", "TM"), y = c("Name", "TM")),
                          distance_col = NULL)
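One caveat, offered as a hedged note: stringdist_inner_join applies the same string-distance rule (max_dist = 2 by default) to every column in by, so a one-letter code such as TM will match any other one-letter code ("T" and "D" are only one edit apart). If TM should match exactly while Name is matched fuzzily, a minimal sketch is to call fuzzy_inner_join with one match function per column. This assumes a recent fuzzyjoin release (with its stringdist dependency) is installed; the helper name_close is hypothetical, and its threshold of 2 simply mirrors the stringdist_inner_join default.

```r
# Sketch: fuzzy match on Name, exact match on TM.
# Assumes the fuzzyjoin and stringdist packages are installed.
library(fuzzyjoin)

# Sample data copied from the question; stringsAsFactors = FALSE keeps
# the columns as character so `==` and stringdist behave predictably.
Dataset1 <- data.frame(
  Name   = c("ALTO D", "CONTRA", "EIGHT A", "OPALAS", "SAUSALITO Y", "SOLANO J"),
  Config = c("BB", "ST", "DD", "BB", "AA", "ST"),
  TM     = c("T", "D", "D", "T", "D", "D"),
  stringsAsFactors = FALSE
)

Dataset2 <- data.frame(
  Name = c("ALTO D", "ALTO D", "CONTRA", "CONTRA", "EIGHTH", "OPAL", "SAUSALITO", "SOLANO"),
  Age  = c(50, 20, 10, 15, 18, 19, 2, 34),
  Rank = c(2, 6, 10, 15, 21, 4, 12, 43),
  TM   = c("T", "D", "D", "T", "T", "T", "D", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: two names match if their OSA edit distance is at most 2.
name_close <- function(a, b) stringdist::stringdist(a, b, method = "osa") <= 2

# match_fun takes one function per column in `by`, in the same order:
# name_close for Name, exact equality for TM.
result <- fuzzy_inner_join(
  Dataset1, Dataset2,
  by = c("Name", "TM"),
  match_fun = list(name_close, `==`)
)

print(result)
```

With the sample data this should keep ALTO D, CONTRA, OPALAS, SAUSALITO Y and SOLANO J, each paired with the Dataset2 row that has the same TM. EIGHT A is dropped because its closest name, EIGHTH, has a different TM. Shared column names come back with .x and .y suffixes (Name.x, TM.y, and so on).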