0

I need to merge two data frames of unequal length by comparing two columns and if the context match (or is quite similar) then merge that specific row.

For example this is what data frame 1 looks like

Car   horse.power   price
Car1  15            NA
Car2  25            NA
Car3  55            NA

and data frame 2:

Car             price
Car1 (2014)     1234
Car2 turbo      9876

I want my final data frame to look like this:

Car   horse.power   price
Car1  15            1234
Car2  25            9876
Car3  55            NA

How would i do that???

Thanks in advance!

EDIT: I do not understand how this is a duplicate to this one. In the linked question you have a fixed format for the customer ID. In my case you dont. If i apply the solution provided in the linked question I end up with this:

Car         horse.power   price
Car1        15            NA
Car1 (2014) NA            1234
Car2        25            NA
Car2 turbo  NA            9876
Car3        55            NA
Community
  • 1
  • 1
st0rmy
  • 11
  • 2
  • Welcome to StackOverflow! You'll find people will downvote your question if you do not share your attempts and specifically what is failing. See [this question](http://stackoverflow.com/help/how-to-ask) for best practices on how to ask. – TayTay Nov 17 '16 at 20:24

1 Answers1

0

One way to do it would be to use the levenshtein distance (using adist()):

# set up the two dataframes

df1 <- data.frame(Car = c('Car1', 'Car2', 'Car3'),
                  horse.power = c(15, 25, 55),
                  price = c(NA, NA, NA))
df2 <- data.frame(Car = c('Car1 (2014)', 'Car2 turbo', 'Car3 slow'),
                  price = c(1234, 9876, 21212))

# take the minimal distance and choose the price column
df1$price <- df2$price[apply(adist(df2$Car, df1$Car), 1, which.min)]
df1

This yields:

   Car horse.power price
1 Car1          15  1234
2 Car2          25  9876
3 Car3          55 21212
Jan
  • 42,290
  • 8
  • 54
  • 79