-2

I have two dataframes. 1 full of data about individuals, including their street name and house number but not their house size. And another with information about each house including street name and house number and house size but not data on the individuals living in that house. I'd like to add the size information to the first dataframe as a new column so I can see the house size for each individual.

I have over 200,000 individuals and around 100,000 houses and the methods I've tried so far (cutting down the second dataframe for each individual) are painfully slow. Is their an efficient way to do this? Thank you.

Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 1
    You just need to `merge` the two dataframes. More specific advice requires [a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Nov 15 '14 at 23:12

2 Answers2

2

Using @jazzurro's example another option for larger datasets would be to use data.table

library(data.table)
setkey(setDT(df1), street, num)
setkey(setDT(df2), street, num)
df2[df1]
#     size     street num person
#1:  large  liliha st   3    bob
#2:     NA  mahalo st  32    dan
#3:  small  makiki st  15    ana
#4:     NA   nehoa st  11  ellen
#5: medium nuuanu ave   8  cathy
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Here is my suggestion. Given what you described in your data, I created a sample data. However, please try to provide sample data from next time. When you provide sample data and your code, you are more likely to receive help and let people save more time. You have two key variables to merge two data frames, which are street name and house number. Here, I chose to keep all data points in df1.

df1 <- data.frame(person = c("ana", "bob", "cathy", "dan", "ellen"),
                  street = c("makiki st", "liliha st", "nuuanu ave", "mahalo st", "nehoa st"),
                  num = c(15, 3, 8, 32, 11),
                  stringsAsFactors = FALSE)

  #person     street num
#1    ana  makiki st  15
#2    bob  liliha st   3
#3  cathy nuuanu ave   8
#4    dan  mahalo st  32
#5  ellen   nehoa st  11

df2 <- data.frame(size = c("small", "large", "medium"),
                  street = c("makiki st", "liliha st", "nuuanu ave"),
                  num = c(15, 3, 8),
                  stringsAsFactors = FALSE)

#    size     street num
#1  small  makiki st  15
#2  large  liliha st   3
#3 medium nuuanu ave   8

library(dplyr)

left_join(df1, df2)

#      street num person   size
#1  makiki st  15    ana  small
#2  liliha st   3    bob  large
#3 nuuanu ave   8  cathy medium
#4  mahalo st  32    dan   <NA>
#5   nehoa st  11  ellen   <NA>
jazzurro
  • 23,179
  • 35
  • 66
  • 76