-1

I have two datasets, containing information on Airbnb listings, based on those listings' IDs. The first dataset, "calendar", includes for every ID and every date for 365 days ahead, the price and the availability of the listing. It has 4159641 rows and 4 columns. The second data set, "Listings", includes for those same IDs several characteristics, like longitude, latitude, capacity, etc. It has 8903 rows and 9 variables.

Based on those common IDs I would like to combine the two datasets, so that all the information of the second data set "Listings" will be included to the first on "calendar". More precisely for every row of x listing data and price I want to include the information about longitude, latitude, capacity, etc. The dataset would then have 4159641 rows and 12 columns.

I would be really grateful to anyone who helps me with that.

Thank you!

calendar datasetListing dataset

Zoi
  • 3
  • 2
  • Hello and welcome to StackOverflow! Please read about providing a minimally reproducible example, instead of images of your dataframe: https://stackoverflow.com/help/minimal-reproducible-example – Matt Apr 11 '20 at 21:55

3 Answers3

1

You could try the following:

library(dplyr)
calendar <- read.csv2(...)
listings <- read.csv2(...)

joined_data <- inner_join(calendar, listings, by="ID")

The gereral usage is, as follows:

join_type(first_data_set, second_data_set, by=column_to_join_on)

Be aware of the join_type:

  • inner_join, will combine first and second tables based on the join-predicate
  • left_join, will take all of the values from first_data_set and match them to records from second_data_set, in case of no matches NULL will appear
  • right_join, is the opposite of left_join
  • ...,

There are more, you can check them by yourself in the package. But, the right one for you might be either inner_join or left_join.

abe
  • 198
  • 2
  • 12
0

That's a left join since you want as many rows as there are observations in df1. Many ways to do that:

Base R

This also works with a data.table object (merge is extended for this class of objects)

merge(df1, df2, all.x = TRUE, by = 'ID)

dplyr

library(dplyr)
df1 %>% left_join(df2, by = 'ID')

I advice you to have a look at this post where you can find discussions on other types of join (inner, right...)

linog
  • 5,786
  • 3
  • 14
  • 28
0

Another option is data.table

library(data.table)
setDT(df1)[dfd2, on = .(ID)]
akrun
  • 874,273
  • 37
  • 540
  • 662