
I'm looking for a way to iterate over a dataframe in R and add to each observation a few variables taken from another dataframe. My data is as follows:

I have a dataframe of reviews that users have posted on different products. Each observation in this dataframe includes the following fields: user_id, product_id, time_of_review, length_of_review, and other attributes of the reviewing action. Note that a user can post at most one review per product, so the combination of user_id and product_id is unique. I want to go over this dataframe and fill in some information from another dataframe. This other dataframe contains observations of users' attributes. Each observation contains user_id and product_id, as well as other fields regarding the review that this user posted on this product. So I need to iterate over the original dataframe and, for each user_id-product_id combination, look up the matching row in the other dataframe, extract the additional fields, and add them to the original. What is the proper way to do this?

user3017075
  • A proper way to ask a question is to include sample data (input and desired output) that explains your problem and lets others reproduce it. – HubertL May 27 '16 at 21:28

1 Answer


Base R contains an awesome function called merge() that can be used for exactly this purpose. Use:

merge(df1, df2, by = c("user_id", "product_id"))

This is the simplest and most idiomatic way to do it.
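As a minimal sketch of how that call behaves, here is a small made-up example. The data frames `reviews` and `extra`, and the `rating` column, are hypothetical stand-ins for df1 and df2; only user_id, product_id, and length_of_review come from the question.

```r
# Hypothetical sample data; user_id/product_id/length_of_review follow the
# question, everything else is invented for illustration.
reviews <- data.frame(
  user_id          = c(1, 1, 2),
  product_id       = c(10, 11, 10),
  length_of_review = c(120, 45, 300)
)
extra <- data.frame(
  user_id    = c(1, 2, 3),
  product_id = c(10, 10, 12),
  rating     = c(5, 3, 4)   # assumed extra field to bring over
)

# Match rows on the (user_id, product_id) pair; no explicit loop is needed.
# By default only pairs present in BOTH data frames are kept.
merged <- merge(reviews, extra, by = c("user_id", "product_id"))
print(merged)
```

With this data, the pair (1, 11) has no match in `extra` and the pair (3, 12) has no match in `reviews`, so neither appears in `merged`.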

Hope this helps!!

Toby Penk
  • Thanks! It does help! Is there any limitation on how the sizes of the two dataframes must match? In df2 I might have more users than in df1... – user3017075 May 27 '16 at 21:33
  • No, the sizes don't have to match. By default `merge()` keeps only the rows whose keys appear in both dataframes (an inner join), so if you'd like to retain all the records from both, you can add the parameter `all = TRUE` to `merge()`. If you want to keep all the records from one but not the other (for example, users with no review, but not reviews with no user), you can use `all.x = TRUE` or `all.y = TRUE`, where x and y are the first and second parameters in `merge()`, respectively. – Toby Penk May 27 '16 at 21:36
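
To illustrate the `all`, `all.x`, and `all.y` options mentioned in the comment above, here is a rough sketch on made-up data. The data frames `reviews` and `extra` and the `rating` column are hypothetical; they are not from the original post.

```r
# Hypothetical sample data, invented for illustration only.
reviews <- data.frame(
  user_id          = c(1, 1, 2),
  product_id       = c(10, 11, 10),
  length_of_review = c(120, 45, 300)
)
extra <- data.frame(
  user_id    = c(1, 2, 3),
  product_id = c(10, 10, 12),
  rating     = c(5, 3, 4)
)
keys <- c("user_id", "product_id")

# Keep every row of the first data frame (x); unmatched rows get NA
# in the columns that come from the second data frame.
left <- merge(reviews, extra, by = keys, all.x = TRUE)

# Keep every row from both data frames; gaps on either side become NA.
full <- merge(reviews, extra, by = keys, all = TRUE)

print(left)
print(full)
```

Here `left` keeps all three reviews, with `rating` set to NA for the review that has no match, which is probably the variant the question needs if every original observation must survive.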