I have three datasets that I want to join to create a test set for a supervised machine learning algorithm. Although they share some variables, they differ in their number of rows and in their contents. I have tried the merge() function, but each time I apply it I get fewer rows, and I end up with a dataset that has a ridiculously small number of rows.

I have these three datasets:

test_review   nrows 22956
test_business nrows  1205
test_user     nrows  5105

I want the final test_set to keep the original number of reviews from test_review (22956). The idea is that when a review has no matching business or user at merge() time, the new columns coming from that dataset should contain NA for that row. Is there any way to do this?
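What is described here is usually called a left join. A minimal sketch in base R, assuming the tables are linked by key columns named business_id and user_id (these names, and the toy data below, are assumptions for illustration and do not come from the real datasets):

```r
# Toy stand-ins for the real data; column names and values are hypothetical.
test_review <- data.frame(
  review_id   = 1:5,
  business_id = c("b1", "b2", "b9", "b1", "b3"),
  user_id     = c("u1", "u7", "u2", "u2", "u8"),
  stars       = c(5, 3, 4, 1, 2),
  stringsAsFactors = FALSE
)
test_business <- data.frame(
  business_id = c("b1", "b2"),
  city        = c("Phoenix", "Las Vegas"),
  stringsAsFactors = FALSE
)
test_user <- data.frame(
  user_id = c("u1", "u2"),
  fans    = c(10, 0),
  stringsAsFactors = FALSE
)

# By default merge() keeps only rows whose key appears in both tables
# (an inner join), which is why the row count shrinks with every call.
# all.x = TRUE keeps every row of the left table and fills the columns
# of unmatched rows with NA.
test_set <- merge(test_review, test_business, by = "business_id", all.x = TRUE)
test_set <- merge(test_set, test_user, by = "user_id", all.x = TRUE)

# Check that no review was lost
nrow(test_set) == nrow(test_review)
```

The row count is only preserved if each business_id and user_id appears at most once in test_business and test_user; duplicated keys there would add rows. Note also that merge() sorts the result by the key column, so the rows may come back in a different order than in test_review.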

Frank
Roy

1 Answer

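You can try rbind.fill() from plyr:

library(plyr)
# Stacks the data frames by row; columns missing from one of them are filled with NA
rbind.fill(test_review, test_business, test_user)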
moodymudskipper