I have three datasets that I want to join to create a test set for a supervised machine learning algorithm. Although they share some variables, they differ in their number of rows and in which elements they contain. I have tried the merge() function, but every merge leaves me with fewer rows, and in the end I get a dataset with a ridiculously small number of rows.
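A minimal reproduction of the shrinking effect with toy data. The column names `review_id`, `business_id`, `user_id` and `city` are hypothetical stand-ins for the real ones. It relies on the fact that base R's merge() defaults to `all = FALSE`, i.e. an inner join, which keeps only keys present in both data frames:

```r
# Toy stand-ins for test_review and test_business (hypothetical columns)
reviews <- data.frame(
  review_id   = 1:4,
  business_id = c("b1", "b2", "b3", "b9"),
  user_id     = c("u1", "u2", "u7", "u1")
)
business <- data.frame(
  business_id = c("b1", "b2", "b3"),
  city        = c("Phoenix", "Toronto", "Las Vegas")
)

# Default merge() is an inner join: review 4 refers to business "b9",
# which is missing from `business`, so that review is silently dropped.
inner <- merge(reviews, business, by = "business_id")
nrow(inner)  # 3, not 4
```

Each further inner merge (for example with the user table) can only drop more rows, which matches the shrinking described above.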
I have these three datasets:
test_review: 22956 rows
test_business: 1205 rows
test_user: 5105 rows
I want the final test_set to keep all 22956 reviews from test_review. The idea is that when a review's business or user has no match while merging with test_review, the corresponding new columns should contain NA for that review instead of the whole row being dropped. Is there any way to do this?
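A sketch of the behaviour described above, assuming the tables are linked through hypothetical `business_id` and `user_id` columns. It uses merge()'s `all.x = TRUE` argument, which performs a left outer join: every row of the first (x) data frame is kept, and columns coming from y are filled with NA where no key matches:

```r
# Toy stand-ins for test_review, test_business and test_user
# (all column names are hypothetical)
reviews <- data.frame(
  review_id   = 1:4,
  business_id = c("b1", "b2", "b3", "b9"),
  user_id     = c("u1", "u2", "u7", "u1")
)
business <- data.frame(
  business_id = c("b1", "b2", "b3"),
  city        = c("Phoenix", "Toronto", "Las Vegas")
)
users <- data.frame(
  user_id      = c("u1", "u2"),
  review_count = c(10, 25)
)

# Left joins: keep every review, add NA where the business/user is unknown
test_set <- merge(reviews, business, by = "business_id", all.x = TRUE)
test_set <- merge(test_set, users, by = "user_id", all.x = TRUE)

# merge() sorts by the key by default; restore the original review order
test_set <- test_set[order(test_set$review_id), ]

nrow(test_set)                     # 4, same as nrow(reviews)
sum(is.na(test_set$city))          # 1 (business "b9" is unknown)
sum(is.na(test_set$review_count))  # 1 (user "u7" is unknown)
```

This preserves the row count only if the key is unique in each lookup table (one row per `business_id` in the business table, one per `user_id` in the user table); duplicated keys there would multiply the matching reviews.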