Joining one variable from another dataset

Question

I have two datasets. I want to add the wealth variable from dataset-2 to dataset-1, next to the Occupation column. dataset-2 was collected from household heads, with one response per household, whereas dataset-1 was collected from all women in each household. For this reason, dataset-1 has more rows.

dataset-1: women dataset (total 8678 responses/rows)

 Women_id household_id BMI   Education Occupation
101 1 1   101 1        22.5  1         3
101 1 1   101 2        28.6  3         5
101 1 1   101 3        19.6  2         3
101 1 1   101 3        20.1  2         2
101 1 1   101 4        26.8  3         3

dataset-2: households dataset (total 6784 responses/rows)

household_id    wealth
101 1           2                         
101 2           1
101 3           2
101 4           4

I want to add the wealth variable to dataset-1 based on household_id. I have tried the merge() function:

joined_df <- merge(dataset_1, dataset_2, by.x = "household_id", all.x = TRUE, all.y = FALSE )

The two datasets were merged, but every value of wealth is NA.

Women_id household_id BMI   Education Occupation  Wealth
    101 1 1   101 1        22.5  1         3      NA
    101 1 1   101 2        28.6  3         5      NA
    101 1 1   101 3        19.6  2         3      NA
    101 1 1   101 3        20.1  2         2      NA
    101 1 1   101 4        26.8  3         3      NA

I want the result to look like this:

 Women_id household_id BMI   Education Occupation  Wealth
    101 1 1   101 1        22.5  1         3       2
    101 1 1   101 2        28.6  3         5       1
    101 1 1   101 3        19.6  2         3       2
    101 1 1   101 3        20.1  2         2       2
    101 1 1   101 4        26.8  3         3       4

Ben Toh · Answer 1 · 2020-08-08T05:45:36.940

Since both datasets have the same household_id column, you can join on it:

joined_df <- dplyr::left_join(dataset_1, dataset_2, by = "household_id")

The equivalent with merge() would be:

joined_df <- merge(dataset_1, dataset_2, by = "household_id", all.x = TRUE)
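To check the join on something small, here is a minimal sketch in base R. The toy data frames are modelled on the tables in the question, with the household IDs assumed to be stored as character strings and the Women_id column left out for brevity; the real data may be laid out differently.

```r
# Toy data modelled on the question (assumption: household_id is a character column)
dataset_1 <- data.frame(
  household_id = c("101 1", "101 2", "101 3", "101 3", "101 4"),
  BMI          = c(22.5, 28.6, 19.6, 20.1, 26.8),
  Education    = c(1, 3, 2, 2, 3),
  Occupation   = c(3, 5, 3, 2, 3),
  stringsAsFactors = FALSE
)
dataset_2 <- data.frame(
  household_id = c("101 1", "101 2", "101 3", "101 4"),
  wealth       = c(2, 1, 2, 4),
  stringsAsFactors = FALSE
)

# Left join: keep every row of dataset_1 and attach wealth where the key matches
joined_df <- merge(dataset_1, dataset_2, by = "household_id", all.x = TRUE)

print(joined_df)
```

With matching keys, both women in household "101 3" get the same wealth value and no row ends up with NA.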

edited Aug 08 '20 at 05:45

answered Aug 08 '20 at 05:40

The second one can be simplified as `merge(dataset_1, dataset_2, by = "household_id", all.x = TRUE)` – Darren Tsai Aug 08 '20 at 05:44
Yes agreed and updated – Ben Toh Aug 08 '20 at 05:46
It still shows NA value. – Md Shariful Islam Aug 08 '20 at 11:20
Can you provide a more concrete example? You can use `dput()` to generate a script that we can paste into our own R sessions – Ben Toh Aug 08 '20 at 19:27
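
When a left join returns only NA, the keys in the two tables usually do not match exactly. The sketch below shows one way to check this, using a hypothetical cause (trailing whitespace in one table's household_id) that is not confirmed by the asker; `d1` and `d2` are made-up stand-ins for the real datasets.

```r
# Hypothetical stand-ins for the real data; d2 has a trailing space in each key
d1 <- data.frame(household_id = c("101 1", "101 2"),
                 BMI = c(22.5, 28.6), stringsAsFactors = FALSE)
d2 <- data.frame(household_id = c("101 1 ", "101 2 "),
                 wealth = c(2, 1), stringsAsFactors = FALSE)

# Keys present in d1 but absent from d2; a non-empty result explains the NAs
unmatched <- setdiff(d1$household_id, d2$household_id)
print(unmatched)

# dput() prints the exact stored values, including hidden whitespace,
# and its output can be pasted into a question as reproducible data
dput(head(d2$household_id))

# Under the whitespace assumption, trimming the key restores the match
d2$household_id <- trimws(d2$household_id)
fixed <- merge(d1, d2, by = "household_id", all.x = TRUE)
print(fixed)
```

If `unmatched` is empty on the real data, the cause lies elsewhere, and posting the `dput()` output of a few rows from each dataset would help narrow it down.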
