I have two datasets. I want to add the wealth variable of dataset-2 to dataset-1 beside the occupation variable/column. dataset-2 was collected from households head, one response from one household. However, dataset-1 was collected from all women from the household. For this reason, dataset-1 has more responses.
dataset-1: women dataset (total 8678 responses/rows)
Women_id household_id BMI Education Occupation
101 1 1 101 1 22.5 1 3
101 1 1 101 2 28.6 3 5
101 1 1 101 3 19.6 2 3
101 1 1 101 3 20.1 2 2
101 1 1 101 4 26.8 3 3
dataset-2: households dataset (total 6784 responses/rows)
household_id wealth
101 1 2
101 2 1
101 3 2
101 4 4
I want to add wealth variable to dataset-1 based on household_id. I have tried the marge function of dplyr
joined_df <- merge(dataset_1, dataset_2, by.x = "household_id", all.x = TRUE, all.y = FALSE )
The two datasets added, however, the value of wealth shows NA.
Women_id household_id BMI Education Occupation Wealth
101 1 1 101 1 22.5 1 3 NA
101 1 1 101 2 28.6 3 5 NA
101 1 1 101 3 19.6 2 3 NA
101 1 1 101 3 20.1 2 2 NA
101 1 1 101 4 26.8 3 3 NA
I want like follows
Women_id household_id BMI Education Occupation Wealth
101 1 1 101 1 22.5 1 3 2
101 1 1 101 2 28.6 3 5 5
101 1 1 101 3 19.6 2 3 2
101 1 1 101 3 20.1 2 2 2
101 1 1 101 4 26.8 3 3 4