I have found questions that are similar to what I want to do, but none that are exactly the same.
I am working in R. I have two dataframes I want to combine. The only issue is that there are more observations in one dataframe than the other. (The data I have is proprietary so I'll make up some data to show you.) Let's say dataframe A has 450 observations and dataframe B has 500 observations.
Both dataframes have a variable that uniquely identifies a person; let's say it's a social security number. Some people appear in both dataframe A and dataframe B, while others appear in only one of them. I want to keep the rows for people who are in both dataframes and drop everyone who is in only one. To illustrate with fake data on a smaller scale:
Dataframe A
     SSID   Age  Wage
[1]  12345  23   45645
[2]  15461  45   534688
[3]  12458  12   475412
[4]  68741  63   124
[5]  36987  91   458746
Dataframe B
     SSID   Education  Race
[1]  12345  2          8
[2]  15461  3          4
[3]  89512  1          3
[4]  68741  2          7
[5]  99423  0          8
[6]  79225  1          4
[7]  66598  3          2
Dataframe C (what I want)
     SSID   Age  Wage    Education  Race
[1]  12345  23   45645   2          8
[2]  15461  45   534688  3          4
[3]  68741  63   124     2          7
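In case it helps anyone reproduce this, the fake data could be built roughly as follows. This is only a sketch: the object names A and B mirror the names used in this question, the values are the made-up ones from the tables, and storing every column as a plain numeric is an assumption (the real data may use other types).

```r
# Fake stand-in for dataframe A (5 people)
A <- data.frame(
  SSID = c(12345, 15461, 12458, 68741, 36987),
  Age  = c(23, 45, 12, 63, 91),
  Wage = c(45645, 534688, 475412, 124, 458746)
)

# Fake stand-in for dataframe B (7 people, 3 of whom also appear in A)
B <- data.frame(
  SSID      = c(12345, 15461, 89512, 68741, 99423, 79225, 66598),
  Education = c(2, 3, 1, 2, 0, 1, 3),
  Race      = c(8, 4, 3, 7, 8, 4, 2)
)
```

The SSIDs shared by both are 12345, 15461 and 68741, which are the three rows shown in dataframe C.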
So only the rows whose SSID appears in both dataframes are kept, with the columns from A and B side by side, and everything else is dropped. How can I do this?
I tried something like C = which(B$SSID %in% A$SSID)
but that did not give me the combined dataframe.
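To show what that attempt produces on the fake data, here is a minimal sketch. Only the SSID columns are included, since they are all the expression uses; the values are the made-up ones from the tables above.

```r
# Only the SSID columns matter for this expression
A <- data.frame(SSID = c(12345, 15461, 12458, 68741, 36987))
B <- data.frame(SSID = c(12345, 15461, 89512, 68741, 99423, 79225, 66598))

# %in% gives one TRUE/FALSE per row of B; which() turns that into row positions
C <- which(B$SSID %in% A$SSID)
print(C)  # positions 1, 2 and 4 of B, not a dataframe
```

So C ends up as a vector of row numbers in B for the people who are also in A, rather than the merged table with Age, Wage, Education and Race.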