I have the following data frame (dat), in which each row is uniquely identified by a person's name.
structure(list(Name = c("John Smith", "Michael Jones", "Eric Stevens",
"Brian McGee", "Dave Baker"), State = c("NJ", "MA", "LA", "WY",
"AZ"), City = c("Trenton", "Springfield", "New Orleans", "Cheyenne",
"Yuma"), DistanceDriven = c("123 km", "15 km", "777 miles", "1029 km",
"8 miles"), DistanceFromHome = c("115 km", "8 km", "725 miles",
"1029 km", "8 miles")), class = "data.frame", row.names = c(NA,
-5L))
Which looks more or less like this:
Name State City Distance Driven Distance From Home
John Smith NJ Trenton 123 km 115 km
Michael Jones MA Springfield 15 km 8 km
Eric Stevens LA New Orleans 777 miles 725 miles
Brian McGee WY Cheyenne 1029 km 1029 km
Dave Baker AZ Yuma 8 miles 8 miles
...
I have a second data frame (dat2), also uniquely identified by Name, that includes only some of the names in the first data frame, plus some new names. Its DistanceDriven and DistanceFromHome columns exist, but every value in them is NA.
structure(list(Name = c("John Smith", "Derek Thompson", "Eric Stevens",
"Dave Baker"), State = c("NJ", "CA", "LA", "AZ"), City = c("Trenton",
"Los Angeles", "New Orleans", "Yuma"), DistanceDriven = c(NA,
NA, NA, NA), DistanceFromHome = c(NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-4L))
Which looks something like this:
Name State City Distance Driven Distance From Home
John Smith NJ Trenton
Derek Thompson CA Los Angeles
Eric Stevens LA New Orleans
Dave Baker AZ Yuma
I'm looking to create a new data frame which includes:
- observations that are in both the first data frame (dat) and the second data frame (dat2)
- observations that are only in the second data frame (dat2)
- all the data in the rows kept from the first data frame, as well as all the data in the rows of the second data frame
In other words, I just want to eliminate the rows that are present only in the first data frame and not in the second. I would thus like the two data frames above to produce:
structure(list(Name = c("John Smith", "Derek Thompson", "Eric Stevens",
"Dave Baker"), State = c("NJ", "CA", "LA", "AZ"), City = c("Trenton",
"Los Angeles", "New Orleans", "Yuma"), DistanceDriven = c("123 km",
"", "777 miles", "8 miles"), DistanceFromHome = c("115 km", "",
"725 miles", "8 miles")), class = "data.frame", row.names = c(NA,
-4L))
Name State City Distance Driven Distance From Home
John Smith NJ Trenton 123 km 115 km
Derek Thompson CA Los Angeles
Eric Stevens LA New Orleans 777 miles 725 miles
Dave Baker AZ Yuma 8 miles 8 miles
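To make the rule concrete, here is a rough base R sketch of what I am after. It rebuilds the two data frames from the dput output above, then looks up each Name from dat2 in dat with match(). The name `result` and the final step that turns NA into "" are my own assumptions about how the output should look. I am not sure this is the best way to do it.

```r
# Same data as the dput output above
dat <- data.frame(
  Name = c("John Smith", "Michael Jones", "Eric Stevens", "Brian McGee", "Dave Baker"),
  State = c("NJ", "MA", "LA", "WY", "AZ"),
  City = c("Trenton", "Springfield", "New Orleans", "Cheyenne", "Yuma"),
  DistanceDriven = c("123 km", "15 km", "777 miles", "1029 km", "8 miles"),
  DistanceFromHome = c("115 km", "8 km", "725 miles", "1029 km", "8 miles"),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  Name = c("John Smith", "Derek Thompson", "Eric Stevens", "Dave Baker"),
  State = c("NJ", "CA", "LA", "AZ"),
  City = c("Trenton", "Los Angeles", "New Orleans", "Yuma"),
  DistanceDriven = NA,
  DistanceFromHome = NA,
  stringsAsFactors = FALSE
)

# Position of each dat2 name in dat; NA for names that exist only in dat2
idx <- match(dat2$Name, dat$Name)

# Keep every row of dat2 and copy the distance columns across by position
result <- dat2
result$DistanceDriven <- dat$DistanceDriven[idx]
result$DistanceFromHome <- dat$DistanceFromHome[idx]

# Assumption: show missing distances as "" rather than NA, as in the desired output
result[is.na(result)] <- ""
```

This keeps the row order of dat2, which merge() would not do by default since it sorts on the key.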
I hope that makes sense. Thanks in advance.