I'm working with a dataframe possessing over 7000 observations where each respondent has been allocated a numeric value that identifies their geographical location.
#DF 1
USER_ID Col2 ... NumIdentifier
45 4 101
12 9 98
97 19 7
11 3 104
54 1 109
2 23 110
... ... ...
Now I have been provided with additional information (Var1, Var2) that needs to be allocated to only some of the respondents based on this numeric geographic identifier.
#DF 2
NumIdentifer Var1 Var2
101 13 20
104 16 87
109 34 21
... ... ...
'DF 2' contains one row per numeric geographic identifier and contains a smaller subset of geographic identifiers than are present in 'DF 1'. There are around 30 rows in 'DF 2'.
As a first step I have constructed a new dataframe from 'DF 1' that only includes the respondents with the numeric identifier present in 'DF 2'.
#DF 3
USER_ID Col2 ... NumIdentifier
45 4 101
11 3 104
54 1 109
... ... ...
The ideal output I am aiming for would look like this. If the numerical identifier in 'DF 2' equals the numerical identifier in 'DF 3' then Var1 and Var2 for each corresponding row would be inserted.
#DF 3
USER_ID Col2 ... NumIdentifier Var1 Var2
45 4 101 13 20
11 3 104 16 87
54 1 109 34 21
... ... ... ... ...
Having tried unsuccessfully to transfer some of the techniques from these two posts:
- "r - matching of two dataframes and insertion of rows from one in another on condition"
- "R - replace values in dataframe based on two matching conditions"
I was wondering if there were some other resources or original insights that might be of help. The methods in these two posts seem to only be effective when making matches and conditional replacements between dataframes with an equal number of rows.