I have two dataframes: df_a with two columns, colA and colB, and df_b with one column, colA.
df_a <- data.frame(colA = sample(1:10, 10), colB = sample(LETTERS[1:20],10))
> df_a
   colA colB
1     2    F
2     8    J
3     5    G
4     9    A
5    10    R
6     4    N
7     7    D
8     1    B
9     3    Q
10    6    H
df_b <- data.frame(colA = sample(1:10, 10))
> df_b
   colA
1     9
2     5
3     3
4     7
5     1
6     8
7     2
8     4
9    10
10    6
I have to create a new column colB in df_b by matching colA of df_b against colA of df_a, so that each row of df_b gets the colB value that df_a holds for the same colA value.
> df_b$colB <- df_a[df_a$colA %in% df_b$colA,'colB']
> df_b
   colA colB
1     9    F
2     5    J
3     3    G
4     7    A
5     1    R
6     8    N
7     2    D
8     4    B
9    10    Q
10    6    H
The corresponding values in the two dataframes are not the same. For example, in df_a, colA value 9 has A in colB, whereas in df_b, colA value 9 has F in colB. Is this issue due to the dataframes being in different row orders?
Note: I couldn't find a similar question, though this may still be a duplicate. I would like to understand the root cause of the error.
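To make the mismatch concrete, here is a minimal sketch that rebuilds the two dataframes from the printed output above (fixed values instead of sample(), so it is reproducible). It suggests that %in% only tests membership and returns one TRUE/FALSE per row of df_a, so the subset comes back in df_a's row order, while match() returns the row position in df_a for each value of df_b. The names keep and idx are only illustrative.

```r
# Fixed data copied from the printed df_a and df_b above.
df_a <- data.frame(colA = c(2, 8, 5, 9, 10, 4, 7, 1, 3, 6),
                   colB = c("F", "J", "G", "A", "R", "N", "D", "B", "Q", "H"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(colA = c(9, 5, 3, 7, 1, 8, 2, 4, 10, 6))

# %in% answers "is this df_a value present anywhere in df_b?" for each df_a row.
keep <- df_a$colA %in% df_b$colA
# Every value is present, so keep is all TRUE and df_a[keep, "colB"]
# is simply df_a$colB in df_a's own row order, not df_b's.
wrong <- df_a[keep, "colB"]

# match() gives, for each df_b value, the row position of that value in df_a.
idx <- match(df_b$colA, df_a$colA)
df_b$colB <- df_a$colB[idx]
df_b
```

With this lookup, the row of df_b where colA is 9 receives A, the same value df_a holds for 9. merge(df_b, df_a, by = "colA") would be another option, although it does not promise to keep df_b's original row order.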
The original task was to fill in the NA values of colB in df_b using the values from df_a.
df_a <- data.frame(colA = sample(1:10, 10), colB = sample(LETTERS[1:10],10))
df_b <- data.frame(colA = sample(1:10, 10), colB = sample(c(LETTERS[1:10], NA), 10))  # NA as a real missing value, not the string 'NA'
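For the NA-filling task, one possible approach is sketched below. It uses a small hand-written dataset rather than sample(), and it assumes the gaps are real NA values (is.na() does not detect the string 'NA'). The variable name missing is only illustrative.

```r
# Small illustrative data: df_a is the lookup table, df_b has gaps in colB.
df_a <- data.frame(colA = c(3, 1, 2, 4),
                   colB = c("C", "A", "B", "D"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(colA = c(4, 2, 1, 3),
                   colB = c("D", NA, "A", NA),
                   stringsAsFactors = FALSE)

# Rows of df_b whose colB is missing.
missing <- is.na(df_b$colB)

# Look up only those rows in df_a by colA and copy the matching colB values.
df_b$colB[missing] <- df_a$colB[match(df_b$colA[missing], df_a$colA)]
df_b
```

Values that are already present in df_b are left untouched. A colA value with no counterpart in df_a would stay NA, because match() returns NA for it.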