I have two .csv files, A and B. A has 1,000,000 rows and 2 columns; B has 20 rows and 2 columns. They look like this:
A:
Var1 S
x 2
y 1
z 7
x 2
x 3
z 4
B:
Var1 Var2
x a
y b
z c
A and B have a column Var1 with 3 variables (x, y, z), but in A they are not unique, whereas in B they are. B has an additional column Var2 with 3 variables (a, b, c). Using Python or R, I would like to create a new file C, by appending a new column Var2 to A with the matches of Var1 between A and B, so I can get something like this:
C:
Var1 S Var 2
x 2 a
y 1 b
z 7 c
x 2 a
x 3 a
z 4 c
I have been using dictionaries and for loops in Python, but without success. My problem is that if I use "for A:" I only get output info from A; whereas if I use "for B:" I only get output info from B. I know it does not make logical sense, but I would like to do something like: "for A, get matched output info from B".
IMPORTANT: I just realised, through the feedback of users, that the "merge" operation does the job very simply, either with "pandas"(python) or R (dplyr, merge). It was simpler than I thought.