0

I have two .csv files, A and B. A has 1,000,000 rows and 2 columns; B has 20 rows and 2 columns. They look like this:

A:

Var1   S   
x      2   
y      1   
z      7 
x      2
x      3
z      4

B:

Var1   Var2 
x      a
y      b
z      c

A and B have a column Var1 with 3 variables (x, y, z), but in A they are not unique, whereas in B they are. B has an additional column Var2 with 3 variables (a, b, c). Using Python or R, I would like to create a new file C, by appending a new column Var2 to A with the matches of Var1 between A and B, so I can get something like this:

C:

Var1   S    Var 2
x      2    a
y      1    b
z      7    c
x      2    a
x      3    a
z      4    c

I have been using dictionaries and for loops in Python, but without success. My problem is that if I use "for A:" I only get output info from A; whereas if I use "for B:" I only get output info from B. I know it does not make logical sense, but I would like to do something like: "for A, get matched output info from B".

IMPORTANT: I just realised, through the feedback of users, that the "merge" operation does the job very simply, either with "pandas"(python) or R (dplyr, merge). It was simpler than I thought.

Lucas
  • 1,139
  • 3
  • 11
  • 23

2 Answers2

0

Use following code

import pandas as pd
# create your two dataset here and perform below operation
pd.merge(FIRST_TABLE, SECOND_TABLE, how='left', on=['Var1'])

Hope this helps !

Jay Parikh
  • 2,419
  • 17
  • 13
0

In python's pandas (http://pandas.pydata.org/), you can easily do a merge.

C = A.merge(B, on = "Var1")
Mikk
  • 804
  • 8
  • 23