Python. Using for loops and dicts: for rows in file A: get matched output from file B

Question

I have two .csv files, A and B. A has 1,000,000 rows and 2 columns; B has 20 rows and 2 columns. They look like this:

A:

B:

Var1   Var2 
x      a
y      b
z      c

A and B have a column Var1 with 3 variables (x, y, z), but in A they are not unique, whereas in B they are. B has an additional column Var2 with 3 variables (a, b, c). Using Python or R, I would like to create a new file C, by appending a new column Var2 to A with the matches of Var1 between A and B, so I can get something like this:

C:

Var1   S    Var 2
x      2    a
y      1    b
z      7    c
x      2    a
x      3    a
z      4    c

I have been using dictionaries and for loops in Python, but without success. My problem is that if I use "for A:" I only get output info from A; whereas if I use "for B:" I only get output info from B. I know it does not make logical sense, but I would like to do something like: "for A, get matched output info from B".

IMPORTANT: I just realised, through the feedback of users, that the "merge" operation does the job very simply, either with "pandas"(python) or R (dplyr, merge). It was simpler than I thought.

Just do a `merge(A, B)` or `library(dplyr); left_join(A,B)` in `R` — akrun, Jan 26 '17 at 15:04

score 0 · Answer 1 · answered Jan 26 '17 at 15:14

0

Use following code

import pandas as pd
# create your two dataset here and perform below operation
pd.merge(FIRST_TABLE, SECOND_TABLE, how='left', on=['Var1'])

Hope this helps !

answered Jan 26 '17 at 15:14

Jay Parikh

2,419
17
13

score 0 · Accepted Answer · answered Jan 26 '17 at 15:14

0

In python's pandas (http://pandas.pydata.org/), you can easily do a merge.

C = A.merge(B, on = "Var1")

answered Jan 26 '17 at 15:14

Mikk

804
8
23

Python. Using for loops and dicts: for rows in file A: get matched output from file B

2 Answers2