Finding common values between two csv file

Question

I have two csv files (say, a and b) and both contain different datasets. The only common between those two CSV files is id_no. I would like to create a final csv file that contains all the datasets of both CSV files whose id_no are matching.

a looks like

id_no   a1    a2     a3    a4
1       0.5  0.2    0.1    10.20
2       1.5  0.1    0.2    11.25
3       2.5  0.7    0.3    12.90
4       3.5  0.8    0.4    13.19
5       7.5  0.6    0.3    14.21

b looks like

id_no   A1         
6       10.1  
8       2.5 
4       12.5  
2       20.5  
1       2.51

I am looking for a final csv file, say c that shows the following output

id_no   a1    a2     a3    a4       A1
1       0.5  0.2    0.1    10.20    2.51
2       1.5  0.1    0.2    11.25    20.5
3       2.5  0.7    0.3    12.90     0
4       3.5  0.8    0.4    13.19    12.5
5       7.5  0.6    0.3    14.21     0

Related: https://stackoverflow.com/questions/45068584/merge-two-csv-files-into-one-with-pandas-by-id — Nei Neto, Oct 11 '21 at 14:55
you need `inner join` on id column. check https://pandas.pydata.org/docs/reference/api/pandas.merge.html — Epsi95, Oct 11 '21 at 14:55
Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — Chris, Oct 11 '21 at 15:00

not_speshal · Accepted Answer · 2021-10-11T15:12:29.243

1

Use pandas.merge:

import pandas as pd

a = pd.read_csv("data1.csv")
b = pd.read_csv("data2.csv")

output = a.merge(b, on="id_no", how="left").fillna(0).set_index("id_no")
output.to_csv("output.csv")

>>> output
        a1   a2   a3     a4     A1
id_no                             
1      0.5  0.2  0.1  10.20   2.51
2      1.5  0.1  0.2  11.25  20.50
3      2.5  0.7  0.3  12.90   0.00
4      3.5  0.8  0.4  13.19  12.50
5      7.5  0.6  0.3  14.21   0.00

edited Oct 11 '21 at 15:12

answered Oct 11 '21 at 15:11

not_speshal

22,093
2
15
30

I admit that for once, pandas is more readable than doing it by hand.... – 2e0byo Oct 11 '21 at 15:12
I wouldn't agree with the "for once" haha – not_speshal Oct 11 '21 at 15:12

2e0byo · Answer 2 · 2021-10-11T21:22:05.477

0

Using plain old python:

from csv import reader, writer
from pathlib import Path


with Path("file2.csv").open as f:
    r = reader(f)
    header = next(r)
    data = {k:v for row in r for k, v in [row]}

rows = []
with Path("file1.csv").open() as f:
    r = reader(f)
    header.append(next(r)[-1])
    for i, *row in r:
        if i in data:
            rows.append([i] + row + data[i])
        else:
            rows.append([i] + row + [0])

with Path("file1.csv").open("w") as f:
    w = writer(f)
    w.write_row(header)
    w.write_rows(rows)

edited Oct 11 '21 at 21:22

answered Oct 11 '21 at 15:07

2e0byo

5,305
1
6
26

This uses lots of lines of codes as compared to another ones. – Ankita Oct 11 '21 at 21:17
@AnkitaDebnath I gaurantee it uses few lines of code than pandas does to do the same thing ;) but yes, it's there for completeness: it's worth knowing how to do this kind of thing by hand. If you're using pandas for everything else, you might as well use pandas for this with the nice one-liner above. – 2e0byo Oct 11 '21 at 21:20

Finding common values between two csv file

2 Answers2