Filter rows from a CSV file if row values are found in another CSV file

Question

I found three related questions, but I couldn't turn them into working code for my case. I would appreciate some help, and I think the answer could be useful to other people in the future. These are the three questions I tried to work from:

Filter rows in csv file based on another csv file and save the filtered data in a new file
Python how to search value in one csv based on another csv - pandas?
Python: Filtering CSV with conditions from another CSV

Data explanation:

CSV file name "Main.csv":

label,value,market
,,
team1 v team2,match1,market1
team3 v team4,match2,market2
team5 v team6,match3,market3

CSV file name "parameter.csv": used as a parameter for the filter:

time,goals,label,value
,,,
15,4,team1 v team2,match1
10,3,team5 v team6,match3

RULE: If the label and value of a row in Main.csv are found together, exactly, in a row of parameter.csv, that row must be present in the final CSV!

CSV expected and created after filter:

label,value,market
,,
team1 v team2,match1,market1
team5 v team6,match3,market3

If the values are switched, is it still a match? Does `team1 v team2` match `team2 v team1`? – tdelaney, Aug 29 '21 at 15:38
Hi @tdelaney No, the values need to match exactly, totally equal. – Digital Farmer, Aug 29 '21 at 15:40
If the value is `team1 v team2`, then it should only match if the value in the other CSV is `team1 v team2`. – Digital Farmer, Aug 29 '21 at 15:42

IoaTzimas · Accepted Answer · 2021-08-29T15:59:05.110

score 3

Try the following code with Pandas (I have added some comments for explanation):

import pandas as pd

#load the files to dataframes
main = pd.read_csv("Main.csv")
par = pd.read_csv("parameter.csv")

#transform the dataframes to lists of dictionaries
main_dict=main.to_dict(orient='records')
par_dict=par.to_dict(orient='records')

#create a list of dictionaries that use only 'label' and 'value' as keys
par_dict = [{'label':i['label'], 'value':i['value']} for i in par_dict]

#search for records in main that the pair of label-value exists in the previous list
result = [i for i in main_dict if {'label':i['label'], 'value':i['value']} in par_dict]

#change back to dataframe and save to csv
result=pd.DataFrame(result)

result.to_csv('result.csv', index=False)
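
For larger files, the same pair-wise filter could also be expressed as an inner join instead of a scan over a list of dictionaries. The following is only a minimal sketch, not the method used above: it swaps in `DataFrame.merge`, reads the question's sample data from in-memory strings rather than from the two files, and the helper name `filter_by_pairs` is hypothetical.

```python
import io

import pandas as pd

# In-memory stand-ins for Main.csv and parameter.csv (sample data from the question)
MAIN_CSV = """label,value,market
,,
team1 v team2,match1,market1
team3 v team4,match2,market2
team5 v team6,match3,market3
"""
PARAM_CSV = """time,goals,label,value
,,,
15,4,team1 v team2,match1
10,3,team5 v team6,match3
"""


def filter_by_pairs(main, par, keys=("label", "value")):
    """Keep rows of `main` whose `keys` values appear together in a row of `par`.

    Hypothetical helper written for this illustration.
    """
    keys = list(keys)
    # Drop blank rows and duplicate pairs so the join cannot multiply rows
    wanted = par[keys].dropna().drop_duplicates()
    # The inner join keeps only rows of main with a matching (label, value) pair
    return main.merge(wanted, on=keys, how="inner")


main = pd.read_csv(io.StringIO(MAIN_CSV))
par = pd.read_csv(io.StringIO(PARAM_CSV))
result = filter_by_pairs(main, par)
print(result)
```

With the real files, `pd.read_csv("Main.csv")` and `pd.read_csv("parameter.csv")` would take the place of the `io.StringIO` inputs, and the result could be saved with `to_csv(..., index=False)`. Note that this sketch does not keep the blank `,,` row.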

edited Aug 29 '21 at 15:59

answered Aug 29 '21 at 15:55

I have to thank you enormously for all the work you were willing to do to help me! – Digital Farmer Aug 29 '21 at 15:57

Welcome, happy to help :) – IoaTzimas Aug 29 '21 at 16:02

tdelaney · Answer 2 · 2021-08-29T16:50:21.990

score 1

This can be done easily with pandas by setting the common columns as index to the dataframes. Since you don't care about the other columns in parameter.csv, they can be dropped.

import pandas as pd
common_index = ["label", "value"]
main = pd.read_csv("Main.csv").dropna().set_index(common_index)
param = (pd.read_csv("parameter.csv", usecols=common_index)
    .dropna().set_index(common_index))
result = main[main.index.isin(param.index)]
print(result)

Result

                      market
label         value          
team1 v team2 match1  market1
team5 v team6 match3  market3
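
The printed result keeps `label` and `value` in the index. If the goal is a CSV with the original column layout, one option is to reset the index before writing. The sketch below assumes the question's sample data, held in in-memory strings, and writes to an `io.StringIO` buffer instead of a file so that it runs on its own; the names `out` and `csv_text` are illustrative only.

```python
import io

import pandas as pd

common_index = ["label", "value"]

# In-memory stand-ins for Main.csv and parameter.csv (sample data from the question)
main_csv = io.StringIO(
    "label,value,market\n"
    ",,\n"
    "team1 v team2,match1,market1\n"
    "team3 v team4,match2,market2\n"
    "team5 v team6,match3,market3\n"
)
param_csv = io.StringIO(
    "time,goals,label,value\n"
    ",,,\n"
    "15,4,team1 v team2,match1\n"
    "10,3,team5 v team6,match3\n"
)

main = pd.read_csv(main_csv).dropna().set_index(common_index)
param = pd.read_csv(param_csv, usecols=common_index).dropna().set_index(common_index)

# Keep rows of main whose (label, value) index entry also occurs in param
result = main[main.index.isin(param.index)]

# reset_index() turns label and value back into ordinary columns before writing
out = io.StringIO()
result.reset_index().to_csv(out, index=False)
csv_text = out.getvalue()
print(csv_text)
```

Passing a file name such as `"result.csv"` to `to_csv` in place of the buffer would write the same content to disk.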

edited Aug 29 '21 at 16:50

answered Aug 29 '21 at 16:01

It must match value column too – IoaTzimas Aug 29 '21 at 16:02

@IoaTzimas - missed that, will fix. – tdelaney Aug 29 '21 at 16:11

@IoaTzimas - okay, using a MultiIndex instead. – tdelaney Aug 29 '21 at 16:50

score 1 · Answer 3 · answered Aug 29 '21 at 16:54

This can also be done in the standard library using the csv module. Create a set from the columns of interest in the parameters file and use that as a filter when reading the main file.

import csv
with open("parameter.csv", newline="") as p_file:
    reader = csv.reader(p_file)
    next(reader)
    param_set = {tuple(row[2:4]) for row in reader if row[2]}

with open("Main.csv", newline="") as m_file:
    reader = csv.reader(m_file)
    next(reader)
    result = [row for row in reader if tuple(row[0:2]) in param_set]

print(result)
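
The snippet above selects columns by position and only prints the matches. A variation that looks columns up by header name and writes the filtered rows out as CSV might look like the sketch below. It is an illustration under assumptions: the function name `filter_rows` is hypothetical, and in-memory `io.StringIO` objects with the question's sample data stand in for the real files.

```python
import csv
import io

KEYS = ("label", "value")


def filter_rows(main_file, param_file, out_file, keys=KEYS):
    """Copy rows of main_file to out_file when their key values occur in param_file.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    # Set of (label, value) tuples from the parameter file, skipping blank rows
    wanted = {
        tuple(row[k] for k in keys)
        for row in csv.DictReader(param_file)
        if all(row[k] for k in keys)
    }
    reader = csv.DictReader(main_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        # Set membership is an exact match on the pair, as the rule requires
        if tuple(row[k] for k in keys) in wanted:
            writer.writerow(row)
            kept += 1
    return kept


main_file = io.StringIO(
    "label,value,market\n,,\n"
    "team1 v team2,match1,market1\n"
    "team3 v team4,match2,market2\n"
    "team5 v team6,match3,market3\n"
)
param_file = io.StringIO(
    "time,goals,label,value\n,,,\n"
    "15,4,team1 v team2,match1\n"
    "10,3,team5 v team6,match3\n"
)
out_file = io.StringIO()
kept = filter_rows(main_file, param_file, out_file)
print(out_file.getvalue())
```

With real files, each `io.StringIO` would be replaced by a file opened with `newline=""`, as the `csv` module documentation recommends.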
