1

I found three questions giving a basis but I couldn't create a code for this work, I would like some help because I really couldn't create and I think it can be very useful for other people in the future. I'll leave the three questions I tried to work on:

Filter rows in csv file based on another csv file and save the filtered data in a new file
Python how to search value in one csv based on another csv - pandas?
Python: Filtering CSV with conditions from another CSV

Data explanation:

CSV file name "Main.csv":

label,value,market
,,
team1 v team2,match1,market1
team3 v team4,match2,market2
team5 v team6,match3,market3

CSV file name "parameter.csv": used as a parameter for the filter:

time,goals,label,value
,,,
15,4,team1 v team2,match1
10,3,team5 v team6,match3

RULE: If the value of label and value are found exactly together in a row, so be present in the CSV final!

CSV expected and created after filter:

label,value,market
,,
team1 v team2,match1,market1
team5 v team6,match3,market3
rioV8
  • 24,506
  • 3
  • 32
  • 49
Digital Farmer
  • 1,705
  • 5
  • 17
  • 67

3 Answers3

3

Try the following code with Pandas (I have added some comments for explanation):

import pandas as pd

#load the files to dataframes
main = pd.read_csv("Main.csv")
par = pd.read_csv("parameter.csv")

#transform the dataframes to lists of dictionaries
main_dict=main.to_dict(orient='records')
par_dict=par.to_dict(orient='records')

#create a list of dictionaries that use only 'label' and 'value' as keys
par_dict = [{'label':i['label'], 'value':i['value']} for i in par_dict]

#search for records in main that the pair of label-value exists in the previous list
result = [i for i in main_dict if {'label':i['label'], 'value':i['value']} in par_dict]

#change back to dataframe and save to csv
result=pd.DataFrame(result)

result.to_csv('resut.csv', index=False)
IoaTzimas
  • 10,538
  • 2
  • 13
  • 30
1

This can be done easily with pandas by setting the common columns as index to the dataframes. Since you don't care about the other columns in parameter.csv, they can be dropped.

import pandas as pd
common_index = ["label", "value"]
main = pd.read_csv("Main.csv").dropna().set_index(common_index)
param = (pd.read_csv("parameter.csv", usecols=common_index)
    .dropna().set_index(common_index))
result = main[main.index.isin(param.index)]
print(result)

Result

                      market
label         value          
team1 v team2 match1  market1
team5 v team6 match3  market3
tdelaney
  • 73,364
  • 6
  • 83
  • 116
1

This can also be done in the standard library using the csv module. Create a set from the columns of interest in the parameters file and use that as a filter when reading the main file.

import csv
with open("parameter.csv", newline="") as p_file:
    reader = csv.reader(p_file)
    next(reader)
    param_set = {tuple(row[2:4]) for row in reader if row[2]}

with open("Main.csv") as m_file:
    reader = csv.reader(m_file)
    next(reader)
    result = [row for row in reader if tuple(row[0:2]) in param_set]

print(result)
tdelaney
  • 73,364
  • 6
  • 83
  • 116