Python: How to set the variables of a class, based on a lookup to the column headers of a CSV file

Question

I have a class ETF that has many variables. I just included three below for simplicity but there are actually close to 40:

class ETF:
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class

There is another file in my project with the following code. The two #CODE NEEDED HERE comments mark the places my question is about.

import csv

# Open the file
data = open('db.csv')
csv_data = csv.reader(data) # csv.reader

# reformat it into a python object list of lists
data_lines = list(csv_data)

headers = data_lines[1] # Retrieving the column headers

# Find the Index positions in headers for each ETF class attribute
#CODE NEEDED HERE

# create ETF objects for each line in the file
for line in data_lines[2:]:
    # CODE NEEDED HERE
    # Lookup the column header based on the

I also have two spreadsheets. One spreadsheet is called db.csv and contains the information we will be using to create ETF objects. Each row in this CSV will be its own ETF object. The column headers in the CSV file do not exactly match the variable names in the ETF class, and not every column is used. For that reason, I have a second spreadsheet called column_reference.csv, which I will use to map the column names in db.csv to the ETF variable names.

See table below for an example of the column_reference.csv file:

Please see the image below as an example of the db.csv file:

What code would you use to most efficiently map the column headers and create ETF objects?

score 0 · Answer 1 · answered Sep 21 '20 at 19:53


Use pandas to create a DataFrame from the CSV, then iterate over the rows with df.iterrows() and initialize an object from each row. You can set your custom column names by assigning to the df.columns attribute.
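A minimal sketch of this pandas approach, under several assumptions: the CSV contents below are hypothetical stand-ins for db.csv and column_reference.csv, column_reference.csv is assumed to have no header row, and the real headers of db.csv are assumed to sit on the second row (as the question's `data_lines[1]` suggests). It renames columns with `DataFrame.rename` and builds the objects from `to_dict("records")` instead of `df.iterrows()`.

```python
import io

import pandas as pd


class ETF:
    # Three attributes, as in the question; the real class has about 40
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class


# Hypothetical file contents standing in for db.csv and column_reference.csv
DB_CSV = """ETF export,,,
Ticker,Fund Name,Expense Ratio,Asset Class
AAA,Example Equity Fund,0.10,Equity
BBB,Example Bond Fund,0.05,Fixed Income
"""
REFERENCE_CSV = """Ticker,symbol
Fund Name,name
Asset Class,asset_class
"""

# Read the mapping: first column is the db.csv header, second the attribute
ref = pd.read_csv(io.StringIO(REFERENCE_CSV), header=None,
                  names=["csv_column", "attribute"])
name_map = dict(zip(ref["csv_column"], ref["attribute"]))

# Skip the first line so the second line becomes the header row,
# and load only the columns that appear in the mapping
df = pd.read_csv(io.StringIO(DB_CSV), skiprows=1,
                 usecols=list(name_map), dtype=str)
df = df.rename(columns=name_map)

# Each record is a dict keyed by attribute name, so it can be unpacked
# straight into the constructor
etfs = [ETF(**record) for record in df.to_dict("records")]
print([(e.symbol, e.name, e.asset_class) for e in etfs])
```

With real files, the `io.StringIO(...)` arguments would be replaced by the file paths. Because the objects are built with keyword arguments, the order of the rows in column_reference.csv does not need to match the order of the `__init__` parameters.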


Yannick Funk


thank you for the suggestion, I'm looking into this now – FinDev Sep 21 '20 at 20:02
@FinDev If you think my answer is satisfactory, accept it as best answer – Yannick Funk Sep 21 '20 at 20:03
I will accept the answer shortly if I can get the solution working – FinDev Sep 21 '20 at 20:31
What would you say to this post's suggestion that iterating over pandas DataFrames should be avoided because it is inefficient? https://stackoverflow.com/a/55557758/12963030 – FinDev Sep 21 '20 at 20:51

gañañufla · Answer 2 · 2020-09-21T21:22:53.840

0

This is the "Pythonic way":

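import csv

# Map each db.csv column header to its ETF attribute name
with open('column_reference.csv', newline='') as columns:
    columns_dict = {row[0]: row[1] for row in csv.reader(columns)}

# Find each mapped column's index in the header row once, up front
# (headers and data_lines come from the question's code)
positions = {attr: headers.index(col) for col, attr in columns_dict.items()}

for line in data_lines[2:]:
    # Build keyword arguments keyed by ETF attribute name
    values = {attr: line[i] for attr, i in positions.items()}
    # Instantiate only after all values for the row are collected
    etf = ETF(**values)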

edited Sep 21 '20 at 21:22

answered Sep 21 '20 at 19:59

gañañufla


there are about 40 variables in the ```ETF``` class, I'm not sure it makes sense to hard code this way given the number of variables. What do you think? – FinDev Sep 21 '20 at 20:01
I edited the answer; one way to do this is with dictionaries – gañañufla Sep 21 '20 at 20:19
would you mind adding comments so I can understand your code better? I'm getting the following error message due to a lack of instantiation of the ```ETF``` object: ```TypeError: __init__() missing 30 required positional arguments``` – FinDev Sep 21 '20 at 21:05
I edited the answer; the purpose is to save the properties in a dict and create an ETF(**values) – gañañufla Sep 22 '20 at 16:38

FinDev · Accepted Answer · 2020-09-23T19:28:03.870

I ended up using a series of nested for loops that build a list of values from each CSV row, which let me get this done in the shortest amount of time. The pandas solution was too time-consuming.

import csv
from ETF import ETF


# Read the file into a list of rows (each row is a list of strings)
with open('db.csv', newline='') as data:
    data_lines = list(csv.reader(data))


# Build a list of [db.csv column header, ETF attribute name] pairs
# from the column_reference.csv file
name_map = []
with open('column_reference.csv', newline='') as f:
    for tokens in csv.reader(f):
        old = tokens[0]
        new = tokens[1]
        name_map.append([old, new])

# Append each mapped column's index in the header row to its name_map entry
for counter, header in enumerate(data_lines[1]):
    for entry in name_map:
        if entry[0] == header:
            entry.append(counter)

# Creating ETF objects based on the indexes of the columns in the database
for line in data_lines[2:]:
    # Lookup the column header based on the
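    # Collect this row's values in the order the attributes appear in
    # column_reference.csv; that order must match ETF.__init__'s parameters
    etf_characteristics = [line[i[2]] for i in name_map]
    this_etf = ETF(*etf_characteristics)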
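Since the code above passes values positionally, it only works when the rows of column_reference.csv are in the same order as the parameters of `ETF.__init__`. A possible variant of the same index-lookup idea that uses keyword arguments instead is sketched below. The file contents are hypothetical stand-ins for db.csv and column_reference.csv, `load_etfs` is a name made up for this illustration, and column_reference.csv is assumed to contain exactly two columns and no header row.

```python
import csv
import io


class ETF:
    # Three attributes, as in the question; the real class has about 40
    def __init__(self, symbol, name, asset_class):
        self.symbol = symbol
        self.name = name
        self.asset_class = asset_class


# Hypothetical file contents standing in for db.csv and column_reference.csv.
# The mapping rows are deliberately not in __init__ order.
DB_CSV = """ETF export,,,
Ticker,Fund Name,Expense Ratio,Asset Class
AAA,Example Equity Fund,0.10,Equity
BBB,Example Bond Fund,0.05,Fixed Income
"""
REFERENCE_CSV = """Asset Class,asset_class
Ticker,symbol
Fund Name,name
"""


def load_etfs(db_file, reference_file):
    # Map each db.csv column header to an ETF attribute name
    name_map = {old: new for old, new in csv.reader(reference_file)}
    rows = list(csv.reader(db_file))
    headers = rows[1]  # headers sit on the second row, as in the question
    # Resolve each attribute's column index once, not once per data row
    positions = {attr: headers.index(col) for col, attr in name_map.items()}
    # Build one ETF per data row from keyword arguments
    return [ETF(**{attr: row[i] for attr, i in positions.items()})
            for row in rows[2:]]


etfs = load_etfs(io.StringIO(DB_CSV), io.StringIO(REFERENCE_CSV))
print([(e.symbol, e.name, e.asset_class) for e in etfs])
```

With real files, the two `io.StringIO(...)` arguments would be replaced by file objects opened with `open(path, newline='')`. A header listed in column_reference.csv but missing from db.csv raises a `ValueError` from `headers.index`, which surfaces the mismatch before any objects are built.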
