Create custom objects from CSV rows

Question

I have the following CSV file:

id;area;zz;nc
1;35.66;2490.8;1
2;65.35;2414.93;1
3;79.05;2269.33;1
4;24.5;2807.68;1
5;19.31;2528.59;1
6;25.51;2596.44;1

where each row represents a so-called Cell object with its id, area, zz, and nc.

Consequently, I have created the following class:

class cells():
    #    
    # Initializer / Instance Attributes
    def __init__(self, idm, area,zz,nc):
        self.idm  = idm
        self.area = area

The idea is to create one object per cell and to assign its attributes from the data in the file.

My first idea is to read the CSV file into a DataFrame and then populate a list of objects in a loop.

As far as I know, Python loops are quite inefficient, so I would like to know if there is a smarter way to do this.

Thanks, Diego

Is there any particular reason why you need them to be objects of specific class? Would you be alright with using `namedtuples` instead? — AMC, Nov 16 '19 at 00:48
Also, your class should probably be named `Cell` instead of `cells`, since Python classes follow the _CapWords_ naming convention, and each object represents a single cell. — AMC, Nov 16 '19 at 00:54
Do you find any of the current answers satisfactory, are you hoping for new ones? — AMC, Nov 18 '19 at 01:59
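
As a rough illustration of the `namedtuple` suggestion in the comments above, here is a minimal sketch. The in-memory `SAMPLE` text and the `load_cells` helper are hypothetical stand-ins, not something from the question, and the explicit type conversions assume that `id` and `nc` are integers while `area` and `zz` are floats.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the CSV file from the question.
SAMPLE = """id;area;zz;nc
1;35.66;2490.8;1
2;65.35;2414.93;1
"""

# 'idm' mirrors the attribute name used in the question's class.
Cell = namedtuple("Cell", ["idm", "area", "zz", "nc"])


def load_cells(text_stream):
    """Build one Cell per data row (hypothetical helper)."""
    reader = csv.reader(text_stream, delimiter=";")
    next(reader)  # skip the header row
    # The csv module yields strings, so convert each field explicitly.
    return [Cell(int(i), float(a), float(z), int(n)) for i, a, z, n in reader]


cells = load_cells(io.StringIO(SAMPLE))
print(cells[0])  # Cell(idm=1, area=35.66, zz=2490.8, nc=1)
```

A namedtuple is immutable, so this only fits if the cells do not need to be modified after loading.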

waffles · Accepted Answer · 2019-11-16T02:26:45.170

2

I don't quite understand what you mean by cycle, but this will create a list of Cell objects, one for each row, given the format your data is in.

A list comprehension over pandas Series is a reasonable option; see https://stackoverflow.com/a/55557758/7582537

Try this:

import pandas as pd 


class Cell():
    # Initializer / Instance Attributes
    def __init__(self, idm, area, zz, nc):
        self.idm  = idm
        self.area = area


def create_cells(row):
    newcell = Cell(row[0], row[1], row[2], row[3])
    return newcell


file = pd.read_table("your_file.csv", sep=';')
zipp = zip(file['id'], file['area'], file['zz'], file['nc'])
cells = [create_cells(row) for row in zipp]

print(cells)

edited Nov 16 '19 at 02:26

answered Nov 16 '19 at 00:57

waffles


Thanks for sharing that post, it was quite informative! – AMC Nov 16 '19 at 02:05
I hesitated for a while. I do agree that using a list comprehension or some other kind of plain Python iteration might make sense here, but I’m not a fan of using `pd.read_table()` and creating an entire function for what is just tuple unpacking in a constructor. In retrospect, I should have commented on those rather than just downvoting, sorry. For what it’s worth I have upvoted you now, since the solution is ultimately correct and generally well-written :) – AMC Nov 16 '19 at 02:36
I think that a key function is "zip". I have to understand it properly. – diedro Nov 16 '19 at 14:13
@diedro `zip()` is a great, extremely useful function. It's built in, and I think [the docs](https://docs.python.org/3/library/functions.html#zip) do a good job of explaining it. – AMC Nov 16 '19 at 21:08
@AlexanderCécile I also don't agree with this methodology. The only reason I suggested this method is because OP suggested the use of DataFrames and tagged `pandas`. I would definitely recommend your solution to read the file directly into the desired format. Also my answer was incorrect initially anyway, but I edited :) – waffles Nov 18 '19 at 03:05
@uMdRupert I just saw that the older version of your post included a disclaimer about the use of Pandas, you could have kept that, no? Although I imagine you had your reasons to get rid of it. – AMC Nov 18 '19 at 04:41
@AlexanderCécile Yeah, I changed it after researching `pd.read_table` - as that is suitable for reading non-csv files as opposed to read_csv. But yes I agree, the premise still holds and there is no reason to use `pandas` here. – waffles Nov 18 '19 at 06:00
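
Since `zip()` came up in the comments above, here is a small sketch of what it does with column-like data. The lists are hand-copied from the first three rows of the question's file and stand in for the DataFrame columns; the variable names are only illustrative.

```python
# Column-wise data, as you would get from the DataFrame columns.
ids = [1, 2, 3]
areas = [35.66, 65.35, 79.05]
zzs = [2490.8, 2414.93, 2269.33]

# zip() pairs up the i-th element of every input, giving one tuple per row.
rows = list(zip(ids, areas, zzs))
print(rows[0])  # (1, 35.66, 2490.8)

# zip(*rows) goes the other way: from row tuples back to columns.
ids_again, areas_again, zzs_again = zip(*rows)
print(ids_again)  # (1, 2, 3)
```

Note that `zip()` stops at the shortest input, so columns of unequal length would be silently truncated.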

AMC · Answer 2 · 2019-11-16T21:24:56.753

uMdRupert shared a link to an interesting post in his answer, I would recommend checking it out!

I like his idea of using a list comprehension, so I wanted to share a similar method:

import pandas as pd


class Cell:
    def __init__(self, idm, area, zz, nc):
        self.idm = idm
        self.area = area


cell_df = pd.read_csv('../resources/test_cell_data.csv', delimiter=';')
cell_df = cell_df.rename({'id': 'idm'}, axis='columns')

# ** unpacks the field names and values as keyword arguments
cell_objs_lst = [Cell(**curr_tuple._asdict()) for curr_tuple in cell_df.itertuples(index=False)]

Pandas might be overkill for this task, so here is a dead-simple method which uses the csv module:

import csv


class Cell:
    def __init__(self, idm, area, zz, nc):
        self.idm = idm
        self.area = area


with open('../resources/test_cell_data.csv', newline='') as in_file:
    next(in_file)
    reader = csv.DictReader(in_file, fieldnames=['idm', 'area', 'zz', 'nc'], delimiter=';')
    cells_lst = [Cell(**curr_row) for curr_row in reader]

score 1 · Answer 3 · answered Nov 16 '19 at 02:56

I don't think you need pandas in this case. pandas is overkill if you only need to read a csv file.

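Either read it directly:

objects = []
with open('your_file', 'r') as f:
    next(f)  # skip header row (the file must be open before calling next)
    for row in f:
        # split each line on ';' and pass the fields positionally
        objects.append(cells(*row.strip().split(';')))

or use the csv module.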
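
A minimal sketch of the csv-module variant, assuming the same four-column layout. The `io.StringIO` object is a hypothetical stand-in for the real file (with a file on disk you would pass `open('your_file', newline='')` instead), and the type conversions in the constructor are an assumption, added because both plain file reading and the csv module return every field as a string.

```python
import csv
import io


class Cell:
    def __init__(self, idm, area, zz, nc):
        # Fields arrive as strings, so convert them here (assumed types).
        self.idm = int(idm)
        self.area = float(area)
        self.zz = float(zz)
        self.nc = int(nc)


# Hypothetical in-memory stand-in for the CSV file.
data = io.StringIO("id;area;zz;nc\n1;35.66;2490.8;1\n2;65.35;2414.93;1\n")

reader = csv.reader(data, delimiter=';')
next(reader)  # skip header row
objects = [Cell(*row) for row in reader]

print(objects[0].area)  # 35.66
```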

score 0 · Answer 4 · answered Nov 16 '19 at 01:15

I don't know why you need a Cells object for each row of df. However, I think you can achieve it with df.agg and keep every object in a Series:

class Cells():
    # Initializer / Instance Attributes
    def __init__(self, idm, area, zz, nc):
        self.idm  = idm
        self.area = area
        self.zz = zz
        self.nc = nc

s = df.agg(lambda x: Cells(*x), axis=1)
print(s)

Output:
0    <__main__.Cells object at 0x09FA38D0>
1    <__main__.Cells object at 0x09FA3510>
2    <__main__.Cells object at 0x09FA3870>
3    <__main__.Cells object at 0x09FA3AF0>
4    <__main__.Cells object at 0x09B27790>
5    <__main__.Cells object at 0x09B27770>
dtype: object

After that, you can access each object by indexing s:

In [303]: s[0].__dict__
Out[303]: {'idm': 1.0, 'area': 35.66, 'zz': 2490.8, 'nc': 1.0}

In [304]: s[1].__dict__
Out[304]: {'idm': 2.0, 'area': 65.35, 'zz': 2414.93, 'nc': 1.0}
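
The Series printout above shows only the default object repr. If a readable printout matters, one option (not part of the original answer) is to define the class as a dataclass, which generates `__init__` and `__repr__` automatically. This is a sketch under the assumption that all four fields are floats, as in the `__dict__` output above.

```python
from dataclasses import dataclass


@dataclass
class Cells:
    # Field types are assumptions based on the values shown above.
    idm: float
    area: float
    zz: float
    nc: float


c = Cells(1.0, 35.66, 2490.8, 1.0)
print(c)  # Cells(idm=1.0, area=35.66, zz=2490.8, nc=1.0)
```

Because the generated `__init__` takes the fields positionally in declaration order, `Cells(*x)` keeps working unchanged.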

@AlexanderCécile: in the pandas world, `df` is always known as the working dataframe. OP says `The first idea that I have is to read the csv file as a DataFrame`, so I assume he already knew the way to read csv to dataframe. If he doesn't, a simple google would yield him the instruction to read_csv. — Andy L., Nov 16 '19 at 18:52
Right yes i’m familiar with the convention, I guess I was caught off guard because I only glanced at your mention of `df` in the beginning of your post. *facepalm* — AMC, Nov 16 '19 at 20:34
