1

I have the following CSV file:

id;area;zz;nc
1;35.66;2490.8;1
2;65.35;2414.93;1
3;79.05;2269.33;1
4;24.5;2807.68;1
5;19.31;2528.59;1
6;25.51;2596.44;1

where each row represents a so-called Cell object with its id, area, zz, and nc.

Consequently, I have created the following class:

class cells():
    # Initializer / Instance Attributes
    def __init__(self, idm, area, zz, nc):
        self.idm  = idm
        self.area = area
        self.zz   = zz
        self.nc   = nc

The idea is to create one object per cell and to assign its attributes from the data in the file.

The first idea that I have is to read the csv file as a DataFrame and then populate a list of objects in a cycle (a loop).

As far as I know, Python is inefficient with cycles, so I would like to know whether there is another, smarter way to do this.

Thanks, Diego

AMC
diedro
  • What is your expected output? – Chris Nov 16 '19 at 00:44
  • Is there any particular reason why you need them to be objects of specific class? Would you be alright with using `namedtuples` instead? – AMC Nov 16 '19 at 00:48
  • Also, your class should probably be named `Cell` instead of `cells`, since Python classes follow the _CapWords_ naming convention, and each object represents a single cell. – AMC Nov 16 '19 at 00:54
  • Do you find any of the current answers satisfactory, are you hoping for new ones? – AMC Nov 18 '19 at 01:59
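
The `namedtuple` suggestion in the comments above could look roughly like the following. This is only a minimal sketch: the sample text is copied from the question and read through `io.StringIO` in place of the real file, and the field name `idm` is assumed to stand in for the `id` column. All fields stay strings here, because `csv.reader` does no type conversion.

```python
import csv
import io
from collections import namedtuple

# Sample rows copied from the question; a real script would open the file instead.
CSV_TEXT = """id;area;zz;nc
1;35.66;2490.8;1
2;65.35;2414.93;1
3;79.05;2269.33;1
"""

# Field names mirror the question's class; "idm" stands in for the "id" column.
Cell = namedtuple("Cell", ["idm", "area", "zz", "nc"])

reader = csv.reader(io.StringIO(CSV_TEXT), delimiter=";")
next(reader)  # skip the header row
# Cell._make builds one namedtuple from each row (a list of strings).
cells = [Cell._make(row) for row in reader]

print(cells[0])
```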

4 Answers

2

I don't quite understand what you mean by cycle, but this will create a list of Cell objects, one for each row that you have, given the format your data is in.

A list comprehension over pandas Series is a reasonable option, see https://stackoverflow.com/a/55557758/7582537

Try this:

import pandas as pd 


class Cell():
    # Initializer / Instance Attributes
    def __init__(self, idm, area, zz, nc):
        self.idm  = idm
        self.area = area
        self.zz   = zz
        self.nc   = nc


def create_cells(row):
    newcell = Cell(row[0], row[1], row[2], row[3])
    return newcell


file = pd.read_table("your_file.csv", sep=';')
zipp = zip(file['id'], file['area'], file['zz'], file['nc'])
cells = [create_cells(row) for row in zipp]

print(cells)
waffles
  • Thanks for sharing that post, it was quite informative! – AMC Nov 16 '19 at 02:05
  • I hesitated for a while. I do agree that using a list comprehension or some other kind of plain Python iteration might make sense here, but I’m not a fan of using `pd.read_table()` and creating an entire function for what is just tuple unpacking in a constructor. In retrospect, I should have commented on those rather than just downvoting, sorry. For what it’s worth I have upvoted you now, since the solution is ultimately correct and generally well-written :) – AMC Nov 16 '19 at 02:36
  • I think that a key function is "zip". I have to understand it properly. – diedro Nov 16 '19 at 14:13
  • @diedro `zip()` is a great, extremely useful function. It's a built-in function, and I think [the docs](https://docs.python.org/3/library/functions.html#zip) do a good job of explaining it. – AMC Nov 16 '19 at 21:08
  • @AlexanderCécile I also don't agree with this methodology. The only reason I suggested this method is because OP suggested the use of DataFrames and tagged `pandas`. I would definitely recommend your solution to read the file directly into the desired format. Also my answer was incorrect initially anyway, but I edited :) – waffles Nov 18 '19 at 03:05
  • @uMdRupert I just saw that the older version of your post included a disclaimer about the use of Pandas, you could have kept that, no? Although I imagine you had your reasons to get rid of it. – AMC Nov 18 '19 at 04:41
  • @AlexanderCécile Yeah, I changed it after researching `pd.read_table` - as that is suitable for reading non-csv files as opposed to read_csv. But yes I agree, the premise still holds and there is no reason to use `pandas` here. – waffles Nov 18 '19 at 06:00
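
Since `zip()` came up in the comments above, here is a small illustration of what it does, using plain lists with values from the question in place of the DataFrame columns. It is only a sketch; the list names are made up for the example.

```python
# Three columns of equal length, using values from the question's file.
ids = [1, 2, 3]
areas = [35.66, 65.35, 79.05]
zzs = [2490.8, 2414.93, 2269.33]

# zip() groups the i-th element of every iterable into one tuple per row.
rows = list(zip(ids, areas, zzs))

print(rows[0])  # (1, 35.66, 2490.8)
```

Each tuple produced this way can then be passed to a constructor, which is what the list comprehension in the answer does.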
1

uMdRupert shared a link to an interesting post in his answer; I would recommend checking it out!


I like his idea of using a list comprehension, so I wanted to share a similar method:

import pandas as pd


class Cell:
    def __init__(self, idm, area, zz, nc):
        self.idm = idm
        self.area = area
        self.zz = zz
        self.nc = nc


cell_df = pd.read_csv('../resources/test_cell_data.csv', delimiter=';')
cell_df = cell_df.rename({'id': 'idm'}, axis='columns')

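# ** unpacks the field names and values as keyword arguments (a single * would pass only the names)
cell_objs_lst = [Cell(**curr_tuple._asdict()) for curr_tuple in cell_df.itertuples(index=False)]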

Pandas might be overkill for this task, so here is a dead-simple method which uses the csv module:

import csv


class Cell:
    def __init__(self, idm, area, zz, nc):
        self.idm = idm
        self.area = area
        self.zz = zz
        self.nc = nc


with open('../resources/test_cell_data.csv', newline='') as in_file:
    next(in_file)
    reader = csv.DictReader(in_file, fieldnames=['idm', 'area', 'zz', 'nc'], delimiter=';')
    cells_lst = [Cell(**curr_row) for curr_row in reader]
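
One thing to keep in mind with the csv module is that every field arrives as a string. If numeric attributes are wanted, the values can be converted before they reach the constructor. The following is a minimal sketch under a few assumptions: the `CONVERTERS` mapping is a hypothetical helper, the target types (`int` for `idm` and `nc`, `float` for `area` and `zz`) are guessed from the sample data, and `io.StringIO` stands in for the real file.

```python
import csv
import io

# Sample rows copied from the question; a real script would open the file instead.
CSV_TEXT = """id;area;zz;nc
1;35.66;2490.8;1
2;65.35;2414.93;1
"""


class Cell:
    def __init__(self, idm, area, zz, nc):
        self.idm = idm
        self.area = area
        self.zz = zz
        self.nc = nc


# Hypothetical mapping from field name to the type it should be converted to.
CONVERTERS = {"idm": int, "area": float, "zz": float, "nc": int}

in_file = io.StringIO(CSV_TEXT)
next(in_file)  # skip the header row, as in the answer
reader = csv.DictReader(in_file, fieldnames=list(CONVERTERS), delimiter=";")
# Convert each string value with the matching converter before building the object.
cells_lst = [
    Cell(**{name: CONVERTERS[name](value) for name, value in row.items()})
    for row in reader
]
```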
AMC
1

I don't think you need pandas in this case. pandas is overkill if you only need to read a csv file.

Either read it directly:

objects = []
with open('your_file', 'r') as f:
    next(f)  # skip header row (the file must be open before this call)
    for row in f:
        # every field is passed to the constructor as a string
        objects.append(cells(*row.strip().split(';')))

or use the csv module.
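
A rough sketch of the csv-module variant, assuming the `cells` class from the question with all four attributes stored. The sample text is read through `io.StringIO` in place of the real file, and the fields are passed on as strings.

```python
import csv
import io

# Sample rows copied from the question; a real script would open the file instead.
CSV_TEXT = """id;area;zz;nc
1;35.66;2490.8;1
2;65.35;2414.93;1
"""


class cells:
    def __init__(self, idm, area, zz, nc):
        self.idm = idm
        self.area = area
        self.zz = zz
        self.nc = nc


objects = []
reader = csv.reader(io.StringIO(CSV_TEXT), delimiter=";")
next(reader)  # skip header row
for row in reader:
    # row is a list of four strings, unpacked positionally into the constructor
    objects.append(cells(*row))
```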

hunzter
0

I don't know why you need a Cells object for each row of df. However, I think you can achieve it with df.agg and keep every object in a Series:

class Cells():
    # Initializer / Instance Attributes
    def __init__(self, idm, area, zz, nc):
        self.idm  = idm
        self.area = area
        self.zz = zz
        self.nc = nc

s = df.agg(lambda x: Cells(*x), axis=1)
print(s)

Output:
0    <__main__.Cells object at 0x09FA38D0>
1    <__main__.Cells object at 0x09FA3510>
2    <__main__.Cells object at 0x09FA3870>
3    <__main__.Cells object at 0x09FA3AF0>
4    <__main__.Cells object at 0x09B27790>
5    <__main__.Cells object at 0x09B27770>
dtype: object

After that, you can access each object by indexing s:

In [303]: s[0].__dict__
Out[303]: {'idm': 1.0, 'area': 35.66, 'zz': 2490.8, 'nc': 1.0}

In [304]: s[1].__dict__
Out[304]: {'idm': 2.0, 'area': 65.35, 'zz': 2414.93, 'nc': 1.0}
Andy L.
  • Where does `df` come from? – AMC Nov 16 '19 at 10:44
  • @AlexanderCécile: in the pandas world, `df` is always known as the working dataframe. OP says `The first idea that I have is to read the csv file as a DataFrame`, so I assume he already knew the way to read csv to dataframe. If he doesn't, a simple google would yield him the instruction to read_csv. – Andy L. Nov 16 '19 at 18:52
  • Right yes, I’m familiar with the convention, I guess I was caught off guard because I only glanced at your mention of `df` in the beginning of your post. *facepalm* – AMC Nov 16 '19 at 20:34