Reading CSV files into fields in class __init__?

Question

I have a Python class that reads a CSV file and populates the information into separate fields in the class.

import numpy as np

class DataCSVReader(object):
    def __init__(self):
        self.data_name1 = []
        self.data_name2 = []
        ...
        self.data_nameN = []

    def read_from_csv(self, filename):
        # skip_header=1 skips the CSV header line
        data = np.genfromtxt(filename, delimiter=',', skip_header=1)
        self.data_name1 = data[:, 1:4]
        self.data_name2 = data[:, 4:8]
        ...
        self.data_nameN = data[:, 4*(N-1):4*N]
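Since every assignment above follows the same `data[:, 4*(N-1):4*N]` pattern, one option is to generate the attributes in a loop with the built-in `setattr`. A minimal sketch, assuming every field is a block of `width` consecutive columns; `n_fields` and `width` are hypothetical constructor parameters that are not in the original code:

```python
import numpy as np


class DataCSVReader(object):
    """Sketch: generate data_name1..data_nameN in a loop.

    Assumes every field is `width` consecutive columns, following the
    data[:, 4*(N-1):4*N] pattern. n_fields and width are hypothetical
    parameters, not part of the original class.
    """

    def __init__(self, n_fields, width=4):
        self.n_fields = n_fields
        self.width = width
        # Declare every attribute up front (as None) instead of N hand-written lines.
        for k in range(1, n_fields + 1):
            setattr(self, 'data_name{}'.format(k), None)

    def read_from_csv(self, filename):
        data = np.genfromtxt(filename, delimiter=',', skip_header=1)
        for k in range(1, self.n_fields + 1):
            # Field k covers columns width*(k-1) up to (not including) width*k.
            start, stop = self.width * (k - 1), self.width * k
            setattr(self, 'data_name{}'.format(k), data[:, start:stop])
```

One trade-off: static checkers such as PyCharm generally cannot see attributes created through `setattr`, so they may flag `reader.data_name1` as unresolved.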

The code works and reads the data without any problem. But the number of data fields N is rather large, so my code is very long with not much going on. So my questions are:

Is there a better way to populate the data?
Is there a better way to write `__init__` so that it can elegantly create a lot of empty lists?

And actually, assigning empty lists to `self.data_nameN` inside `__init__` doesn't look necessary, so you could just remove it. – juanpa.arrivillaga, Apr 27 '17 at 17:18
Yes, maybe that is part of my question. If I don't, my editor (PyCharm) complains that an instance attribute is defined outside `__init__`. – SunnyIsaLearner, Apr 27 '17 at 17:20
Yeah, ignore PyCharm. It's a matter of opinion, but some people like to initialize attributes that will eventually be assigned in another method to `None` (the argument being that it acts as a form of documentation for people reading your code), even when that method itself is called within `__init__`. But don't use an empty list; at the very least, use `None`. – juanpa.arrivillaga, Apr 27 '17 at 17:21
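A minimal sketch of the "declare as `None` in `__init__`" convention, cut down to two fields; `load_rows` is a hypothetical stand-in for `read_from_csv` that takes already-parsed rows so the example needs no file:

```python
class DataCSVReader(object):
    def __init__(self):
        # Declared here as documentation (and so PyCharm's
        # "defined outside __init__" inspection stays quiet);
        # the real values are assigned by load_rows().
        self.data_name1 = None
        self.data_name2 = None

    def load_rows(self, rows):
        # Hypothetical loader: 'rows' is a list of equal-length lists
        # standing in for the parsed CSV data.
        self.data_name1 = [row[0:4] for row in rows]
        self.data_name2 = [row[4:8] for row in rows]
```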
Thanks so much for your answer. So may I ask why `None` is better than an empty list? – SunnyIsaLearner, Apr 27 '17 at 17:30
SunnyIsaLearner: It's smaller, like the difference between nothing and an empty container. – martineau, Apr 27 '17 at 17:39
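martineau's point can be checked with `sys.getsizeof`, which reports an object's own size in bytes; beyond size, `None` is also unambiguous as a "not set yet" marker. A small sketch (the `describe` helper is hypothetical):

```python
import sys

# None is a single shared object; every [] expression builds a new list object.
print(sys.getsizeof(None), sys.getsizeof([]))  # exact numbers depend on the Python build


def describe(value):
    # Hypothetical helper: None lets you tell "never loaded"
    # apart from "loaded, but the result was empty".
    if value is None:
        return 'not loaded yet'
    if len(value) == 0:
        return 'loaded, but empty'
    return 'loaded'
```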
SunnyIsaLearner: Is there some reason you don't just move the reading of the files into `__init__()` and dispense with the `read_from_csv()` function? In other words, why must initializing and populating the instance be done in two steps? – martineau, Apr 27 '17 at 17:45
martineau: That is a good point. I may need to read from another data format later in my code, so I will have `read_from_csv_format1`, `read_from_csv_format2`. – SunnyIsaLearner, Apr 27 '17 at 17:47
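One common way to reconcile martineau's suggestion with several input formats is to keep `__init__` free of file I/O and add one alternate constructor per format as a `classmethod`. A minimal sketch; the method names and the layout assumed for "format 2" are hypothetical:

```python
import numpy as np


class DataCSVReader(object):
    def __init__(self, data):
        # 'data' is an already-parsed 2-D array; __init__ does no file I/O.
        self.data = data

    @classmethod
    def from_csv_format1(cls, filename):
        # Hypothetical format 1: comma-separated with one header line.
        return cls(np.genfromtxt(filename, delimiter=',', skip_header=1))

    @classmethod
    def from_csv_format2(cls, filename):
        # Hypothetical format 2: semicolon-separated, no header line.
        return cls(np.genfromtxt(filename, delimiter=';'))
```

With this design `__init__` stays the single place where attributes are set, and each new format only needs one more classmethod.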

score 2 · Answer 1 · answered Apr 27 '17 at 18:10

For fun, I whipped up a solution that uses the accepted answer's strategy of storing the data in a list of chunks, but adds a `__getattr__` method to allow easy data access using the shorthand `self.data_x`, where `x` is the chunk index.

import numpy as np

class DataCSVReader(object):
    def __init__(self):
        self.data = []

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, e.g. for data_0, data_1, ...
        if name.startswith('data_'):
            index = name[5:]
            try:
                index = int(index)
                if index < len(self.data):
                    return self.data[index]
                else:
                    return []
            except ValueError:
                raise AttributeError('index {} needs to be an integer'.format(index))
        return super(DataCSVReader, self).__getattribute__(name)

    def read_from_csv(self, filename):
        raw_data = np.genfromtxt(filename, delimiter=',', skip_header=1)
        # Split the columns (axis 1) into chunks of 3 columns each.
        self.data = [raw_data[:, i:i+3] for i in range(0, raw_data.shape[1], 3)]


example = DataCSVReader()
example.read_from_csv('csv_path')
print(example.data_1) # self.data[1], or [] if index is out of range
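The answer relies on the fact that Python calls `__getattr__` only after normal attribute lookup has failed, so `self.data` resolves normally and only names like `data_1` reach the hook. A small illustration (the `Demo` class is hypothetical, and unlike the answer it raises `AttributeError` for an out-of-range index instead of returning `[]`, so `hasattr` behaves as expected):

```python
class Demo(object):
    def __init__(self):
        self.data = ['chunk0', 'chunk1']

    def __getattr__(self, name):
        # Reached only when the instance dict, the class and its bases
        # have no attribute called `name`.
        suffix = name[5:] if name.startswith('data_') else ''
        if suffix.isdigit() and int(suffix) < len(self.data):
            return self.data[int(suffix)]
        raise AttributeError(name)
```

Once an attribute such as `data_1` is assigned explicitly on the instance, normal lookup finds it and `__getattr__` is no longer consulted for that name.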

Your answer helped me learn a lot. It is exactly what I need. Thanks. – SunnyIsaLearner, Apr 27 '17 at 18:14

score 1 · Accepted Answer · edited May 23 '17 at 12:25


You can split the data into a list of column chunks with a simple list comprehension, like this:

import numpy as np

def read_from_csv(self, filename):  # method of DataCSVReader
    data = np.genfromtxt(filename, delimiter=',', skip_header=1)
    # split the columns (axis 1) into a list of 2-D chunks, 4 columns each
    self.chunked_data = [data[:, i:i + 4] for i in range(0, data.shape[1], 4)]

After that you can get the target chunk by index (note that indexing starts at 0): `dr.data_name1` becomes `dr.chunked_data[0]`.
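NumPy also has a built-in for splitting an array along its columns, `np.hsplit`, which can stand in for the list comprehension. A minimal sketch; the `split_columns` helper name is hypothetical:

```python
import numpy as np


def split_columns(data, width=4):
    """Split a 2-D array into blocks of `width` columns (the last block may be narrower)."""
    # Split points at columns width, 2*width, ... (0 and the end are implicit).
    return np.hsplit(data, list(range(width, data.shape[1], width)))
```

Like the slicing version, each returned block is a view into the original array rather than a copy.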

answered Apr 27 '17 at 17:19 by VMAtm

Thanks so much for your answer. It looks much more elegant. However, I need to access the data frequently. So I really want to do something like `dr.data_name1` later on in my code. Are there better ways of doing it? – SunnyIsaLearner Apr 27 '17 at 17:36
Use `dr.chunked_data[0]` instead of `dr.data_name1`; note the index change. I've updated the answer. – VMAtm Apr 27 '17 at 17:38

You *can* create a name for the first chunk, `self.data_0 = self.chunked_data[0]`. Since both names will reference the same mutable object, any in-place change through one will be reflected in the other. However, this might cause code clarity concerns. – Jared Goguen Apr 27 '17 at 17:47
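Jared's caveat can be made concrete: a plain alias shares in-place changes, but it does not follow a rebinding of `chunked_data[0]`. A read-only `property` is one way to get `dr.data_name1`-style access while always reading the current chunk. A sketch using plain lists; the constructor argument and the `first_chunk` name are hypothetical:

```python
class DataCSVReader(object):
    def __init__(self, chunked_data):
        self.chunked_data = chunked_data
        # Plain alias: bound once to the object chunked_data[0] refers to right now.
        self.data_0 = self.chunked_data[0]

    @property
    def first_chunk(self):
        # Property alias: re-reads chunked_data[0] on every access.
        return self.chunked_data[0]
```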
