I have a Python class that reads a CSV file and populates the information into separate fields in the class.

import numpy as np

class DataCSVReader(object):
    def __init__(self):
        self.data_name1 = []
        self.data_name2 = []
        # ...
        self.data_nameN = []

    def read_from_csv(self, filename):
        data = np.genfromtxt(filename, delimiter=',', skip_header=1)
        self.data_name1 = data[:, 1:4]
        self.data_name2 = data[:, 4:8]
        # ...
        self.data_nameN = data[:, 4*(N-1):4*N]

The code works and reads the data with no problem. But my number of data fields N is rather large, so my code is very long with not much going on. So my questions are:

  1. Is there a better way to populate the data?
  2. Is there a better way to write the init so that it can elegantly create a lot of empty lists?
martineau
SunnyIsaLearner
  • A list of lists? – Luca Di Liello Apr 27 '17 at 17:16
  • Use a *container*, like a list or dictionary. – juanpa.arrivillaga Apr 27 '17 at 17:17
  • And actually, assigning empty lists to `self.data_nameN` inside `__init__` doesn't look necessary. So you could just remove that. – juanpa.arrivillaga Apr 27 '17 at 17:18
  • Yes, maybe that is part of my question. If I don't, my editor (PyCharm) complains "instance attribute defined outside `__init__`". – SunnyIsaLearner Apr 27 '17 at 17:20
  • Yeah, ignore PyCharm. It's a matter of opinion, but some people like to initialize attributes to `None` that will be eventually assigned to in another method (the argument being it acts as a form of documentation for people reading your code), even when that method itself is called within `__init__`. But don't use an empty list; at the very least, use `None`. – juanpa.arrivillaga Apr 27 '17 at 17:21
  • Thanks so much for your answer. So may I ask why None is better than empty list? – SunnyIsaLearner Apr 27 '17 at 17:30
  • SunnyIsaLearner: It's smaller—like the difference between nothing and an empty container. – martineau Apr 27 '17 at 17:39
  • SunnyIsaLearner: is there some reason you don't just move the reading of the files into `__init__()` and dispense with the `read_from_csv()` function? In other words, why must initializing and populating the instance be done in two steps? – martineau Apr 27 '17 at 17:45
  • martineau: it is a good point. I may need to read from another data format later in my code so I will have `read_from_csv_format1`, `read_from_csv_format2` – SunnyIsaLearner Apr 27 '17 at 17:47
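
The container suggestion from the comments above could look roughly like the sketch below. It assumes every field is a block of 4 adjacent columns. The `field_names` and `width` parameters, the `fields` dictionary, and the sample data are all made up for illustration and are not part of the original code.

```python
import io

import numpy as np


class DataCSVReader(object):
    def __init__(self, field_names, width=4):
        # Hypothetical parameters: the field names and how many
        # columns each field spans.
        self.field_names = list(field_names)
        self.width = width
        # None (rather than an empty list) marks "nothing loaded yet".
        self.fields = None

    def read_from_csv(self, source):
        data = np.genfromtxt(source, delimiter=',', skip_header=1)
        # Map each name to its own block of columns.
        self.fields = {
            name: data[:, i * self.width:(i + 1) * self.width]
            for i, name in enumerate(self.field_names)
        }


# Small in-memory CSV standing in for a real file:
# one header row, two data rows, eight columns.
csv_text = "a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n9,10,11,12,13,14,15,16\n"
reader = DataCSVReader(['position', 'velocity'])
reader.read_from_csv(io.StringIO(csv_text))
print(reader.fields['velocity'].shape)
```

With this layout the class body stays the same length no matter how many fields there are, and a single `None` in `__init__` should also keep the editor warning away.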

2 Answers

For fun, I whipped up a solution that employs the accepted answer's strategy of a nested list for containing the data, but added a `__getattr__` method to allow for easy data access using the shorthand `self.data_x`, where `x` is the index of the chunk.

import numpy as np

class DataCSVReader(object):
    def __init__(self):
        self.data = []

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith('data_'):
            index = name[5:]
            try:
                index = int(index)
                if index < len(self.data):
                    return self.data[index]
                else:
                    return []
            except ValueError:
                raise AttributeError('index {} needs to be an integer'.format(index))
        # Anything else raises the usual AttributeError.
        return super(DataCSVReader, self).__getattribute__(name)

    def read_from_csv(self, filename):
        raw_data = np.genfromtxt(filename, delimiter=',', skip_header=1)
        # Split by columns (as in the question), three columns per chunk.
        self.data = [raw_data[:, i:i+3] for i in range(0, raw_data.shape[1], 3)]


example = DataCSVReader()
example.read_from_csv('csv_path')
print(example.data_1) # self.data[1], or [] if index is out of range
Jared Goguen

You can create a list of lists with a simple list comprehension, like this:

def read_from_csv(self, filename):
    data = np.genfromtxt(filename, delimiter=',', skip_header=1)
    # split the columns into a list of chunks, 4 columns each
    # (range instead of xrange so this also runs on Python 3)
    self.chunked_data = [data[:, i:i + 4] for i in range(0, data.shape[1], 4)]

After that you can get the target chunk by index (note that it starts from 0): `dr.data_name1` --> `self.chunked_data[0]`
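
As a rough illustration of the chunking idea, assuming each field is a group of 4 adjacent columns as in the question, the sketch below runs on a small in-memory CSV. The sample data and the `chunk_columns` helper name are made up for this example.

```python
import io

import numpy as np


def chunk_columns(data, width):
    """Split a 2-D array into a list of column blocks of the given width."""
    return [data[:, i:i + width] for i in range(0, data.shape[1], width)]


# Made-up sample: one header row, then two rows of eight values.
csv_text = "h1,h2,h3,h4,h5,h6,h7,h8\n1,2,3,4,5,6,7,8\n9,10,11,12,13,14,15,16\n"
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

chunked_data = chunk_columns(data, 4)
print(len(chunked_data))    # number of chunks
print(chunked_data[1][0])   # first row of the second chunk
```

Each chunk is a NumPy view on `data`, so no values are copied when the list is built.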

VMAtm
  • Thanks so much for your answer. It looks much more elegant. However, I need to access the data frequently. So I really want to do something like `dr.data_name1` later on in my code. Are there better ways of doing it? – SunnyIsaLearner Apr 27 '17 at 17:36
  • Use `dr.chunked_data[0]` instead of `dr.data_name1`; note the index change. Updated the answer. – VMAtm Apr 27 '17 at 17:38
  • You *can* create a name for the first column, `self.data_0 = self.chunked_data[0]`. Since these will reference the same mutable `list` object, any change in one will be reflected in the other. However, this might cause code clarity concerns. – Jared Goguen Apr 27 '17 at 17:47