I have a CSV file that has 3 columns. Let's say: a, b, c. I'm using csv.DictReader to read it and add another column that holds just the name of the file on each row.

This is my function:

def addFilename(self):
    with open(self.datafile, "r") as f:
        reader = csv.DictReader(f, delimiter='|')
        for a, b, c in reader:
            #Get filename
            filename = self.getFilename()
            yield {
                "_source": {
                    "a": a,
                    "b": b,
                    "c": c,
                    "filename": filename
                }
            }

Now I'd like to generalize that behavior for many different CSV files. Those files have different number of columns and different column names. Is there a way to do so?

I don't want to modify the CSV file. The only thing I know is that I can get the fieldnames (and the number of fields) using reader.fieldnames, but I don't know how I could use that in a yield.

6659081

2 Answers

This question may provide some useful insights:

Essentially you could do something like this:

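import csv

def foo(fname):
    # newline="" is what the csv module docs recommend when opening CSV files
    with open(fname, "r", newline="") as f:
        reader = csv.DictReader(f, delimiter='|')
        for row in reader:
            # Each row is a dict keyed by column name; add the filename to it
            yield { "_source": dict(filename=fname, **row) }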
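
The same idea can be tried without touching the disk. The following is only a sketch under a few assumptions: `rows_with_filename` is a hypothetical name, the CSV text is made up, and the function takes any iterable of lines instead of a path so that `io.StringIO` can stand in for a real file.

```python
import csv
import io


def rows_with_filename(lines, filename, delimiter="|"):
    """Yield one {"_source": {...}} dict per CSV row, plus a "filename" key.

    `lines` is any iterable of text lines (an open file, io.StringIO, a list).
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    for row in reader:
        # {**row, ...} copies whatever columns the header declared, so the
        # function never needs to know their names or how many there are.
        yield {"_source": {**row, "filename": filename}}


# Two hypothetical inputs with different headers
three_cols = io.StringIO("a|b|c\n1|2|3\n")
two_cols = io.StringIO("x|y\nfoo|bar\n")

first = list(rows_with_filename(three_cols, "three.csv"))
second = list(rows_with_filename(two_cols, "two.csv"))
```

One design difference worth knowing: `dict(filename=fname, **row)` raises a `TypeError` if the CSV itself has a column called `filename`, because the keyword is then passed twice. The `{**row, "filename": filename}` form used in this sketch silently overwrites that column instead. Which behavior is preferable depends on your data.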
moooeeeep

You do not have to iterate on the level of column names. Each row returned by for ... in reader is a dict that maps column names to values, and you can treat its contents as a list of (name, value) tuples. Then:

def addFilename2(self):
    with open(self.datafile, "r") as f:
        reader = csv.DictReader(f, delimiter='|')
        for column_dict in reader:
            # Get filename
            filename = self.getFilename()
            # Turn the row into a list of (column name, value) pairs
            mapped_values = list(column_dict.items())
            mapped_values.append(("filename", filename))
            yield {
                "_source": dict(mapped_values)
            }

This approach does not depend on the names or the number of columns in your CSV file, and it always adds filename as the last key.

A dictionary can be created from a list of 2-tuples. Here the list comes from calling items() on the row, which gives one (column name, value) pair per column. Once we have the list, we append the ("filename", filename) pair and call the dict constructor, which returns a dictionary with the additional column.
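
To make the intermediate steps visible, here is a small sketch on a single row. The input text and the file name `two.csv` are made up for illustration, and `io.StringIO` stands in for an open file.

```python
import csv
import io

data = io.StringIO("x|y\nfoo|bar\n")  # hypothetical two-column input
reader = csv.DictReader(data, delimiter="|")
row = next(reader)                     # first data row, as a dict

# The header names are available without hard-coding them
fieldnames = reader.fieldnames

# Pairing header names with the row's values gives the same 2-tuples
# that row.items() produces for a well-formed row
zipped = list(zip(fieldnames, row.values()))

pairs = list(row.items())
pairs.append(("filename", "two.csv"))  # appended last, so it ends up last
result = dict(pairs)
```

For this input, `pairs` holds three tuples and `result` has the keys `x`, `y` and `filename`, in that order on Python 3.7 and later, where dicts keep insertion order.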

sophros