Adding a column to DictReader without modifying the CSV file

Question

I have a CSV file with three columns, say a, b and c. I'm using csv.DictReader to read it and to add another column that holds just the name of the file on each row.

This is my function:

import csv

def addFilename(self):
    with open(self.datafile, "r") as f:
        reader = csv.DictReader(f, delimiter='|')
        # Each row is a dict keyed by column name, so look the values up by key
        for row in reader:
            # Get filename
            filename = self.getFilename()
            yield {
                "_source": {
                    "a": row["a"],
                    "b": row["b"],
                    "c": row["c"],
                    "filename": filename
                }
            }

Now I'd like to generalize that behavior for many different CSV files. Those files have different numbers of columns and different column names. Is there a way to do so?

I don't want to modify the CSV file. The only thing I know is that I can get the fieldnames (and the number of fields) using reader.fieldnames, but I don't know how I could use that in a yield.

score 2 · Answer 1 · answered Feb 21 '18 at 15:30

This question may provide some useful insights:

What does `**` mean in the expression `dict(d1, **d2)`?

Essentially you could do something like this:

import csv

def foo(fname):
    with open(fname, "r") as f:
        reader = csv.DictReader(f, delimiter='|')
        for row in reader:
            # Build a new dict: filename first, then every column of the row
            yield {"_source": dict(filename=fname, **row)}
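To make the `**` part concrete, here is a minimal sketch that uses a hand-written dictionary as a stand-in for one row from the reader. The values and the name "data.csv" are made up for illustration. It also shows one caveat that I believe applies: if the CSV file already has a column called filename, the keyword form raises a TypeError, while the dict-literal form simply lets the later key win.

```python
# Stand-in for a single row as csv.DictReader would return it (hypothetical values)
row = {"a": "1", "b": "2", "c": "3"}

# Keyword unpacking: every key of row is passed to dict() as a keyword argument
merged = dict(filename="data.csv", **row)

# Dict-literal unpacking (Python 3.5+): same contents, filename ends up last
merged_literal = {**row, "filename": "data.csv"}

# A row that already contains a "filename" key clashes with the explicit keyword
clashing = {"filename": "from_csv", "a": "1"}
try:
    dict(filename="data.csv", **clashing)
    clash_error = None
except TypeError as exc:
    clash_error = exc

# The literal form does not raise; the value written last overrides the CSV value
overridden = {**clashing, "filename": "data.csv"}
```

If a clash with an existing column is possible in your files, the literal form (or a differently named key) is probably the safer choice.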

sophros · Answer 2 · 2018-02-21T17:52:48.293

You do not have to iterate over the individual column names. Each row yielded by for ... in reader is a dictionary, and its items() can be treated as a list of (column name, value) tuples. Then:

def addFilename2(self):
    with open(self.datafile, "r") as f:
        reader = csv.DictReader(f, delimiter='|')
        for column_dict in reader:
            # Get filename
            filename = self.getFilename()
            mapped_values = list(column_dict.items())
            mapped_values.append(("filename", filename))
            yield {
                "_source": dict(mapped_values)
            }

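This approach does not depend on the names or the number of columns in your CSV file, and it adds filename as the last key (unless the file already has a column called filename, in which case that value is overwritten in place).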

A dictionary can be created from a list of 2-tuples, which we build here from the column names and values of a particular row using items(). Once we have that list, we can append the ("filename", filename) pair and call the dict constructor to get a dictionary with the additional column.
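For anyone who wants to try this without a class or a file on disk, here is a small self-contained sketch of the same idea. The function name rows_with_filename, the sample data and the name "example.csv" are my own placeholders, not part of the question. It takes an already opened file object, so an io.StringIO can stand in for the CSV file.

```python
import csv
import io


def rows_with_filename(fileobj, filename, delimiter="|"):
    """Yield each CSV row wrapped in {"_source": ...} with a "filename" key added."""
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    for row in reader:
        # Copy the row as (name, value) pairs, then append the extra pair
        pairs = list(row.items())
        pairs.append(("filename", filename))
        yield {"_source": dict(pairs)}


# Two in-memory "files" with different column names and column counts
three_cols = io.StringIO("a|b|c\n1|2|3\n4|5|6\n")
four_cols = io.StringIO("w|x|y|z\n7|8|9|10\n")

docs = list(rows_with_filename(three_cols, "example.csv"))
wide_docs = list(rows_with_filename(four_cols, "wide.csv"))
```

The same function handles both inputs without knowing their headers, which is the point of working with the row dictionary as a whole.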
