
I'm trying to make a namedtuple from a DictReader object. The problem I'm struggling with is that I have some really long and ugly column headers in the CSV file I'm working with. For the sake of this example, one of the column headers is:

"What is typically the main dish at your Thanksgiving dinner?".

What is throwing me off is there are a bunch of spaces in this title, so if I understand correctly, the namedtuple thinks these are all arguments. What way would you recommend to solve this? I have referenced several threads and feel like I almost got there through this one: What is the pythonic way to read CSV file data as rows of namedtuples?

I am just using one column header as an example. Here is some code I have so far:

import csv
import collections

filename = 'thanksgiving2015.csv'
with open(filename, 'r', encoding='utf-8', newline='') as f:
    reader = csv.DictReader(f)
    # A single string is split on whitespace, so every word becomes a field name
    columns = collections.namedtuple(
        'columns',
        'What is typically the main dish at your Thanksgiving dinner?')

Should I strip the spaces from all these column headers before making the namedtuple? I could do this in Excel before I even import the CSV, but I assume there is a nice solution in Python.

Erich Purpur

2 Answers


namedtuple treats a single string as a white-space-delimited list of field names. You need to pass an explicit list of column names instead.

namedtuple('columns', ['What is...', 'some other absurd column name'])
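A quick way to see the splitting behaviour, as a minimal sketch (the type name `Point` and the fields `x` and `y` are made up for illustration):

```python
from collections import namedtuple

# A single string is split on whitespace (and commas) into field names.
Point = namedtuple('Point', 'x y')
fields_from_string = Point._fields

# An explicit list of valid names produces the same fields.
PointFromList = namedtuple('Point', ['x', 'y'])
fields_from_list = PointFromList._fields

print(fields_from_string)                      # ('x', 'y')
print(fields_from_string == fields_from_list)  # True
```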

I would rethink using the header values directly as field names, though. Ignore the header, and pass a list of shorter names that you can use as attributes later.
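A rough sketch of that idea, assuming a two-column file: the in-memory sample data, the second header, and the short names `main_dish` and `other` are all hypothetical stand-ins, so adapt them to the real file.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample standing in for thanksgiving2015.csv.
sample = io.StringIO(
    'What is typically the main dish at your Thanksgiving dinner?,'
    'Some other long column header\n'
    'Turkey,Baked\n'
    'Ham,Roasted\n'
)

# Short names chosen by hand, one per column, in the same order as the file.
Row = namedtuple('Row', ['main_dish', 'other'])

reader = csv.reader(sample)
next(reader)  # skip the long header row; it is not used for field names
rows = [Row._make(record) for record in reader]

print(rows[0].main_dish)  # Turkey
```

This uses csv.reader rather than csv.DictReader, since the header values are no longer needed as keys.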

chepner
  • How would I pass a list of shorter names that I can use as attributes later? – Erich Purpur Oct 11 '18 at 19:41
  • I mean something like `Columns = namedtuple('columns', 'foo bar baz')`, so that later when you have something like `x = Columns(...)`, you can write `x.foo`, `x.bar`, etc. `x.'What is typically the main dish at your Thanksgiving dinner?'` isn't syntactically valid. – chepner Oct 11 '18 at 19:51
  • Yes, I have also been trying to do this, but have tried and failed as well. Can you give me a quick example of how to do it? – Erich Purpur Oct 12 '18 at 12:07
  • I did; it's the very first piece of code in my comment. Pick the names you want, then pass either a list of those names or a single whitespace-delimited string as the second argument. – chepner Oct 12 '18 at 12:43

As chepner pointed out, the second argument of namedtuple() can either be a space-separated string or a list of strings like:

columns = collections.namedtuple('columns', 
    ['What is typically the main dish at your Thanksgiving dinner?', 'other column'])

However, doing so will fail with:

ValueError: Type names and field names must be valid identifiers

This is because each field name becomes an attribute of the resulting class (which you should capitalize as Columns), and attribute names must be valid identifiers, which can't contain spaces. To be clear, if it were allowed, you would use it as:

Columns = namedtuple('columns', ['what is', 'this'])  # in practice this already raises ValueError
columns = Columns('foo', 'bar')
print(columns.this)     # would work fine
print(columns.what is)  # SyntaxError: not a valid attribute access

If you were using a simple dict(), you would write:

print(columns['what is'])

You can however ask namedtuple to rename invalid identifiers:

Columns = namedtuple('columns', ['what is', 'this'], rename=True)
columns = Columns('foo', 'bar')
print(columns._0)  # ugly but valid
print(columns.this)
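Applied to the CSV case, this could look roughly like the sketch below. The in-memory sample and the `Age` column are hypothetical, and `header_for` is just an illustrative helper that maps the generated names back to the original headers.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample standing in for thanksgiving2015.csv.
sample = io.StringIO(
    'What is typically the main dish at your Thanksgiving dinner?,Age\n'
    'Turkey,30\n'
)

reader = csv.DictReader(sample)
headers = reader.fieldnames  # reads the header row

# rename=True replaces each invalid name with _<position>; valid names are kept.
Row = namedtuple('Row', headers, rename=True)

# Lookup from generated attribute name to the original header text.
header_for = dict(zip(Row._fields, headers))

rows = [Row(*(record[h] for h in headers)) for record in reader]

print(Row._fields)       # ('_0', 'Age')
print(rows[0]._0)        # Turkey
print(header_for['_0'])
```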
Eric Darchis