How to enable row[col_name] syntax with Namedtuple Pandas from df.itertuples()

Question

We have a DataFrame with many columns and need to iterate over its rows with df.itertuples(). Many of the column names are stored in variables, and accessing the namedtuple row with getattr() works fine, but it is not very readable when there are many column accesses. Is there a way to enable the row[col_name] syntax, e.g., with a subclassed NamedTuple like here https://stackoverflow.com/a/65301971/360265?

import pandas as pd

col_name = 'b'

df = pd.DataFrame([{'a': 1, 'b': 2.}, {'a': 3, 'b': 4.}])
for row in df.itertuples():
    print(row.a)  # Using row._asdict() would disable this syntax
    print(getattr(row, col_name))  # works fine but is not as readable as row[col_name]
    print(row[col_name])  # how to enable this syntax? (currently raises TypeError)
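For context, the last line fails because a namedtuple is still a tuple: [] only accepts integer positions or slices, and itertuples() puts the index in position 0 (field name Index) before the columns. A minimal check on the same example frame:

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2.}, {'a': 3, 'b': 4.}])
row = next(df.itertuples())

# Field names of the namedtuple: the index comes first, then the columns
print(row._fields)

# Integer positions work like on any tuple: position 1 is column 'a'
print(row[1])

# A string key is rejected by tuple indexing
try:
    row['b']
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__, error)
```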

Wrapping each row in the following Frame class works, but it is not a very Pythonic solution.

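from typing import NamedTuple

class Frame:
    def __init__(self, namedtuple: NamedTuple):
        self.namedtuple = namedtuple

    def __getattr__(self, item):
        # Only called when normal lookup fails: delegate row.col to the tuple
        return getattr(self.namedtuple, item)

    def __getitem__(self, item):
        # Treat the key as a field name, so row[col_name] works
        return getattr(self.namedtuple, item)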
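The subclassed-namedtuple idea linked in the question can be sketched roughly as follows. This is a minimal sketch, not a pandas feature: itertuples_indexable and Row are hypothetical names, and it assumes the column names are valid Python identifiers (itertuples() renames invalid, repeated or underscore-prefixed names to positional fields such as _1, and rename=True does the same here). Each tuple from df.itertuples() is re-wrapped with Row._make, so there is a small per-row cost.

```python
from collections import namedtuple

import pandas as pd


def itertuples_indexable(df: pd.DataFrame):
    """Hypothetical helper: yield rows supporting row.col, row[name] and row[int]."""
    # Same field layout as df.itertuples(index=True): 'Index' first, then columns.
    base = namedtuple("Row", ["Index", *map(str, df.columns)], rename=True)

    class Row(base):
        __slots__ = ()  # no per-instance __dict__, like a plain namedtuple

        def __getitem__(self, key):
            # Strings are looked up as field names; ints and slices
            # fall back to ordinary tuple indexing.
            if isinstance(key, str):
                return getattr(self, key)
            return super().__getitem__(key)

    for tup in df.itertuples(index=True):
        yield Row._make(tup)


df = pd.DataFrame([{'a': 1, 'b': 2.}, {'a': 3, 'b': 4.}])
col_name = 'b'
for row in itertuples_indexable(df):
    print(row.a, row[col_name], getattr(row, col_name))
```

Because Row is still a namedtuple, row._asdict() and tuple unpacking keep working as before.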

What is the problem with row._asdict()[col_name]? Or, even better, save row._asdict() to a new variable and use that. – Dani Mesejo, Dec 27 '20 at 13:22
As you probably already found out, `row.a` is not possible with a dict. – mpa, Dec 27 '20 at 14:31

Dani Mesejo · Answer 1 · score 2 · 2020-12-27T13:33:55.530

Use to_dict('records'):

import pandas as pd

col_name = 'b'

df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
for row in df.to_dict('records'):
    print(row[col_name])

Output

2
4
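One trade-off of this approach: to_dict('records') builds the whole list of dicts up front, and the dicts contain only the column values, not the index. A small sketch (the index labels 'x' and 'y' are made up for illustration) that pairs each record with its index label via zip:

```python
import pandas as pd

col_name = 'b'
# Hypothetical index labels, only to show that they are not in the records
df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}], index=['x', 'y'])

records = df.to_dict('records')  # a list with one dict per row, built eagerly
print(records[0])                # keys are the column names only

# Pair each record with its index label if the label is needed
pairs = [(label, row[col_name]) for label, row in zip(df.index, records)]
print(pairs)
```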

If you want to keep both notations, one approach is to yield each namedtuple together with its dict:

def iterdicts(tuples):
    yield from ((tup, tup._asdict()) for tup in tuples)


df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
for tup, row in iterdicts(df.itertuples()):
    print(tup.a)
    print(row[col_name])

Output

1
2
3
4

answered Dec 27 '20 at 13:26 by Dani Mesejo, edited Dec 27 '20 at 13:33

Both options certainly work, thanks. My question is mainly about readability, and I think both are at the same level as the approaches in the question. – mpa Dec 27 '20 at 14:35

score 1 · Answer 2 · answered Dec 27 '20 at 14:16

A similar approach to yours, but using df.iterrows():

import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
for index, row in df.iterrows():
    print(row.b)  # attribute access
    print(getattr(row, 'b'))  # getattr with a column name
    print(row['b'])  # label-based indexing on the row Series

These lines were tested using pandas versions 0.20.3 and 1.0.1.

answered Dec 27 '20 at 14:16 by gv12

That solves the question as asked. Our columns always have different data types, which is why I forgot to specify it in my question (I will add it now). That is also why we never use iterrows(): it loses the dtypes. – mpa Dec 27 '20 at 14:28
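The dtype loss is easy to reproduce: iterrows() returns each row as a Series, and a Series has a single dtype, so an int64 column next to a float64 column comes back as float64. A small comparison with itertuples() on a frame shaped like the question's example:

```python
import pandas as pd

# 'a' is int64, 'b' is float64
df = pd.DataFrame([{'a': 1, 'b': 2.}, {'a': 3, 'b': 4.}])

# iterrows(): the row Series gets one common dtype, so 'a' is upcast to float
_, series_row = next(df.iterrows())
a_from_iterrows = series_row['a']

# itertuples(): no per-row Series is built, so 'a' keeps its integer type
tuple_row = next(df.itertuples())
a_from_itertuples = tuple_row.a

print(series_row.dtype, repr(a_from_iterrows))
print(repr(a_from_itertuples))
```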
