
We have a DataFrame with many columns and need to iterate over its rows with df.itertuples(). Many of the column names are held in variables. Accessing the namedtuple row with getattr() works, but it is not very readable when there are many column accesses. Is there a way to enable the row[col_name] syntax, e.g. with a subclassed NamedTuple as in https://stackoverflow.com/a/65301971/360265?

import pandas as pd

col_name = 'b'

df = pd.DataFrame([{'a': 1, 'b': 2.}, {'a': 3,'b': 4.}])
for row in df.itertuples():
    print(row.a)  # Using row._asdict() would disable this syntax
    print(getattr(row, col_name))  # works fine but is not as readable as row[col_name]
    print(row[col_name]) # how to enable this syntax?

Wrapping row in the following Frame class works, but it is not really Pythonic.

from typing import NamedTuple

class Frame:
    def __init__(self, namedtuple: NamedTuple):
        self.namedtuple = namedtuple

    def __getattr__(self, item):
        return getattr(self.namedtuple, item)

    def __getitem__(self, item):
        return getattr(self.namedtuple, item)
mpa

2 Answers


Use to_dict

import pandas as pd

col_name = 'b'

df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
for row in df.to_dict('records'):
    print(row[col_name])

Output

2
4

If you want to keep both notations, a possible approach would be to do:

def iterdicts(tuples):
    yield from ((tup, tup._asdict()) for tup in tuples)


df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
for tup, row in iterdicts(df.itertuples()):
    print(tup.a)
    print(row[col_name])

Output

1
2
3
4
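
If a single row object should support both notations, one option is the subclassing idea from the question. The following is only a sketch: `indexable` and `Row` are hypothetical names, and the `Pandas` namedtuple is a hand-built stand-in for the rows that df.itertuples() yields. It assumes every row in the iterable has the same namedtuple type.

```python
from collections import namedtuple


def indexable(rows):
    """Yield each namedtuple as a subclass instance that also accepts row['name']."""
    row_cls = None
    for tup in rows:
        if row_cls is None:
            base = type(tup)

            # Hypothetical helper class: string keys go through attribute
            # lookup, everything else (ints, slices) behaves like a tuple.
            class Row(base):
                __slots__ = ()

                def __getitem__(self, key):
                    if isinstance(key, str):
                        return getattr(self, key)
                    return super().__getitem__(key)

            row_cls = Row
        yield row_cls._make(tup)


# Stand-in for the rows produced by df.itertuples() (assumption for this demo)
Pandas = namedtuple('Pandas', ['Index', 'a', 'b'])
rows = [Pandas(0, 1, 2.0), Pandas(1, 3, 4.0)]

col_name = 'b'
for row in indexable(rows):
    print(row.a, row[col_name], row[0])
```

With a real DataFrame the loop would read `for row in indexable(df.itertuples()):`. Note that itertuples() renames columns that are not valid Python identifiers to positional names, so row[col_name] in this sketch only works for columns whose names survive unchanged.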
Dani Mesejo
Both options certainly work, thanks. It's mainly a question about improving readability, and I think both are on about the same level as the approach in the question. – mpa Dec 27 '20 at 14:35

A similar approach to yours, but using df.iterrows():

import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
for index, row in df.iterrows():
    print(row.b)
    print(getattr(row, 'b'))
    print(row['b'])

These lines were tested using pandas versions 0.20.3 and 1.0.1.

gv12
That solves the question as asked. We always have different data types in the columns, which is why I forgot to specify it in my question (will add it now). That's also the reason we never use iterrows(): it loses the dtypes. – mpa Dec 27 '20 at 14:28
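
The dtype concern raised in this comment can be illustrated with the DataFrame from the question, which mixes an integer and a float column. This is a small sketch rather than a benchmark; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2.}, {'a': 3, 'b': 4.}])

# iterrows() builds one Series per row, so the int and float columns
# have to share a single dtype and 'a' comes back as a float.
_, first_series = next(df.iterrows())
iterrows_type = type(first_series['a'])

# itertuples() takes each value from its own column, so 'a' stays an integer.
first_tuple = next(df.itertuples())
itertuples_type = type(first_tuple.a)

print(iterrows_type, itertuples_type)
```

This is why itertuples() (or to_dict('records')) is usually the safer choice when the columns have different data types.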