Get data from Pandas DataFrame using column values

Question

>>> import pandas as pd
>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)

I am parsing an Excel sheet using pandas.DataFrame.itertuples, which gives me a pandas.DataFrame on each iteration. Consider the object returned in each iteration, as shown above.

Now, from each data frame, such as Pandas(Index='dog', num_legs=4, num_wings=0), I would like to access the values using the keyword num_legs. However, when I do so I get the exception below.

TypeError: tuple indices must be integers, not str

Could someone help me retrieve the data from the data frames using the column headers directly?

score 4 · Answer 1 · answered Feb 19 '19 at 11:31

I faced the same error when using a variable.

v = 'num_legs'
for row in df.itertuples():
    print(row[v])

TypeError: tuple indices must be integers or slices, not str

To use df.itertuples() with the attribute name stored in a variable, use getattr:

v = 'num_legs'
for row in df.itertuples():
    print(getattr(row, v))
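
As a minimal sketch of the approach above (the variable names `column`, `legs_by_getattr`, and `legs_by_dict` are illustrative, not from the original post), the snippet below looks up a column by a name held in a variable. It also shows `_asdict()`, the standard namedtuple method, as one possible alternative when dictionary-style string keys are preferred.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
                  index=['dog', 'hawk'])

column = 'num_legs'  # column name held in a variable (illustrative)

# getattr looks the field up by name on each namedtuple row.
legs_by_getattr = [getattr(row, column) for row in df.itertuples()]

# _asdict() converts the namedtuple to a dict, so string keys work.
legs_by_dict = [row._asdict()[column] for row in df.itertuples()]

print(legs_by_getattr)
print(legs_by_dict)
```

Both lists hold the same values; `getattr` avoids building a dict per row, so it is probably the lighter option of the two.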

Finally, df.itertuples() is generally faster than df.iterrows(), since iterrows() builds a Series object for every row.
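
One rough way to check the speed claim on your own machine is sketched below. The frame size, the repeat count, and the helper names `sum_with_itertuples` and `sum_with_iterrows` are assumptions chosen for illustration; absolute timings will vary with hardware and pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd

# Illustrative frame; the size is arbitrary.
df = pd.DataFrame({'num_legs': range(5_000), 'num_wings': range(5_000)})


def sum_with_itertuples(frame):
    # Each row is a namedtuple; columns are read as attributes.
    return sum(row.num_legs for row in frame.itertuples())


def sum_with_iterrows(frame):
    # Each row is a (label, Series) pair; columns are read by key.
    return sum(row['num_legs'] for _, row in frame.iterrows())


t_tuples = timeit.timeit(lambda: sum_with_itertuples(df), number=3)
t_rows = timeit.timeit(lambda: sum_with_iterrows(df), number=3)
print(f"itertuples: {t_tuples:.3f}s, iterrows: {t_rows:.3f}s")
```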

Mohit Musaddi


How did you evaluate this to conclude that `itertuples` is faster than `iterrows`? – Krishna Oza Feb 19 '19 at 12:39
You can check this [link](https://realpython.com/fast-flexible-pandas/#looping-with-itertuples-and-iterrows); last week I also tested this with large dataframes. – Mohit Musaddi Feb 19 '19 at 13:27
And also [this](https://stackoverflow.com/questions/24870953/does-iterrows-have-performance-issues) – Mohit Musaddi Feb 19 '19 at 14:27

meW · Accepted Answer · 2019-02-19T11:09:36.293

1

Here:

for row in df.itertuples():
    print(row.num_legs)
    # print(row.num_wings)   # Other column values

# Output
4
2
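
To illustrate why attribute access works while a string index raises the TypeError from the question: each row yielded by itertuples() is a namedtuple, so it accepts attribute names and integer positions only. The sketch below inspects the fields. The second frame, `odd`, is a hypothetical example added here to show that a column name which is not a valid Python identifier gets replaced by a positional name.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
                  index=['dog', 'hawk'])

first = next(df.itertuples())
print(first._fields)               # names usable as attributes
print(first.num_legs, first[1])    # attribute and positional access agree

# Hypothetical frame whose column name contains a space.
odd = pd.DataFrame({'num legs': [4, 2]}, index=['dog', 'hawk'])
odd_first = next(odd.itertuples())
print(odd_first._fields)           # the invalid name becomes a positional one
```

For such renamed columns, positional access or `getattr(row, '_1')` is needed, since the original header is no longer an attribute.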

edited Feb 19 '19 at 11:09

answered Feb 19 '19 at 11:07

meW

Accepting this since I was using itertuples to iterate over data frames. – Krishna Oza Feb 19 '19 at 12:35
I tried to use the same approach when reading a CSV with `read_csv`; however, the first row after the comments in my CSV is not being treated as column names, and I get an exception when using `row["columnHeader"]`. – Krishna Oza Feb 22 '19 at 06:35
That's a separate question, which you should raise, but as a hint, play with the `header` argument. – meW Feb 22 '19 at 06:36
I tried the `header` argument; unfortunately, the CSV has extra column data apart from the column header, so parsing fails when the `header` argument is used. – Krishna Oza Feb 22 '19 at 07:16
@darth_coder Then I suggest you ask a separate question that covers only this problem, with a proper explanation. – meW Feb 22 '19 at 07:18
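
As a hedged illustration of the `header` hint in the comments above, the sketch below parses a made-up CSV that starts with comment lines. The CSV text, the `#` comment marker, and the `animal` column are assumptions, since the commenter's actual file is not shown; it does not address the "extra column data" case, which would need its own question.

```python
import io

import pandas as pd

# Hypothetical CSV: two comment lines, then the header row, then data.
csv_text = (
    "# exported by some tool\n"
    "# units: count\n"
    "animal,num_legs,num_wings\n"
    "dog,4,0\n"
    "hawk,2,2\n"
)

# comment='#' makes fully commented lines invisible to header=0,
# so the first remaining line supplies the column names.
df = pd.read_csv(io.StringIO(csv_text), comment='#', header=0,
                 index_col='animal')

for row in df.itertuples():
    print(row.Index, row.num_legs)
```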

score 1 · Answer 3 · answered Feb 19 '19 at 11:13

You could use iterrows():

for u, row in df.iterrows():
    print(u)                  # index label
    print(row)                # the row as a Series
    print(row['num_legs'])    # value looked up by column name

Output:

dog
num_legs     4
num_wings    0
Name: dog, dtype: int64
4
hawk
num_legs     2
num_wings    2
Name: hawk, dtype: int64
2
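
One caveat worth knowing before choosing iterrows(): because each row is returned as a single Series, values from columns of different dtypes are upcast to a common dtype. The sketch below shows this with a hypothetical `weight_kg` float column, which is not part of the original example.

```python
import pandas as pd

# Hypothetical frame mixing an integer and a float column.
mixed = pd.DataFrame({'num_legs': [4, 2], 'weight_kg': [30.5, 1.1]},
                     index=['dog', 'hawk'])

_, series_row = next(mixed.iterrows())
tuple_row = next(mixed.itertuples())

# iterrows: the integer was upcast to float to fit the row Series.
print(series_row.dtype, type(series_row['num_legs']))

# itertuples: each field keeps its own column's type.
print(type(tuple_row.num_legs))
```

If the integer type matters downstream, itertuples() is likely the safer choice.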

Mohamed Thasin ah


This answer is also correct, and I would now use `iterrows` rather than `itertuples` when coding, since the way the data is accessed mimics the array index operator. – Krishna Oza Feb 19 '19 at 12:36
