
Got a problem with Pandas in Python 3.5

I read a local CSV file using Pandas. The CSV contains only data, with no header row. Then I assigned column names using:

import pandas as pd
df = pd.read_csv(filePath, header=None)  # the CSV has no header row
df.columns = ['XXX', 'XXX']  # abbreviated here; 11 columns in total
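A self-contained sketch of the loading step. It assumes a small in-memory CSV (via `io.StringIO`) in place of the real `filePath` and uses hypothetical column names. `read_csv` also accepts a `names=` parameter, which assigns the column names at load time instead of setting `df.columns` afterwards:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real headerless CSV file at filePath.
csv_text = "Kiev,6,55\nMunich,5,29\nBerlin,53,79\n"

# header=None: treat the first line as data, not as column names.
# names=[...]: assign (hypothetical) column names while reading,
# instead of setting df.columns afterwards.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=['Name', 'a', 'b'])

print(df)
print(df.dtypes)
```

Whichever way the names are assigned, `df.columns.is_unique` is a quick way to confirm that they contain no duplicates.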

The CSV has 11 columns: one contains strings and the others contain integers.

Then I tried to replace the values in the string column with integers in a loop, cell by cell:

for i, row in df.iterrows():
    print(i, row['Name'])
    df.set_value(i, 'Name', 123)

The integer 123 is just an example; not every cell in this column should be 123. The print call works fine if I remove set_value, but with

df.set_value(i, 'Name', 123)

I get this error:

Traceback (most recent call last):
  File "D:/xxx/test.py", line 20, in <module>
    df.set_value(i, 'Name', 233)
  File "E:\Users\XXX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1862, in set_value
    series = self._get_item_cache(col)
  File "E:\Users\XXX\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1351, in _get_item_cache
    res = self._box_item_values(item, values)
  File "E:\Users\XXX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2334, in _box_item_values
    return self._constructor(values.T, columns=items, index=self.index)
AttributeError: 'BlockManager' object has no attribute 'T'

But if I create a DataFrame manually in code:

import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['x', 'y'])
df['x'] = 2
df['y'] = 'BBB'
print(df)
for i, row in df.iterrows():
    df.set_value(i, 'y', 233)

print('\n')
print(df)

It worked. Am I missing something?

Thanks!

Windtalker
  • Why not just do `df['Name'] = 123` instead of your loop? – MaxU - stand with Ukraine May 30 '16 at 20:59
  • Because not every cell should be 123. – Windtalker May 30 '16 at 21:06
  • But your code sets 123 for the whole `Name` column. Could you clarify what you are trying to achieve? – MaxU - stand with Ukraine May 30 '16 at 21:14
  • @MaxU `for i, row in df.iterrows(): df.set_value(i, 'y', 233)` should update the column cell by cell, because it is executed in a loop. – Windtalker May 30 '16 at 21:24
  • Did you try `df['Name'] = 123`, where `Name` is the name of the column you want to update? – MaxU - stand with Ukraine May 30 '16 at 21:34
  • @MaxU Yes, I did. I first tried setting the entire column to the integer 123 and then updating it row by row in a loop, but got the same error I posted. – Windtalker May 30 '16 at 21:40
  • Can you post a data set that reproduces your error? Besides that, what are you trying to achieve? It's still not clear why you are using `set_value()` in a loop instead of working with vectorized data (like whole columns). [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU - stand with Ukraine May 30 '16 at 21:45
  • Your example works perfectly here when importing from a CSV file. There should be no difference between DataFrames created in code and DataFrames read from a file. It is not a problem with set_value(), at least in pandas 0.18. – Luis May 30 '16 at 21:50
  • For which value of `i` is the error raised? What is the corresponding row? You can print the values of `i` and `row` right after the error. But I agree with @MaxU: if you don't have a specific reason to use `set_value`, it might be better to use `.loc`, or `.at` if speed is the concern. – ayhan May 30 '16 at 22:03
  • @MaxU Say the column holds city names. My goal is to replace each city name with a city ID so that the DataFrame is entirely numeric. I'll try using apply to merge two DataFrames instead of using set_value. Thanks for the hints. – Windtalker May 30 '16 at 22:32
  • @Luis Yeah, it's hard to find the root cause. – Windtalker May 30 '16 at 22:32
  • @ayhan The error is raised on the first iteration, when i = 0. And yes, I printed i and row. – Windtalker May 30 '16 at 22:33
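The comments suggest `.loc` or `.at` in place of `set_value`. On newer versions this is required, since `DataFrame.set_value` was deprecated in pandas 0.21 and removed in 1.0. Below is a minimal sketch of the city-to-ID goal described above, filling an integer column one cell at a time with `.at`. The data and the `city_ids` lookup dict are hypothetical:

```python
import pandas as pd

# Hypothetical data in the spirit of the question.
df = pd.DataFrame({'a': [6, 5, 53], 'city': ['Kiev', 'Munich', 'Berlin']})
city_ids = {'Kiev': 0, 'Munich': 1, 'Berlin': 2}  # hypothetical lookup table

# Start with an integer column, then fill it one cell at a time.
df['city_id'] = 0
for i, row in df.iterrows():
    # .at[row_label, column_label] sets a single cell; it takes the
    # place of the removed df.set_value(i, 'city_id', ...).
    df.at[i, 'city_id'] = city_ids[row['city']]

print(df)
```

Filling a separate integer column, rather than overwriting the string column in place, avoids mixing strings and integers in one column while the loop runs.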

2 Answers


The cause of the original error:

The pandas DataFrame.set_value(index, col, value) method raises the obscure AttributeError: 'BlockManager' object has no attribute 'T' posted in the question when the DataFrame being modified has duplicate column names.

The error can be reproduced with @Windtalker's code above. The only change is that both column names are now 'x' rather than 'x' and 'y'.

import pandas as pd
df = pd.DataFrame(index=[0, 1, 2], columns=['x', 'x'])
df['x'] = 2
df['y'] = 'BBB'
print(df)
for i, row in df.iterrows():
    df.set_value(i, 'y', 233)

print('\n')
print(df)

Hopefully this helps someone else diagnose the same issue.
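Here is a minimal sketch of how to spot this condition without `set_value` (which no longer exists in pandas 1.0+), using made-up data. With duplicate labels, `df['x']` returns a DataFrame rather than a Series. That multi-column result, where a single column is expected, appears to be what the traceback trips over. The `duplicated()` check is one way to catch the problem right after assigning column names. Keeping only the first occurrence of each label is just one possible fix; renaming may be more appropriate if the duplicated columns hold different data.

```python
import pandas as pd

# Made-up data with two columns that share the label 'x'.
df = pd.DataFrame([[2, 2, 'BBB']] * 3, columns=['x', 'x', 'y'])

# Selecting a duplicated label yields a DataFrame, not a Series.
print(isinstance(df['x'], pd.DataFrame))  # True

# Detect duplicate column labels.
print(df.columns.is_unique)  # False
print(df.columns[df.columns.duplicated()].tolist())  # ['x']

# One possible fix: keep only the first column for each label.
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped.columns.tolist())  # ['x', 'y']
```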

TheRoman
  • It's very difficult to trace this problem back to the duplicated columns. I would say it's actually a bug. Have you reported it? – Konstantin Sep 26 '17 at 18:07

Well, now that you've made it a lot clearer, it's easier to answer your question.

Assuming your DF looks like this:

In [164]: df
Out[164]:
    a   b   c   d   e          city
0   6  55   3  48  11          Kiev
1   5  29  42  95  69        Munich
2  53  79  60  80  89        Berlin
3   6  70  87   6  85      New York
4  97  23  94  43  31         Paris
5  15  17  56  34  77  Zaporizhzhia
6  28  35  58  82  33        Warsaw
7  41  93  60  54  21      Hurghada
8  68  23  80  39  66          Bern
9  15  17  30  26  98          Lviv

and you have another DF with city IDs:

In [165]: cities
Out[165]:
              id
city
Warsaw         6
Kiev           0
New York       3
Hurghada       7
Munich         1
Paris          4
Berlin         2
Zaporizhzhia   5
Lviv           9
Bern           8

you can map each city to its city ID like this:

In [168]: df['city_id'] = df['city'].map(cities['id'])

In [169]: df
Out[169]:
    a   b   c   d   e          city  city_id
0   6  55   3  48  11          Kiev        0
1   5  29  42  95  69        Munich        1
2  53  79  60  80  89        Berlin        2
3   6  70  87   6  85      New York        3
4  97  23  94  43  31         Paris        4
5  15  17  56  34  77  Zaporizhzhia        5
6  28  35  58  82  33        Warsaw        6
7  41  93  60  54  21      Hurghada        7
8  68  23  80  39  66          Bern        8
9  15  17  30  26  98          Lviv        9
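The question mentions merging two DataFrames instead. Below is a roughly equivalent sketch with `DataFrame.merge`, using a trimmed, hypothetical subset of the example data above. `how='left'` keeps every row of `df` in its original order. A city missing from `cities` would get NaN in `city_id`, just as `map` would return NaN.

```python
import pandas as pd

df = pd.DataFrame({'a': [6, 5, 53], 'city': ['Kiev', 'Munich', 'Berlin']})
cities = pd.DataFrame(
    {'id': [1, 2, 0]},
    index=pd.Index(['Munich', 'Berlin', 'Kiev'], name='city'),
)

# reset_index() turns the 'city' index into an ordinary column to join on.
lookup = cities.reset_index().rename(columns={'id': 'city_id'})

# A left join keeps all of df's rows, in df's order.
result = df.merge(lookup, on='city', how='left')
print(result)
```

For a single lookup column, `map` is the shorter option. `merge` becomes handier when several columns need to come across from the lookup table at once.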

PS: When working with Pandas, 95% of the time you don't really need to loop through your DFs to achieve your goals.
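To make the PS concrete, here is a small sketch with made-up data comparing an explicit Python loop against the vectorized `map` call from the answer. Both produce the same IDs, but `map` does it in one call that aligns each city name against the index of `cities`.

```python
import pandas as pd

df = pd.DataFrame({'city': ['Kiev', 'Munich', 'Berlin', 'Kiev']})
cities = pd.DataFrame(
    {'id': [0, 1, 2]},
    index=pd.Index(['Kiev', 'Munich', 'Berlin'], name='city'),
)

# Loop version: one Python-level lookup per row.
looped = [int(cities.at[name, 'id']) for name in df['city']]

# Vectorized version: Series.map looks each value up in cities['id'].
mapped = df['city'].map(cities['id'])

print(looped)           # [0, 1, 2, 0]
print(mapped.tolist())  # [0, 1, 2, 0]
```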

MaxU - stand with Ukraine