10

I have two index-related questions on Python Pandas dataframes.

import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                'B' : ['one', 'one', 'two', 'three',
                       'two', 'three', 'one', 'two'],
                'amount' : np.random.randn(8)})

df = df.ix[df.B != 'three'] # remove where B = three
df.index
>>  Int64Index([0, 1, 2, 4, 6, 7], dtype=int64) # the original index is preserved.

1) I do not understand why the indexing is not automatically updated after I modify the dataframe. Is there a way to automatically update the indexing while modifying a dataframe? If not, what is the most efficient manual way to do this?

2) I want to be able to set the B column of the 5th element of df to 'three'. But df.iloc[5]['B'] = 'three' does not do that. I checked on the manual but it does not cover how to change a specific cell value accessed by location.

If I were accessing by row name, I could do: df.loc[5,'B'] = 'three' but I don't know what the index access equivalent is.

P.S. Link1 and link2 are relevant answers to my second question. However, they do not answer my question.

Community
  • 1
  • 1
Zhubarb
  • 11,432
  • 18
  • 75
  • 114

2 Answers2

12

1) I do not understand why the indexing is not automatically updated after I modify the dataframe.

If you want to reset the index after removing/adding rows you can do this:

df = df[df.B != 'three'] # remove where B = three
df.reset_index(drop=True)

       B    amount  id
0    one    -1.176137    1
1    one     0.434470    2
2    two    -0.887526    3
3    two     0.126969    5
4    one     0.090442    7
5    two    -1.511353    8

Indexes are meant to label/tag/id a row... so you might think about making your 'id' column the index, and then you'll appreciate that Pandas doesn't 'automatically update' the index when deleting rows.

df.set_index('id')

       B    amount
id      
1    one    -0.410671
2    one     0.092931
3    two    -0.100324
4    three   0.322580
5    two    -0.546932
6    three  -2.018198
7    one    -0.459551
8    two     1.254597

2) I want to be able to set the B column of the 5th element of df to 'three'. But df.iloc[5]['B'] = 'three' does not do that. I checked on the manual but it does not cover how to change a specific cell value accessed by location.

Jeff already answered this...

Brian Wylie
  • 2,347
  • 28
  • 29
6
In [5]: df = pd.DataFrame({'id' : range(1,9),
   ...:                 'B' : ['one', 'one', 'two', 'three',
   ...:                        'two', 'three', 'one', 'two'],
   ...:                 'amount' : np.random.randn(8)})

In [6]: df
Out[6]: 
       B    amount  id
0    one -1.236735   1
1    one -0.427070   2
2    two -2.330888   3
3  three -0.654062   4
4    two  0.587660   5
5  three -0.719589   6
6    one  0.860739   7
7    two -2.041390   8

[8 rows x 3 columns]

Your question 1) your code above is correct (see @Briford Wylie for resetting the index, which is what I think you want)

In [7]: df.ix[df.B!='three']
Out[7]: 
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7  two -2.041390   8

[6 rows x 3 columns]

In [8]: df = df.ix[df.B!='three']

In [9]: df.index
Out[9]: Int64Index([0, 1, 2, 4, 6, 7], dtype='int64')

In [10]: df.iloc[5]
Out[10]: 
B             two
amount   -2.04139
id              8
Name: 7, dtype: object

Question 2):

You are trying to set a copy; In 0.13 this will raise/warn. see here

In [11]: df.iloc[5]['B'] = 5
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.

In [24]: df.iloc[5,df.columns.get_indexer(['B'])] = 'foo'

In [25]: df
Out[25]: 
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7  foo -2.041390   8

[6 rows x 3 columns]

You can also do this. This is NOT setting a copy and since it selects a Series (that is what df['B'] is, then it CAN be set directly

In [30]: df['B'].iloc[5] = 5

In [31]: df
Out[31]: 
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7    5 -2.041390   8

[6 rows x 3 columns]
Jeff
  • 125,376
  • 21
  • 220
  • 187