0

I have a multi indexed pandas table as below.
enter image description here

I want to update Crop and Avl column, say with 'Tomato', and '0', but only for finite no of times (say, I need only 10 rows for Tomato, satisfying a condition). Currently via pandas I end up updating all rows that satisfy that condition.

col1 = ildf1.index.get_level_values(1) # https://stackoverflow.com/a/50608928/9148067
cond = col1.str.contains('DN_Mega') & (ildf1['Avl'] == 1)

ildf1.iloc[ cond , [0,2]] = ['Tomato', 0]

How do I restrict it to only say 10 rows of all rows that satisfy the condition?

PS: I used get_level_values as I have 4 columns (GR, PP+MT, Bay, Row) multi indexed in my df.

Parthiban Rajendran
  • 430
  • 1
  • 7
  • 18

1 Answers1

1

For df defined as below, you need to add additional index to numerate all rows with different number, then you can set new values based on slice. Here you go =^..^=

import pandas as pd


df = pd.DataFrame({'Crop': ['', '', '', '', ''], 'IPR': ['', '', '', '', ''], 'Avi': [1, 2, 3, 4, 5]}, index=[['0','0', '8', '8', '8'], ['6', '7', '7', '7', '7']])

# add additional index
df['id'] = [x for x in range(0, df.shape[0])]
df.set_index('id', append=True, inplace=True)

# select only two values based on condition
condition = df.index[df.index.get_level_values(0).str.contains('8')][0:2]
df.loc[condition, ['Crop', 'IPR']] = ['Tomato', 0]

Output:

          Crop IPR  Avi
    id                 
0 6 0                 1
  7 1                 2
8 7 2   Tomato   0    3
    3   Tomato   0    4
    4                 5
Zaraki Kenpachi
  • 5,510
  • 2
  • 15
  • 38
  • This should work, testing it, Btw, is it a limitation of multi index that we cannot do this? I thought multi index simply when disassembled, would be repetitive rows of specified columns shown as merged, but in reality still separate. – Parthiban Rajendran Sep 21 '20 at 09:59
  • @Parthiban Rajendran in pandas you con't define number of lines to fill data on multindex where index is the same, that's why you need to add additional index to handle this. Another solution will by disable multiindex and working with columns data. – Zaraki Kenpachi Sep 21 '20 at 10:04
  • And which would be faster? – Parthiban Rajendran Sep 21 '20 at 10:18
  • @Parthiban Rajendran I think the above – Zaraki Kenpachi Sep 21 '20 at 10:21
  • In first method, after doing the job, I tried removing that id column, but this loses all multi index cols as well. I did `df.reset_index(drop=True, inplace=True)`. For second, I disabled MI using `df.reset_index(level=['GR','PP+MT','Bay'])` but now how to bring back MI after the operations? In both cases if u could show how to get back to original df after ops if possible, it would be great. – Parthiban Rajendran Sep 21 '20 at 10:27
  • @Parthiban Rajendran simply at the end of operations remove additional index `id`. – Zaraki Kenpachi Sep 21 '20 at 10:31
  • Doing that in your suggested method (adding ID), removes all the indices altogether. It has become like [this](https://i.postimg.cc/85cVL3hw/image.png). For better clarity on what I tried, I will update that in question as well. – Parthiban Rajendran Sep 21 '20 at 10:40
  • After more tinkering, this `ildf1.reset_index(drop=True, level=['id'], inplace=True)` seems to solve removing that id alone in your method. Your suggestions were greatly helpful and on time. Thank you. – Parthiban Rajendran Sep 21 '20 at 10:46