2

I am trying to conditionally overwrite the column "alone" in my DataFrame (tdf) based on its current value. If the value is 0, it should become the string "alone"; otherwise it should become "with family". I am using the following code:

tdf['alone'].loc[['alone'] >0]= 'with family'
tdf['alone'].loc[['alone'] ==0] = 'alone'

Running these lines raises the following error:

KeyError: 'cannot use a single bool to index into setitem'

I looked at a similar question, and what I gathered is that I need a row_indexer, as in tdf['alone'].loc[[row_indexer,['alone']] = 'alone', but I am not sure how to get the values for row_indexer.

Sajal

3 Answers

3

pandas.Series.clip

Clip the values so that they are only 0 or 1, then use the result to index into an array of labels:

tdf.assign(alone=np.array(['alone', 'with family'])[tdf.alone.clip(0, 1)])

         alone  col
0  with family    1
1  with family    1
2  with family    9
3        alone    4
4  with family    2
5        alone    3
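To make the two steps of the clip approach visible, here is a minimal sketch that splits them apart. The names `labels` and `positions` are introduced only for illustration, and the `.to_numpy()` call is an added assumption to make the integer indexing explicit.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from the Setup section of this answer.
tdf = pd.DataFrame({'alone': [4, 4, 5, 0, 5, 0],
                    'col': [1, 1, 9, 4, 2, 3]})

# Hypothetical helper names, not part of the original answer.
labels = np.array(['alone', 'with family'])

# clip(0, 1) maps every value <= 0 to 0 and every value >= 1 to 1.
positions = tdf['alone'].clip(0, 1).to_numpy()

# Indexing with an integer array picks labels[0] or labels[1] for each row.
result = tdf.assign(alone=labels[positions])
print(result)
```

Note that `assign` returns a new DataFrame, so `tdf` itself keeps its numeric column unless the result is assigned back.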

pandas.Series.map

tdf.assign(alone=tdf.alone.map(lambda x: 'with family' if x else 'alone'))

         alone  col
0  with family    1
1  with family    1
2  with family    9
3        alone    4
4  with family    2
5        alone    3

pandas.Series.map, version 2

tdf.assign(alone=tdf.alone.map(lambda x: {0: 'alone'}.get(x, 'with family')))

         alone  col
0  with family    1
1  with family    1
2  with family    9
3        alone    4
4  with family    2
5        alone    3

Setup

Borrowed from @jezrael

tdf = pd.DataFrame({'alone':[4,4,5,0,5,0],
                   'col':[1,1,9,4,2,3]})
piRSquared
2

You need boolean indexing with loc and a boolean mask. Compare the DataFrame column tdf['alone'] with the value 0, not the one-item list ['alone']:

tdf.loc[tdf['alone'] > 0, 'alone'] = 'with family'
tdf.loc[tdf['alone'] ==0, 'alone'] = 'alone'
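As a runnable sketch of this approach on the sample frame used in this answer: the two steps below that are not in the answer itself are assumptions added for robustness. Both masks are computed before anything is overwritten, and the column is cast to `object` first, because recent pandas versions warn about (or may reject) writing a string into an integer column. The names `with_family` and `is_alone` are illustrative.

```python
import pandas as pd

tdf = pd.DataFrame({'alone': [4, 4, 5, 0, 5, 0],
                    'col': [1, 1, 9, 4, 2, 3]})

# Build both boolean masks (one True/False per row) before changing anything.
with_family = tdf['alone'] > 0
is_alone = tdf['alone'] == 0

# Assumption: cast to object so the column can hold strings.
tdf['alone'] = tdf['alone'].astype(object)

# .loc[row_mask, column_label] assigns only within the 'alone' column.
tdf.loc[with_family, 'alone'] = 'with family'
tdf.loc[is_alone, 'alone'] = 'alone'
print(tdf)
```

Here the boolean mask plays the role of the row_indexer asked about in the question, and the second argument `'alone'` restricts the assignment to that one column.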

If the column cannot contain negative numbers, you can use numpy.where instead:

tdf['alone'] = np.where(tdf['alone'] == 0,  'alone', 'with family')

Sample:

tdf = pd.DataFrame({'alone':[4,4,5,0,5,0],
                   'col':[1,1,9,4,2,3]})

print (tdf)
   alone  col
0      4    1
1      4    1
2      5    9
3      0    4
4      5    2
5      0    3

tdf['alone'] = np.where(tdf['alone'] == 0,  'alone', 'with family')
print (tdf)

         alone  col
0  with family    1
1  with family    1
2  with family    9
3        alone    4
4  with family    2
5        alone    3
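The caveat about negative numbers can be illustrated with a small sketch. The data below is hypothetical and chosen only to include a negative value; `two_way` and `masked` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing a negative value.
s = pd.Series([4, -1, 0])

# np.where has only two outcomes: everything that is not 0 becomes 'with family'.
two_way = np.where(s == 0, 'alone', 'with family')
print(two_way.tolist())

# The two-mask version leaves values that match neither condition untouched.
masked = s.astype(object)
masked.loc[s > 0] = 'with family'
masked.loc[s == 0] = 'alone'
print(masked.tolist())
```

So the two approaches agree only when every value is 0 or positive; with a negative value, `np.where` labels it "with family" while the mask-based version keeps the original number.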

Your original approach is also problematic even with a correct mask, because it uses chained assignment: pandas may apply the update to a copy of tdf['alone'], in which case you would not see the change in tdf:

#added boolean mask tdf['alone'] > 0
tdf['alone'].loc[tdf['alone'] > 0 ]= 'with family'
jezrael
  • Hey, thanks for the fast reply. Your suggested method works flawlessly, but I am curious how I can do it my way. Do you have any suggestions on that? – Sajal Sep 06 '18 at 05:20
  • @Sajal - Then use `tdf.loc[tdf['alone'] > 0, 'alone'] = 'with family'` and `tdf.loc[tdf['alone'] ==0, 'alone'] = 'alone'` – jezrael Sep 06 '18 at 05:22
0

[['alone'] > 0] compares the Python list ['alone'] to the integer 0. Use the following instead:

tdf.loc[tdf['alone'] > 0, 'alone'] = 'with family'
tdf.loc[tdf['alone'] == 0, 'alone'] = 'alone'
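A short sketch of why the original expression cannot work as a row selector. The tiny DataFrame is hypothetical; the point is only the difference between comparing a list and comparing a column.

```python
import pandas as pd

# ['alone'] is a plain one-element Python list, not the DataFrame column,
# so comparing it to 0 yields one bool instead of one bool per row.
single_bool = (['alone'] == 0)
print(single_bool)

# In Python 3, ordering a list against an int is not allowed at all.
try:
    ['alone'] > 0
    ordering_error = None
except TypeError as exc:
    ordering_error = exc
print(type(ordering_error).__name__)

# Comparing the column itself gives a boolean Series, one entry per row.
tdf = pd.DataFrame({'alone': [4, 0]})  # hypothetical two-row frame
mask = tdf['alone'] > 0
print(mask.tolist())
```

The error message in the question ("cannot use a single bool") is consistent with the first case: the indexer received one boolean rather than a per-row mask.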
Andrey Portnoy
  • Please dont change your solution by me :( – jezrael Sep 06 '18 at 05:05
  • Hi, thanks for the quick reply. When I ran your code, it replaced whole rows of the DataFrame tdf (890x14) with "alone"/"with family" according to the column "alone", whereas I only want to change the values in the column "alone". – Sajal Sep 06 '18 at 05:16
  • @jezrael I did not change my solution "by you". Please don't edit my answer in contradiction with my intent. I made an error and corrected it, you just happened to also answer correctly. – Andrey Portnoy Sep 06 '18 at 05:20
  • @AndreyPortnoy - So sorry, it is impossible to verify. Maybe I was wrong :( – jezrael Sep 06 '18 at 05:23