15
import pandas as pd

businesses = pd.read_json(businesses_filepath, lines=True, encoding='utf_8')
restaurantes = businesses['Restaurants' in businesses['categories']]

I would like to remove the rows that do not have Restaurants in the categories column (this column contains lists), but the code above raises the error 'KeyError: False'. I would like to understand why, and how to fix it.

panchester
  • 325
  • 1
  • 4
  • 13
  • 3
    `'Restaurants' in businesses['categories']` is a native python expression that evaluates to the scalar `False` (or `True`). Is `'Restaurants'` not simply one of the categories? In that case you need `businesses[businesses.categories == 'Restaurants']`. – Andras Deak -- Слава Україні Jul 02 '17 at 21:33

5 Answers

27

The expression `'Restaurants' in businesses['categories']` returns the single boolean value `False` (for a Series, `in` tests membership in the index, not in the values). That boolean is then passed to the brackets indexing operator of the DataFrame `businesses`, which has no column named `False` and therefore raises a `KeyError`.

What you are looking for is something called boolean indexing, which works like this:

businesses[businesses['categories'] == 'Restaurants']
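Since the asker's categories column holds lists rather than plain strings, the `==` comparison may not match them directly; a hedged sketch of the same boolean-indexing idea with an element-wise membership test (toy data, made-up column values):

```python
import pandas as pd

# Toy frame standing in for the question's data: 'categories' holds lists
businesses = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'categories': [['Restaurants', 'Bars'], ['Shopping'], ['Restaurants']],
})

# Element-wise membership test; the resulting boolean Series is what
# the brackets operator expects for row filtering
mask = businesses['categories'].apply(lambda cats: 'Restaurants' in cats)
restaurantes = businesses[mask]
print(restaurantes['name'].tolist())  # ['A', 'C']
```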
Ted Petrou
  • 59,042
  • 19
  • 131
  • 136
  • I'm experiencing the same issue in my code: ```df.loc[('Error:' in df['Quick Insights'])``` Problem is that I need the ```in``` part, as the word "Error:" is just a substring of ```quick insights```. How would you address that? Is there any operator that acts like SQL's ```like```? – goidelg Aug 19 '21 at 06:49
  • 2
    solved using ```df[df['Quick Insights'].str.match('(Error:)', na=False)]``` – goidelg Aug 19 '21 at 06:56
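On the substring question in the comments: `Series.str.contains` is the closest pandas analog to SQL's LIKE, whereas `str.match` only anchors at the start of the string. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Quick Insights': ['Error: disk full', 'all good', None]})

# regex=False treats the pattern as a literal substring; na=False counts
# missing values as non-matches instead of propagating NaN
errors = df[df['Quick Insights'].str.contains('Error:', regex=False, na=False)]
print(errors['Quick Insights'].tolist())  # ['Error: disk full']
```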
6

The reason for this is that the `Series` class implements a custom `in` operator: it tests membership in the index and returns a single boolean, rather than the element-wise result that `==` produces. Here's a workaround:

businesses[['Restaurants' in c for c in list(businesses['categories'])]]

Hopefully this helps anyone looking for a substring match in the column rather than an exact match.

iggy12345
  • 1,233
  • 12
  • 31
  • I'm considering just regex matching what I'm looking for, turning it into a new column and indexing using the new column – dang Aug 08 '21 at 23:17
5

If you find that your data contains spelling variations or alternative restaurant-related terms, the following may be of benefit. Essentially you put your restaurant-related terms in `restaurant_lst`. The lambda function returns `True` if any of the items in `restaurant_lst` is contained within a given row of the categories column. The `.loc` indexer then filters out the rows for which the lambda returned `False`.

restaurant_lst = ['Restaurant','restaurantes','diner','bistro']
restaurant = businesses.loc[businesses['categories'].apply(lambda x: any(restaurant_str in x for restaurant_str in restaurant_lst))]
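A minimal, self-contained sketch of this any()-over-terms filter on toy data (note that the apply runs on the categories column itself; the column values here are invented):

```python
import pandas as pd

# Toy data; column names and values are assumptions, not the asker's file
businesses = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'categories': ['Mexican Restaurant', 'bistro cafe', 'Shopping'],
})

restaurant_lst = ['Restaurant', 'restaurantes', 'diner', 'bistro']

# True when any search term occurs in the row's categories value
mask = businesses['categories'].apply(
    lambda x: any(term in x for term in restaurant_lst))
restaurant = businesses.loc[mask]
print(restaurant['name'].tolist())  # ['A', 'B']
```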
Joe
  • 51
  • 1
  • 1
1

I think what you meant was:

businesses = businesses.loc[businesses['categories'] == 'Restaurants']

That will keep only the rows whose category is 'Restaurants'.

Rayhane Mama
  • 2,374
  • 11
  • 20
0

None of the answers here actually worked for me.

businesses[businesses['categories'] == 'Restaurants']

This obviously won't work, since the value in 'categories' is not a string but a list, so the comparison will always fail.

What does work, however, is converting the lists in the column into tuples:

businesses['categories'] = businesses['categories'].apply(tuple)

That allows you to use the standard .loc comparison:

businesses.loc[businesses['categories'] == ('Restaurants',)]
RedAero
  • 49
  • 3