
I am trying to apply a function to a column of a DataFrame, and it keeps throwing an error. I need your help.
The function is supposed to drop the rows that contain none of the items in the list keywordz.

Function:

def get_restuarant_business(data):
    keywordz=['food','restuarant','bakery','deli','fast','food','bars','coffee']

    data=data.lower()
    while((data != '' or pd.isnull(data)==False ) and isinstance(data, str)):
        flag= False
        for i in keywordz:
            if i in data:
                flag=True
                break
            else:
                continue
    return flag

rest_biz = business.copy().loc[business['categories'].head(1).apply(
                                     get_restuarant_business) == True]
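Separately from the KeyError in the traceback, the function as posted has two problems of its own: data.lower() runs before the null check, so a NaN cell raises AttributeError, and for any string the while condition stays true forever, because the break only leaves the inner for loop. A minimal sketch of the same substring-matching idea with those two issues fixed, assuming categories holds comma-separated strings or missing values (the sample rows are made up, the duplicate 'food' is dropped, and 'restuarant' is spelled 'restaurant' because the misspelled keyword would never match 'restaurants'):

```python
import pandas as pd

# Keyword list from the question, with the duplicate 'food' dropped
# and 'restuarant' corrected to 'restaurant'.
keywordz = ['food', 'restaurant', 'bakery', 'deli', 'fast', 'bars', 'coffee']


def get_restuarant_business(data):
    # Missing values (NaN/None) are not strings, so check the type first.
    if not isinstance(data, str):
        return False
    data = data.lower()
    # Substring test, like the original loop: True as soon as any keyword is found.
    return any(word in data for word in keywordz)


# Made-up sample rows, including a missing value.
business = pd.DataFrame({'categories': ['Coffee & Tea, Food',
                                        'Home & Garden, Shopping',
                                        None]})

rest_biz = business[business['categories'].apply(get_restuarant_business)]
print(rest_biz)
```

Substring matching is loose: 'fast' also matches inside 'breakfast', for example.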

This is the exception that is being thrown.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-13-8da5e44c6072> in <module>()
1 print(business.head(5))
----> 2 business['categories'].apply(get_restuarant_business)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
764         key = com._apply_if_callable(key, self)
765         try:
766             result = self.index.get_value(self, key)
767 
768             if not is_scalar(result):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
3101         try:
3102             return self._engine.get_value(s, k,
3103                                           tz=getattr(series.dtype, 'tz', None))
3104         except KeyError as e1:
3105             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'categories'

Output of print(business.head(5)):

0    'tours, breweries, pizza, restaurants, food, h...
1    'chicken wings, burgers, caterers, street vend...
2    'breakfast & brunch, restaurants, french, sand...
3    'home & garden, nurseries & gardening, shoppin...
4                                 'coffee & tea, food'
Name: categories, dtype: object

Can you please help me?
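The printout ends with "Name: categories, dtype: object", which is how pandas displays a Series rather than a DataFrame. If business is already that column, business['categories'] is treated as a lookup of an index label called 'categories', which would explain the KeyError. A minimal reproduction, assuming business really is such a Series (inferred from the printout, not confirmed in the question; the values are shortened from it):

```python
import pandas as pd

# Assumed stand-in for `business`: a Series named 'categories'.
business = pd.Series(['tours, breweries, pizza, restaurants, food',
                      'coffee & tea, food'], name='categories')

try:
    # On a Series, [] looks up index labels, and there is no label 'categories'.
    business['categories']
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print('KeyError:', err)

# Calling .apply on the Series itself avoids the label lookup.
mask = business.apply(lambda s: 'food' in s)
print(business[mask])
```

If that is the situation, dropping the ['categories'] and calling apply on business directly should get past this particular error.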

hygull
Mz_ymg21

2 Answers


I think the function below will solve your problem:

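def get_restuarant_business(data):
    keywordz=['food','restaurants','bakery','deli','fast food','bars','coffee']

    # Missing values (NaN) are not strings, so they cannot match.
    if not isinstance(data, str):
        return False

    # Split the comma-separated categories and check each one.
    categories = [c.strip() for c in data.lower().split(',')]
    flag = any(c in keywordz for c in categories)

    return flag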

Call it like this:

business_df['food_cat'] = business_df['categories'].apply(
    get_restuarant_business)

Then keep only the rows where food_cat is True.
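A minimal end-to-end sketch of that filtering step, assuming business_df is a DataFrame whose categories column holds comma-separated strings (the rows below are made up, modeled on the question's printout, and is_food_business is a hypothetical name). Each string is split into individual categories before the membership test, so 'coffee & tea, food' matches on its 'food' entry, and the boolean column is used directly as the row mask:

```python
import pandas as pd

keywordz = ['food', 'restaurants', 'bakery', 'deli', 'fast food', 'bars', 'coffee']


def is_food_business(categories):
    # Hypothetical helper; missing values (NaN/None) count as no match.
    if not isinstance(categories, str):
        return False
    parts = [p.strip() for p in categories.lower().split(',')]
    return any(p in keywordz for p in parts)


business_df = pd.DataFrame({'categories': [
    'tours, breweries, pizza, restaurants, food',
    'home & garden, nurseries & gardening, shopping',
    'coffee & tea, food',
    None,
]})

business_df['food_cat'] = business_df['categories'].apply(is_food_business)
# The boolean column can be used directly as a row mask.
food_df = business_df[business_df['food_cat']]
print(food_df)
```

Because food_cat already holds True/False values, comparing it with == True is not needed.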

Venkatachalam

Try this!

import pandas as pd

business = pd.DataFrame({'categories': ['tours, breweries, pizza, restaurants, food',
                                        'chicken wings, burgers, caterers, street vend',
                                        'breakfast & brunch, restaurants, french, sand',
                                        'home & garden, nurseries & gardening, shopping']})

keywordz = ['food', 'restaurants', 'bakery', 'deli', 'fast', 'bars', 'coffee']

# Keep a row if any of its comma-separated categories is a keyword.
rest_biz = business[business['categories'].apply(
    lambda x: any(w.lower() in keywordz for w in x.split(', ')))]

# output
    categories
0   tours, breweries, pizza, restaurants, food
2   breakfast & brunch, restaurants, french, sand
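A vectorized variant, as a sketch: pandas' Series.str.contains accepts a regular expression, case=False ignores case, and na=False makes missing categories count as non-matches, which avoids calling a Python function for every row. This is substring matching, like the loop in the question, so 'fast' also matches inside 'breakfast'; the keyword list and the extra None row are assumptions for illustration:

```python
import re

import pandas as pd

business = pd.DataFrame({'categories': [
    'tours, breweries, pizza, restaurants, food',
    'chicken wings, burgers, caterers, street vend',
    'breakfast & brunch, restaurants, french, sand',
    'home & garden, nurseries & gardening, shopping',
    None,
]})

keywordz = ['food', 'restaurants', 'bakery', 'deli', 'fast', 'bars', 'coffee']

# One alternation pattern; re.escape guards against regex metacharacters.
pattern = '|'.join(re.escape(k) for k in keywordz)

# case=False ignores case; na=False treats missing values as "no match".
mask = business['categories'].str.contains(pattern, case=False, na=False)
rest_biz = business[mask]
print(rest_biz)
```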
Venkatachalam