1

I am new to Pandas.

My DataFrame

df

A
Best free
best free
free
free best
best
Nokia best for free
best nokia in 2020
best streaming platform for free
free streaming platform in 2020 for nokia phone
streaming for free Canada
...

My excel file = file, file.Word indicates that I am interested in the file and viewing sheet named 'Word'

file.ContainsBest

meilleur
beste
mejor
best
...

file.ContainsFree

gratuit
kostenlos
gratis
free
...

My desired DataFrame

df

A                                                   Contains Best   Contains Free
Best free                                           True            True
best free                                           True            True
free                                                False           True
free best                                           True            True
best                                                True            False
Nokia best for free                                 True            True
best nokia in 2020                                  True            False
best streaming platform for free                    True            True
free streaming platform in 2020 for nokia phone     False           True
streaming for free Canada                           False           False
...                                                 ...             ...

Excel file will work as an filter which I could modify by adding or subracting words from different sheets and it will act as a source determining if new DataFrame column will hold True or False values.

What I've tried so far:

file = pd.read_excel('config_values.xlsx')
print(file)

It prints out the first sheet values which are expected, once I add sheet_name='ContainsBest' it throws this error:

xlrd.biffh.XLRDError: No sheet named <'ContainsBest'>

But it does open the selected sheets with this code:

file = pd.read_excel(open('config_values.xlsx', 'rb'),
                     sheet_name='ContainsFree')
print(file)

Does that mean that pd.read_excel(open... cant be used to open the first sheet with a name?

So my question can be split into two:

  1. Correct way of using excel file with sheets to read from different sheets in it
  2. What would be the best approach to add columns and check if values in sheets exist in them? Using .isin() or there are better options?

EDIT I can use:

df['Contains Free'] = df.Search_Query.str.contains('free', regex=True)

Which works fine, but if I pass file like so:

file = pd.read_excel(open('config_values.xlsx', 'rb'),
                     sheet_name='ContainsFree')
df['Contains Free'] = df.Search_Query.str.contains(file, regex=True)

I get this error:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Thank you for your suggestions.

Jonas Palačionis
  • 4,591
  • 4
  • 22
  • 55
  • Does this previous answer help: https://stackoverflow.com/questions/29700552/series-objects-are-mutable-and-cannot-be-hashed-error ? – Plato77 Jan 03 '20 at 17:18
  • Sorry it does not, i am new to python. I mean I am not trying to change the dataframe, so why would it throw an error that its not hashable. – Jonas Palačionis Jan 05 '20 at 22:08

1 Answers1

0

Ended up using this solution:

containsFree = pd.read_excel('config_values.xlsx',
                             sheet_name='ContainsFree', header=None)
containsFree = '|'.join(containsFree[0])
df['ContainsFree'] = df.SearchQuery.str.contains(
    containsFree, regex=True, flags=re.IGNORECASE)

I've used the same code per each excel sheet. However, for the first sheet you do not need to pass a sheet name, even if its the right one, just pass the file without the name:

containsbest = pd.read_excel('config_values.xlsx', header=None)
containsbest = '|'.join(containsbest[0])
df['Containsbest'] = df.SearchQuery.str.contains(
    containsbest, regex=True, flags=re.IGNORECASE)
Jonas Palačionis
  • 4,591
  • 4
  • 22
  • 55