I am new to Pandas.
My DataFrame
df
A
Best free
best free
free
free best
best
Nokia best for free
best nokia in 2020
best streaming platform for free
free streaming platform in 2020 for nokia phone
streaming for free Canada
...
My excel file = file
, file.Word
indicates that I am interested in the file and viewing sheet named 'Word'
file.ContainsBest
meilleur
beste
mejor
best
...
file.ContainsFree
gratuit
kostenlos
gratis
free
...
My desired DataFrame
df
A Contains Best Contains Free
Best free True True
best free True True
free False True
free best True True
best True False
Nokia best for free True True
best nokia in 2020 True False
best streaming platform for free True True
free streaming platform in 2020 for nokia phone False True
streaming for free Canada False False
... ... ...
Excel file
will work as an filter which I could modify by adding or subracting words from different sheets and it will act as a source determining if new DataFrame column will hold True
or False
values.
What I've tried so far:
file = pd.read_excel('config_values.xlsx')
print(file)
It prints out the first sheet values which are expected, once I add sheet_name='ContainsBest'
it throws this error:
xlrd.biffh.XLRDError: No sheet named <'ContainsBest'>
But it does open the selected sheets with this code:
file = pd.read_excel(open('config_values.xlsx', 'rb'),
sheet_name='ContainsFree')
print(file)
Does that mean that pd.read_excel(open...
cant be used to open the first sheet with a name?
So my question can be split into two:
- Correct way of using excel file with sheets to read from different sheets in it
- What would be the best approach to add columns and check if values in sheets exist in them? Using
.isin()
or there are better options?
EDIT I can use:
df['Contains Free'] = df.Search_Query.str.contains('free', regex=True)
Which works fine, but if I pass file
like so:
file = pd.read_excel(open('config_values.xlsx', 'rb'),
sheet_name='ContainsFree')
df['Contains Free'] = df.Search_Query.str.contains(file, regex=True)
I get this error:
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
Thank you for your suggestions.