I am trying to compare the values in a list with the values of a dataframe column. Where the two are equal, I want to keep the whole row of the dataframe. I have not been able to write the instruction that keeps the whole row.

Here is some data from the dataframe and the list:

print(approval_polls.head(5))

 start_date    end_date         pollster    sponsor  sample_size population  \
0  2020-02-02  2020-02-04           YouGov  Economist       1500.0          a   
1  2020-02-02  2020-02-04           YouGov  Economist        376.0          a   
2  2020-02-02  2020-02-04           YouGov  Economist        523.0          a   
3  2020-02-02  2020-02-04           YouGov  Economist        599.0          a   
4  2020-02-07  2020-02-09  Morning Consult        NaN       2200.0          a   


excel_doc = ['Monmouth University' 'Selzer & Co.' 'ABC News/The Washington Post'
 'Siena College/The New York Times Upshot' 'YouGov']

The code I started writing is as follows:

approval_polls = approval_polls[approval_polls['pollster'].isin(excel_doc)]

The result I'm getting isn't right.

print(approval_polls)

[start_date, end_date, pollster, sponsor, sample_size, population, ...]

What is wrong here?

Thank you for your suggestions

icatalan
  • Could you show a part of the dataframe where the pollster value is in `excel_doc`? For now we don't see one – azro May 27 '21 at 18:43
  • If you get them both as lists you could use `==`, but the order will have to be the same – doctorlove May 27 '21 at 18:45
  • It is unclear from your question, but you can probably do `approval_polls[approval_polls['pollster'].isin(excel_doc)]` or you can [use iterrows](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html) `for i, row in approval_polls.iterrows()` – Matt May 27 '21 at 18:47
  • @azro, I edited the list so that at least one of the listed values is also in the df, sorry – icatalan May 27 '21 at 18:49
  • @doctorlove, when looping I compare them as if the df were also a list, but once a value matches I need to keep the whole row of the df, so I can't treat them as just two lists – icatalan May 27 '21 at 18:51
  • Does this answer your question? [Use a list of values to select rows from a pandas dataframe](https://stackoverflow.com/questions/12096252/use-a-list-of-values-to-select-rows-from-a-pandas-dataframe) – Matt May 27 '21 at 18:52

1 Answer

Yes, `isin` is the way to go:

excel_doc = ['Monmouth University', 'Selzer & Co.', 'ABC News/The Washington Post',
             'Siena College/The New York Times Upshot', 'YouGov']

df = df[df['pollster'].isin(excel_doc)]
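
As a minimal, self-contained sketch of that filter: the small dataframe below is hypothetical sample data that only mirrors the `pollster` and `sample_size` columns from the question, and the list is shortened for illustration. It shows that indexing with the boolean mask from `isin` keeps whole rows and returns a DataFrame.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
approval_polls = pd.DataFrame({
    "pollster": ["YouGov", "YouGov", "Morning Consult", "Monmouth University"],
    "sample_size": [1500.0, 376.0, 2200.0, 800.0],
})

# Shortened list for illustration; note the comma between every item.
excel_doc = ["Monmouth University", "Selzer & Co.", "YouGov"]

# isin returns a boolean Series: True where the pollster is in the list.
mask = approval_polls["pollster"].isin(excel_doc)

# Indexing with the mask keeps every column of the matching rows.
filtered = approval_polls[mask]

print(type(filtered).__name__)
print(filtered)
```

The original row labels are preserved in `filtered`; calling `filtered.reset_index(drop=True)` renumbers them if that is preferred.
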
azro
  • I tried it and it seemed to work, but it doesn't: the result only has the headers and none of the content. So df ends up as a new list instead of a df without the rows that don't meet the condition – icatalan May 28 '21 at 14:10
  • @icatalan that is very strange. If `approval_polls` is a DataFrame, the filtering returns a DataFrame for sure, so the problem doesn't come from here – azro May 29 '21 at 08:23
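
One possible explanation for a headers-only result, offered as an assumption rather than a confirmed diagnosis: if the list is written without commas between the strings, Python joins adjacent string literals into one long string, so nothing matches and `isin` yields an empty DataFrame. The sketch below uses hypothetical sample data to show this, plus a whitespace-tolerant comparison using `str.strip()`.

```python
import pandas as pd

# Adjacent string literals with no commas are concatenated into ONE string.
no_commas = ['Monmouth University' 'Selzer & Co.' 'YouGov']
with_commas = ['Monmouth University', 'Selzer & Co.', 'YouGov']
print(len(no_commas), len(with_commas))

# Hypothetical data, including a value with stray whitespace.
df = pd.DataFrame({"pollster": ["YouGov", " YouGov ", "Morning Consult"]})

# Nothing equals the single concatenated string, so no rows survive.
# The result is still a DataFrame; it prints as only its column names.
empty = df[df["pollster"].isin(no_commas)]
print(type(empty).__name__, empty.empty)

# With a proper list, and stripping whitespace before comparing,
# both YouGov rows are kept.
cleaned = df[df["pollster"].str.strip().isin(with_commas)]
print(cleaned)
```

Checking `type(approval_polls)` and `len(excel_doc)` before filtering is a quick way to rule this case in or out.
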