I have a pandas DataFrame:
import pandas as pd
country = ['US', 'US', 'US', 'UK', 'UK', 'UK']
year = ['1990', '1991', '2020', '1990', '1991', '2020']
people = [20, 34, 456, 5, 7, 300]
df = pd.DataFrame(zip(country, year, people), columns = ['country', 'year', 'people'])
  country  year  people
0      US  1990      20
1      US  1991      34
2      US  2020     456
3      UK  1990       5
4      UK  1991       7
5      UK  2020     300
I want to select the rows where year is '2020' or '1990'. I understand that this can be achieved with:
df.loc[(df.year == '2020') | (df.year == '1990')]
or
df.query("year == ['2020', '1990']")
to get the output:
  country  year  people
0      US  1990      20
2      US  2020     456
3      UK  1990       5
5      UK  2020     300
However, I'd like to perform this 'query' with the in operator.
I tried:
df.loc[df['year'] in ['2020', '1990']]
which raises the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
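As far as I can tell, the error comes from how Python evaluates in against a list: each list element is compared to the Series with ==, which yields a boolean Series, and Python then tries to reduce that Series to a single True or False. The sketch below reproduces this and, for comparison, shows two in-like spellings that I believe work element-wise (Series.isin and the in keyword inside a query string). The variable names wanted, raised, by_isin and by_query are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "US", "US", "UK", "UK", "UK"],
    "year": ["1990", "1991", "2020", "1990", "1991", "2020"],
    "people": [20, 34, 456, 5, 7, 300],
})
wanted = ["2020", "1990"]  # illustrative name, not part of the original code

# Plain `in` compares the Series to each list element with == and then
# asks for the truth value of the resulting boolean Series, which pandas
# refuses to provide.
try:
    df["year"] in wanted
    raised = False
except ValueError:
    raised = True

# Element-wise membership test on the column.
by_isin = df.loc[df["year"].isin(wanted)]

# `in` is understood inside a query string; @wanted refers to the
# local Python variable.
by_query = df.query("year in @wanted")

print(raised)
print(by_isin)
print(by_query)
```

Both selections should return the rows with index 0, 2, 3 and 5, the same as the output shown earlier.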
I would like to use the in operator when subsetting my pandas.DataFrame, as it requires the least amount of typing.
What is the best way to avoid this error when using the in operator?