How to select rows based on being a subset of values in a list

Question

Data

I have the following data:

data = {'state': ['Alabama', 'Alabama', 'Alabama', 'Alabama', 'Alabama', 'Wisconsin', 'Wisconsin'],
        'year': [1989, 1989, 1989, 1989, 1990, 2016, 1970],
        'quarter': [1, 2, 3, 4, 1, 4, 4],
        'v': [3.984353, 4.427839, 4.173073, 3.485882, 3.865541, 0.168776, 0.168776]}
df = pd.DataFrame(data)

       state  year  quarter         v
0    Alabama  1989        1  3.984353
1    Alabama  1989        2  4.427839
2    Alabama  1989        3  4.173073
3    Alabama  1989        4  3.485882
4    Alabama  1990        1  3.865541
5  Wisconsin  2016        4  0.168776
6  Wisconsin  1970        4  0.168776

The data includes observed values for states and dates back to 1970. For some states it starts later than 1970.

Goal

I want to keep the states for which I can observe the data for both 1970 and 2016.

Code

The code below includes but doesn't subset the data at all:

df.loc[(df['year'] >= 1970) & (df['year'] <= 2016)]

How can I do that in python?

What does your _filter_ using `df.loc()` print ? What is wrong with the result? — hc_dev, Jun 30 '22 at 18:43
That result is wrong because it's not the question he wanted to ask. It just picks all rows between 1970 and 2016, which is essentially everything. You, unfortunately, need a two-step process. You need to construct the list of states for which there is a year 1970 entry and a year 2016 entry, then you need to select the subset of all rows for those states. — Tim Roberts, Jun 30 '22 at 18:45
when I store it in a new data set, it gives exactly the same data. I wanna delete the states for which data doesn't start from 1970 but at a later that, say 1989. @hc_dev — kiyis_stats, Jun 30 '22 at 18:48

score 2 · Answer 1 · answered Jun 30 '22 at 19:02

2

If I follow the explanation of @TimRoberts, I think your are looking for:

issubset = lambda x: set([1970, 2016]).issubset(x)
out = df[df.groupby('state')['year'].transform(issubset)]

answered Jun 30 '22 at 19:02

Corralien

109,409
8
28
52

1

Here's explanation for `set([1970, 2016]).issubset(x)`: https://stackoverflow.com/a/6159329/7758804 – Trenton McKinney Jun 30 '22 at 19:04

How to select rows based on being a subset of values in a list

Data

Goal

Code

1 Answers1