1

DATA::

Unnamed: 0  gvkey  date  CUSIP  conm  tic  cik  PERMNO  COMNAM

0
0 1001 1983 00016510 A & M FOOD SERVICES INC AMFD. 723576.0 10015 NaN 

1
1 1001 1983 00016510 A & M FOOD SERVICES INC AMFD. 723576.0 10015 A & M FOOD SERVICES INC 

2
5 1001 1984 00016510 A & M FOOD SERVICES INC AMFD. 723576.0 10015 A & M FOOD SERVICES INC 

3
17 1001 1985 00016510 A & M FOOD SERVICES INC AMFD. 723576.0 10015 A & M FOOD SERVICES INC 

4
29 1003 1983 00035410 A.A. IMPORTING CO INC ANTQ 730052.0 10031 NaN 

Goal::

Take specific observation's PERMNO for the year(data)

Conditions::

  1. the observation has gvkey data
  2. the next year's observation for that gvkey has 'COMNAM' variable's data

for example; year = 1983, gvkey = 1001, next_year = 1984

What I've tried::

df = DATA
df[(df['date'] == year) & (df['date'] == gvkey) & (df[df['date'] == next_year]['COMNAM'].isna() != 1])]

however, it returns, NO OBSERVATIONS..

I think this is because the code contains two mutually exclusive conditions:: df['date'] == year and df['date'] == next_year

Can anyone give me advice? Thanks!

ChanKim
  • 361
  • 2
  • 16

1 Answers1

0

Ensure you use consistent mask indices

pd.Series.isna returns a series, not a Boolean value. Importantly, since you apply a filter first via df[df['date'] == next_year] the Boolean series will not have the same indices as your first two masks.

Don't use chained indexing

You should avoid chained indexing, which is explicitly discouraged in the docs. Instead, you can find in-scope years and then use pd.Series.isin. Finally, for readability I suggest you combine multiple masks:

m1 = df['date'].eq(year)
m2 = df['gvkey'].eq(gvkey)

viable_years = df.loc[m2 & df['COMNAM'].notnull(), 'date'].values  # returns in-scope years

m3 = (df['date'] + 1).isin(viable_years)  # check next year is a good year

res = df[m1 & m2 & m3]
jpp
  • 159,742
  • 34
  • 281
  • 339
  • In this code, aren't viable_years the same for all the gvkeys? – ChanKim Sep 30 '18 at 14:43
  • @ChanKim, Maybe you need `df['gvkey'].eq(gvkey)` instead of `df['date'].eq(gvkey)` (which you have in your code right now). I've also updated to include `m2` in the calculation of `viable_years`. – jpp Sep 30 '18 at 14:46
  • Excuse me, If I'm doing exactly the same thing, and the only difference is that now the date is written in month format(yyyymm). In this case is there any code that can replace (df['date'] + 1).isin(viable_years) ? (Since, month ends in 12 and 12 +1 = 13 which is not a month) – ChanKim Sep 30 '18 at 15:29
  • @ChanKim, You should [convert to datetime](https://stackoverflow.com/questions/26763344/convert-pandas-column-to-datetime) first. Then use [`pd.Series.dt.year`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.year.html). If you're still stuck feel free to ask a new question. – jpp Sep 30 '18 at 15:31