0

Hi all so I have 3 dataframes:

df1
   LC_REF   Category    vals
0 DT 17 1C     WM       dog
1 DT 17 1C     WH       foo, bat
2 DT 17 1C     WP       red, steam

df2
   LC_REF   Category    vals
0 DT 17 1C     WM       cat
1 DT 17 1C     WH       sea, bat

df3
   LC_REF   Category    vals
0 DT 17 1C     WM       turn

I was wondering if there was any way to fill in all dataframes which do NOT have WM, WH, WP, in the 'Category' column and insert the missing categories as such:

df1
   LC_REF   Category    vals
0 DT 17 1C     WM       dog
1 DT 17 1C     WH       foo, bat
2 DT 17 1C     WP       red, steam

df2
   LC_REF   Category    vals
0 DT 17 1C     WM       cat
1 DT 17 1C     WH       sea, bat
2 DT 17 1C     WP       NaN

df3
   LC_REF   Category    vals
0 DT 17 1C     WM       turn
1 DT 17 1C     WH       NaN
2 DT 17 1C     WP       NaN

My attempt:

if df.loc[:, df.Category.isin(['WM', 'WH','WP']).count() == 3 :
    continue
else:
    ???

I know I need to involve boolean masking but I am not quite sure on how to best execute that.

codeninja
  • 377
  • 2
  • 4
  • 10

1 Answers1

1
df2.index=df2.Category
df2=df2.reindex(['WM','WH','WP'])
df2['LC_REF']=df2[['LC_REF']].ffill()
df2.Category=df2.index

              LC_REF Category      vals
Category                               
WM        0 DT 17 1C       WM       cat
WH        1 DT 17 1C       WH  sea, bat
WP        1 DT 17 1C       WP       NaN

Here is another solution by using pd.concat, stack, unstack

DF=pd.concat([df1,df2],axis=0,keys=['df1','df2']).reset_index()
DF=DF.groupby(["level_0","Category"]).agg({'LC_REF':'sum','vals':'sum'}).unstack('Category').stack('Category', dropna=False)
DF['LC_REF'].ffill(inplace=True)


DF
Out[696]: 
                      LC_REF        vals
level_0 Category                        
df1     WH        1 DT 17 1C    foo, bat
        WM        0 DT 17 1C         dog
        WP        2 DT 17 1C  red, steam
df2     WH        1 DT 17 1C    sea, bat
        WM        0 DT 17 1C         cat
        WP        0 DT 17 1C        None

PS:slice the df1 by using DF.loc['df1']

The different fromNaN and None you can find here

BENY
  • 317,841
  • 20
  • 164
  • 234