Merging pivot table with "long-data" dataformat

Question

UPDATE!

Just be aware of changing the year to int in both the df and the pivot table (after unstacking). That caused me some trouble :)

Data for the values:

d = {'ID':[1,1,1,2,2,2],'Date':['01-01-2013','01-02-2013','01-03-2013','01-
01-2008','01-02-2008','01-03-2008'],'CUSIP':
['X1','X1','X1','X2','X2','X2'],'X':['bla','bla','bla','bla','bla','bla']}
df = pd.DataFrame(data=d)

I have a dataframe:

   Identifier CUSIP    X       Date
0           1    X1  bla 2013-01-01
1           1    X1  bla 2013-01-02
2           1    X1  bla 2013-01-03
3           2    X2  bla 2008-01-01
4           2    X2  bla 2008-01-02
5           2    X2  bla 2008-01-03

And a pivot table:

       2008  2009  2010  2011  2012  2013
CUSIP                                    
X1        1     1     1     1     1     1
X2        2     2     2     2     2     2

And I would like to achieve a layout like:

   Identifier CUSIP    X       Date Values
0           1    X1  bla 2013-01-01 1
1           1    X1  bla 2013-01-02 1
2           1    X1  bla 2013-01-03 1
3           2    X2  bla 2008-01-01 2
4           2    X2  bla 2008-01-02 2
5           2    X2  bla 2008-01-03 2

It's not a difficult problem, but it would be easier to solve if you provided text instead of pictures, so it would be easy to copy — cs95, Feb 15 '18 at 08:42
Make some fake data and paste it as text/code. Take a look at this: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — cs95, Feb 15 '18 at 08:45
@Chris - Is possible add to question your pivot code? Maybe solution should be simplier. Thanks. — jezrael, Feb 15 '18 at 09:11
Chris you've managed to make your question even less readable. Also put the tables like @jezrael and I have below. — JSharm, Feb 15 '18 at 09:37

jezrael · Accepted Answer · 2018-02-15T12:13:59.960

You can use stack for reshape df2 with join with left join:

#if necessary
df['Date'] = pd.to_datetime(df['Date'])
df['year'] = df.Date.dt.year

df1 = df.join(df1.stack().rename('val'), on=['CUSIP', 'year'])
print (df1)
   Identifier       Date CUSIP    X  year  val
0           1 2013-01-01    X1  bla  2013    1
1           1 2013-01-02    X1  bla  2013    1
2           1 2013-04-03    X1  bla  2013    1
3           2 2008-01-01    X2  bla  2008    2
4           2 2008-01-02    X2  bla  2008    2
5           2 2008-03-03    X2  bla  2008    2

Alternative solution:

df1 = df.join(df1.stack().rename('val'), on=[df['CUSIP'], df['Date'].dt.year])
print (df1)
   Identifier       Date CUSIP    X  val
0           1 2013-01-01    X1  bla    1
1           1 2013-01-02    X1  bla    1
2           1 2013-04-03    X1  bla    1
3           2 2008-01-01    X2  bla    2
4           2 2008-01-02    X2  bla    2
5           2 2008-03-03    X2  bla    2

I believe you can use transform by year with function like size, mean, sum:

df['Date'] = pd.to_datetime(df['Date'])

df['Vals'] = df.groupby(['CUSIP', df['Date'].dt.year])['X'].transform('size')
print (df)
   Identifier       Date CUSIP    X  Vals
0           1 2013-01-01    X1  bla     5
1           1 2013-01-02    X1  bla     5
2           1 2013-04-03    X1  bla     5
3           1 2013-04-04    X1  bla     5
4           1 2013-05-05    X1  bla     5
5           2 2008-01-01    X2  bla     4
6           2 2008-01-02    X2  bla     4
7           2 2008-03-03    X2  bla     4
8           2 2008-03-04    X2  bla     4

Do I have to specify the column "year" in my pivot table after stacking? I don't see how it does the connection otherwise. — Chris, Feb 15 '18 at 13:29
Yes, but after I use stack on the pivot table I only have a "stack" with index = CUSIP. The columns have no names? So I should name the first column to year? — Chris, Feb 15 '18 at 13:37

score 2 · Answer 2 · answered Feb 15 '18 at 09:33

This is how I'd do it, it looks complicated but actually it's not much, I'm just explaining the steps.
Starting with a dataframe like this:

   Identifier CUSIP    X       Date
0           1    X1  bla 2013-01-01
1           1    X1  bla 2013-01-02
2           1    X1  bla 2013-01-03
3           2    X2  bla 2008-01-01
4           2    X2  bla 2008-01-02
5           2    X2  bla 2008-01-03

Add a year column with df['year'] = df.Date.dt.year

   Identifier CUSIP    X       Date  year
0           1    X1  bla 2013-01-01  2013
1           1    X1  bla 2013-01-02  2013
2           1    X1  bla 2013-01-03  2013
3           2    X2  bla 2008-01-01  2008
4           2    X2  bla 2008-01-02  2008
5           2    X2  bla 2008-01-03  2008

Then take your pivot table and stack it. (Understanding stack/unstack will greatly help you if you work with pivot tables)

       2008  2009  2010  2011  2012  2013
CUSIP                                    
X1        1     1     1     1     1     1
X2        2     2     2     2     2     2

>>> piv.stack()
CUSIP      
X1     2008    1
       2009    1
       2010    1
       2011    1
       2012    1
       2013    1
X2     2008    2
       2009    2
       2010    2
       2011    2
       2012    2
       2013    2

Then you need to reindex by CUSIP and year so that the values are in the same order as your dataframe.

>>> piv.stack().reindex(df[['CUSIP', 'year']])
CUSIP      
X1     2013    1
       2013    1
       2013    1
X2     2008    2
       2008    2
       2008    2
dtype: int64

All together:

>>> df['pivot_values'] = piv.stack().reindex(df[['CUSIP', 'year']]).values
>>> df
   Identifier CUSIP    X       Date  year  pivot_values
0           1    X1  bla 2013-01-01  2013             1
1           1    X1  bla 2013-01-02  2013             1
2           1    X1  bla 2013-01-03  2013             1
3           2    X2  bla 2008-01-01  2008             2
4           2    X2  bla 2008-01-02  2008             2
5           2    X2  bla 2008-01-03  2008             2

It looks nice, and I am trying to implement it. Currently I get the error that it cannot handle a non-unique multi-index. — Chris, Feb 15 '18 at 10:34
@Chris note the ``.values`` on the last line. The indices don't match so you only want to add the values as a column — JSharm, Feb 15 '18 at 11:17
Ye, I have the .values added, but I still get the error. I don't know if I have some duplicates or something? — Chris, Feb 15 '18 at 11:31
Still trying to figure it out, my dataframe returns NaNs in the last step even thouhg I can see them in the piv.stack(). — Chris, Feb 15 '18 at 11:51

piRSquared · Answer 3 · 2018-02-15T12:00:51.973

Assume my dataframe is df

df

  CUSIP        Date  ID    X
0    X1  01-01-2013   1  bla
1    X1  01-02-2013   1  bla
2    X1  01-03-2013   1  bla
3    X2  01-01-2008   2  bla
4    X2  01-02-2008   2  bla
5    X2  01-03-2008   2  bla

And pivot table is pv

pv

       2008  2009  2010  2011  2012  2013
CUSIP                                    
X1        1     1     1     1     1     1
X2        2     2     2     2     2     2

Solution

Use pd.DataFrame.lookup

Since your dates are just strings, I'll pass them through pd.to_datetime. I'll also ensure pvs columns are integers

df.assign(
    PV_Values=
    pv.rename(columns=int).lookup(
        df.CUSIP, pd.to_datetime(df.Date).dt.year
    )
)

  CUSIP        Date  ID    X  PV_Values
0    X1  01-01-2013   1  bla          1
1    X1  01-02-2013   1  bla          1
2    X1  01-03-2013   1  bla          1
3    X2  01-01-2008   2  bla          2
4    X2  01-02-2008   2  bla          2
5    X2  01-03-2008   2  bla          2

Note
If pv columns were already int and df.Date were already datetime, this would simply be:

df.assign(PV_Values=pv.lookup(df.CUSIP, df.Date.dt.year))

  CUSIP        Date  ID    X  PV_Values
0    X1  01-01-2013   1  bla          1
1    X1  01-02-2013   1  bla          1
2    X1  01-03-2013   1  bla          1
3    X2  01-01-2008   2  bla          2
4    X2  01-02-2008   2  bla          2
5    X2  01-03-2008   2  bla          2

Merging pivot table with "long-data" dataformat

3 Answers3