0

I don't know if this is a called a melt or unmelt, or pivot or unpivot, but I want to know if I can change my ColumnB to headers, starting in ColumnF (and data points under these headers), and likewise, change the headers, starting in ColumnF, to ColumnB.

Here is a screen shot of what I'm looking at.

enter image description here

Here is my data.

    qyear   Line_Item                              IDRSSD   schedule_code   RCON3543    RCON3545    RCON3547    RCON3548
97  2011Q1  TRDG ASTS ASTBCKD SCTY AUTO LN         1004368  RCD             7360        7360        7583        $7,583 
98  2012Q1  TRDG ASTS ASTBCKD SCTY OTHR CNSMR LN   1004368  RCD             17807       17807       18092       18092
99  2013Q1  TRDG ASTS ASTBCKD SCTY CMRL  IND LN    1004368  RCD             20073       20073       20971       20971
100 2011Q1  TRDG ASTS ASTBCKD SCTY OTHR LN         1004470  RCD             0       0       0       0
101 2012Q1  TRDG ASTS RTND BNFCL INT SCTY          1004470  RCD             0       0       0       0
102 2013Q1  TRDG ASTS EQT SCTY RDLY DET FV         1004470  RCD             0       0       0       0
103 2012Q1  TRDG ASTS EQT SCTY OTHR                1005075  RCD             0       0       0       0
104 2013Q1  TRDG ASTS LN PEND SCTY                 1005075  RCD             0       0       0       0
105 2011Q1  1ST ITEMIZED AMT FOR OTHR TRDG ASTS    1005440  RCD             0       0       0       0
106 2012Q1  2ND ITEMIZED AMT FOR OTHR TRDG ASTS    1005440  RCD             0       0       0       0
107 2013Q1  3RD ITEMIZED AMT FOR OTHR TRDG ASTS    1005440  RCD             0       0       0       0
108 2011Q1  1ST ITEMIZED AMT FOR OTHR TRDG ASTS    1005459  RCD             0       0       0       0
109 2012Q1  2ND ITEMIZED AMT FOR OTHR TRDG ASTS    1005459  RCD             0       0       0       0
110 2011Q1  3RD ITEMIZED AMT FOR OTHR TRDG ASTS    100553   RCD             0       0       0       0
111 2012Q1  SFP TPS ISS FNCL INSTITUTION           100553   RCD             0       0       0       0
112 2011Q1  SFP TPS ISS RE INV TR                  1005552  RCD             0       0       0       0
113 2012Q1  SFP CORP SIMILAR LN 1                  005552   RCD             0       0       0       0
114 2013Q1  SFP 1T4 FMLY RES MBS ISS US GSE        1005552  RCD             0       0       0       0
115 2011Q1  SFP 1T4 FMLY RES MBS NOT ISS GSE       100562   RCD             0       0       0       0
116 2012Q1  TRDG MBS PT SCTY ISS GUAR FFG          100562   RCD             0       0       0       0
117 2013Q1  TRDG OTHR MBS ISS GUAR FFG             100562   RCD             0       0       0       0
118 2011Q1  TRDG ASSET MBS OTHR RES MBS            1005655  RCD             0       0       0       0
119 2012Q1  TRDG ASSET STRD FNC PROD CASH          1005655  RCD             0       0       0       0
120 2013Q1  TRDG OTHR STRD FNC PROD SYN            1005655  RCD             0       0       0       0
121 2011Q1  TRDG OTHR STRD FNC PROD HYBRID         1005851  RCD             0       0       0       0
122 2012Q1  TRDG ASSET OTHR DEBT SCTY CNSL         1005851  RCD             0       0       0       0

I would like to have the final view look like this.

enter image description here I noticed that a transpose almost works, but that will flip ColumnC and ColumnD, and I don't want to touch those two columns. I want to keep ColumnC and ColumnD where they are, move ColumnB to headers, and move the headers to ColumnB. Is it possible to do this? Thanks a lot!

ASH
  • 20,759
  • 19
  • 87
  • 200

1 Answers1

1

I think you are looking at a combination of stack and unstack

(df.set_index(['qyear','Line_Item','IDRSSD','schedule_code'])
   .stack()
   .unstack('qyear')
   .reset_index()
)

Or unstack then transpose:

(df.set_index(['qyear','Line_Item','IDRSSD','schedule_code'])
   .unstack(level=['Line_Item','IDRSSD','schedule_code'])
   .T.reset_index()
)

Or melt then pivot:

(df.melt(id_vars=['qyear','Line_Item','IDRSSD','schedule_code'])
   .pivot_table(index=['Line_Item','IDRSSD','schedule_code', 'variable'], 
                columns='qyear', values='value', aggfunc='first')
   .reset_index()
)
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Whoa!! I think any/all of these options work for me!! Can you explain what's going one here? Just briefly, please. Thanks so much!!!! – ASH Jan 21 '21 at 05:12
  • @ASH basically it's what you said: making column `qyear` headers means `pivot`, making headers F-I a column means `melt`. These two operations are pretty much interchangeable. So it's just the matter of playing with a small piece of data. P/S: I got hand-on experience with this type of transform as well. – Quang Hoang Jan 21 '21 at 05:17
  • I was stuck on this for a while. I almost started to think it was impossible to do this kind of thing. Thanks again!! – ASH Jan 21 '21 at 05:34
  • Hey there, Quang Hoang. Thanks again for showing me this concept. This is saving me so, so, so much time, and it has helped me to find some more efficient ways of doing something that was previously taking waayyyy too long to do. My prior methodology was flawed; this way is sooooo much better. – ASH Jan 22 '21 at 04:59
  • I just encountered a weird problem. I'm using the 2nd code snippet, and it works in many scenarios, when I loop through different data sets, but in at least one scenario, and I suspect more, I get this error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects – ASH Jan 22 '21 at 05:30
  • How can I modify the code to overcome that issue? – ASH Jan 22 '21 at 05:30
  • The last code sample with: aggfunc='first' ... seems to work in all my scenarios. Thanks again!!! – ASH Jan 23 '21 at 16:02
  • The error happens when you have duplicate data, e.g. two rows with the same pair of keys. `set_index` then `unstack` will fail. `pivot_table` with `agguncf='first'` will remove the duplicate row and only uses the first occurrence. You can also use `'mean'` for aggfunc when your data is numerical. – Quang Hoang Jan 23 '21 at 16:06
  • Yeap,yeap, that was my assumption. That's why the 'first' works in all my multiple scenarios. I wonder, though, would subsequent records have to be complete duplicates to be dropped? If I had Assets = 100 and Assets =100, all I need is one of those. If I had Assets = 100 and Assets =1000, I would consider these to be different, and I would like to capture both records. My data is 100% numerical. The headers are the only text records in the entire data set. – ASH Jan 23 '21 at 16:43
  • No, in the case 100 and 1000, this will keep the first occurrence and drop the other. If you want to retain both of them, you need to introduce some sort of enumeration. See [this quesiton](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe), Q/A 10 for details. – Quang Hoang Jan 23 '21 at 16:56
  • Sweeettt! I bookmarked that URL. Thanks again for everything!!! – ASH Jan 23 '21 at 17:03
  • 1
    Seems like aggfunc='sum' will do the trick. – ASH Jan 23 '21 at 18:22