1

I have a table with time series data:

 A    B     C
_____________
ID1 1978    1
ID1 1979    2
ID1 1980    5
ID1 1947    6
ID2 1950    8
ID2 1952    2
ID2 1955    3
ID2 1958    5
ID2 1963    4
ID2 1969    3
ID3 1970    9
ID3 1976    8
ID3 2002    7
ID3 1972    4
ID3 1973    6

Which I would like to transform in such a way:

        1947    1950    1952    1955    1958    1963    1969    1970    1972    1973    1976    1978    1979    1980    2002
ID1     6                                                                                       1       2       5
ID2             8       2       3       5       4       3
ID3                                                             9       4       6         8                             7

How can I transpose the data and fill missing year values with empty cells?

Emi OB
  • 2,814
  • 3
  • 13
  • 29
honeymoon
  • 2,400
  • 5
  • 34
  • 43

2 Answers2

3

You can use pandas pivot()

df.pivot(index='A', columns='B', values='C')

Output:

B    1947  1950  1952  1955  1958  1963  ...  1973  1976  1978  1979  1980  2002
A                                        ...                                    
ID1   6.0   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   1.0   2.0   5.0   NaN
ID2   NaN   8.0   2.0   3.0   5.0   4.0  ...   NaN   NaN   NaN   NaN   NaN   NaN
ID3   NaN   NaN   NaN   NaN   NaN   NaN  ...   6.0   8.0   NaN   NaN   NaN   7.0
Emi OB
  • 2,814
  • 3
  • 13
  • 29
  • This does not answer the question, it wants to fill the missing years with empty strings not `NaN`... – T C Molenaar Sep 26 '22 at 09:32
  • 1
    @TCMolenaar No, that is not what the OP has written. They have written "fill missing year values with **empty cells**" which can be interpreted as they just wanted gaps/Nans (i.e. empty cells) in those gaps. If this df was exported to excel, those NaNs would be "empty cells" as specified by OP. – Emi OB Sep 26 '22 at 09:35
  • However, if OP did mean to say "fill missing year values with **empty strings**" then they can add `.fillna('')` to the end of the pivot statement, or use your solution. – Emi OB Sep 26 '22 at 09:37
  • Ah I guess you are right about the textual wish, but the desired output does not contain `NaN's`. – T C Molenaar Sep 26 '22 at 09:43
  • @TCMolenaar Also filling blanks with strings would turn the columns into objects rather then int/float, which will not allow any numerical operations (summing/averaging etc) without converting those columns back to numerical (as they are in my solution), so would be less useful – Emi OB Sep 26 '22 at 09:51
-1

You can use pivot_table() from the pandas module to create the output as desired doing:

df.pivot_table(index='A', columns='B', values='C', fill_value='')

Output:

B   1947.0 1950.0 1952.0 1955.0 1958.0  ... 1976.0 1978.0 1979.0 1980.0 2002.0
A                                       ...                                   
ID1    6.0                              ...           1.0    2.0    5.0       
ID2           8.0    2.0    3.0    5.0  ...                                   
ID3                                     ...    8.0                         7.0

Change '' into 0 to get zeroes instead of empty strings.

T C Molenaar
  • 3,205
  • 1
  • 10
  • 26