How to rearrange a dataframe so one column is extracted and used as column headings

Question

Existing data looks like the following:

    Year Month Data
0  2021    11   45
1  2021    12   77
2  2022     1  234
3  2022     2   12
4  2022     3  432
5  2022     4  888

I want to rearrange this table to look like this:

    Year    1   2    3    4 5 6 7 8 9 10  11  12
0  2021                                  45  77
1  2022  234  12  432  888                     
2  2023

I have found a way to accomplish this by using nested loops but hoping there is a better way. I have quite a few tables to work with and using my method is going to be awkward at best. More importantly, it requires a blank dataframe for the data to be inserted into. This is a static framework and would require regular updates depending on the format of the original dataframe.

Here is the code:

import pandas as pd

# had to create this blank dataframe to put the data into
data = {'Year': ['2021', '2022', '2023'],
        '1': ['', '', ''],
        '2': ['', '', ''],
        '3': ['', '', ''],
        '4': ['', '', ''],
        '5': ['', '', ''],
        '6': ['', '', ''],
        '7': ['', '', ''],
        '8': ['', '', ''],
        '9': ['', '', ''],
       '10': ['', '', ''],
       '11': ['', '', ''],
       '12': ['', '', '']
     }

new_table = pd.DataFrame(data)

# created this dataframe to represent an example of how the original dataframes are received
data2 = {'Year': ['2021', '2021', '2022', '2022', '2022', '2022'],
        'Month': ['11', '12','1','2','3','4'],
        'Data': ['45','77','234','12','432','888']}

orig_table = pd.DataFrame(data2)

for k in range(len(new_table)):
        for i in range(12):
                for x in range(len(orig_table)):
                        if orig_table.iloc[x,0] == new_table.iloc[k,0]  and orig_table.iloc[x,1] == new_table.columns[i+1]:
                            new_table.iloc[k,i+1] = orig_table.iloc[x,2]

print('\n',orig_table)     
print('\n Updated Test df\n',new_table)

Any help would be greatly appreciated!

you need to perform pivot (long to wide), check here: https://stackoverflow.com/questions/22798934/pandas-long-to-wide-reshape-by-two-variables — Ehsan Hamzei, Jul 05 '23 at 02:05

score 0 · Accepted Answer · answered Jul 05 '23 at 02:18

import pandas as pd
import numpy as np

# create sample data
df = pd.DataFrame({"Year":[2021,2021,2022,2022,2022,2022], "Month":[11,12,1,2,3,4], "Data":[45,77,234,12,432,888]})

# pivot data: This will work in case you have all the month 1-12 somewhere in your data
df_pivot = df.pivot(index="Year", columns="Month", values="Data").reset_index("Year")

#Month  Year      1     2      3      4    11    12
#0      2021    NaN   NaN    NaN    NaN  45.0  77.0
#1      2022  234.0  12.0  432.0  888.0   NaN   NaN

#if you don't have all month in your dataset and you want create empty columns if not exist
for i in range(1,13):
    df_pivot[i] = df_pivot.get(i, np.NaN)
#Month  Year      1     2      3      4    11    12   5   6   7   8   9  10
#0      2021    NaN   NaN    NaN    NaN  45.0  77.0 NaN NaN NaN NaN NaN NaN
#1      2022  234.0  12.0  432.0  888.0   NaN   NaN NaN NaN NaN NaN NaN NaN

# if you want to order it
df_pivot[['Year',1,2,3,4,5,6,7,8,9,10,12]]
#Month  Year      1     2      3      4   5   6   7   8   9  10    12
#0      2021    NaN   NaN    NaN    NaN NaN NaN NaN NaN NaN NaN  77.0
#1      2022  234.0  12.0  432.0  888.0 NaN NaN NaN NaN NaN NaN   NaN

score 0 · Answer 2 · answered Jul 05 '23 at 02:28

Try with this,

import pandas as pd

data2 = {'Year': ['2021', '2021', '2022', '2022', '2022', '2022'],
        'Month': ['11', '12','1','2','3','4'],
        'Data': ['45','77','234','12','432','888']}
orig_table = pd.DataFrame(data2)

new_table = orig_table.pivot(index='Year', columns='Month', values='Data').reset_index()

# Renaming the columns
new_table.columns.name = None

# Filling missing values
new_table = new_table.fillna('')

print(new_table)

Thank you! The "Renaming the columns" line was quite helpful. — phyqrk, Jul 05 '23 at 08:05

How to rearrange a dataframe so one column is extracted and used as column headings

2 Answers2