Pandas change the dataframe shape column values to row

Question

I have a dataframe which is looking as below

df

     school1    game1   game2   game3
     school2    game1       
     school3    game2   game3   
     school4            game2

output

       game1    school1 school2 
       game2    school1 school4 school3
       game3    school1         school3

can any one suggest me how can I get the out put i am new to pandas please help me

Thank you

Does it matter which order things appear? Also game 1 only has two observations not three like in your output — ALollz, Apr 12 '21 at 16:54

AlexK · Answer 1 · 2021-04-12T21:06:42.947

Here is one way that relies on the melt() method to first make a long table out of the original and then on the pivot() method to transform it to the new wide format:

import pandas as pd
import numpy as np

# Code that creates your input dataframe (replace column names as needed)
df = pd.DataFrame(
    {'school':['school1', 'school2', 'school3', 'school4'],
     'g1':['game1', 'game1', 'game2', None],
     'g2':['game2', None, 'game3', 'game2'],
     'g3':['game3', None, None, None],
    }
)

# Convert to long format (one row per school-game)
long_df = df.set_index('school').melt(ignore_index=False).reset_index()
# Remove null (non-existing) school-game combinations
# Also, convert index to column for next step
long_df = long_df[long_df.value.notnull()].reset_index(drop=True).reset_index()

# Convert to dataframe with one row per game ID
by_game_df = long_df.pivot(index='value',columns='index',values='school')

At this point, the dataframe will look like this:

index  value        0        1        2        3        4        5        6
0      game1  school1  school2      NaN      NaN      NaN      NaN      NaN
1      game2      NaN      NaN  school3  school1      NaN  school4      NaN
2      game3      NaN      NaN      NaN      NaN  school3      NaN  school1

You can perform these additional steps to shift non-null school values to the left and to remove columns with only NaN's remaining:

# per https://stackoverflow.com/a/65596853:
idx = pd.isnull(by_game_df.values).argsort(axis=1)
squeezed_df = pd.DataFrame(
    by_game_df.values[np.arange(by_game_df.shape[0])[:,None], idx],
    index=by_game_df.index,
    columns=by_game_df.columns
)
result = squeezed_df.dropna(axis=1, how='all')
result
# index  value        0        1        2
# 0      game1  school1  school2      NaN
# 1      game2  school3  school1  school4
# 2      game3  school3  school1      NaN

score 1 · Answer 2 · answered Apr 12 '21 at 22:56

Or with a Series of lists and a much maligned loop:

d = {'School': ['s1','s2','s3','s4'], 'c1': ['g1','g1','g2',np.nan], 'c2': ['g2',np.nan,'g3','g2'], 'c3': ['g3',np.nan,np.nan,np.nan]}
df = pd.DataFrame(d)
df

  School   c1   c2   c3
0     s1   g1   g2   g3
1     s2   g1  NaN  NaN
2     s3   g2   g3  NaN
3     s4  NaN   g2  NaN

gg = pd.Series(dtype=object)

def add_gs(game, sch):
    if type(game) is str:
        if game in gg.keys():
            gg[game] += [sch]
        else:
            gg[game] = [sch]

cols = df.filter(regex='c[0-9]').columns

for i in range(len(df)):
    for col in cols:
        add_gs(df.loc[i,col],df.loc[i,'School'])

gg

g1        [s1, s2]
g2    [s1, s3, s4]
g3        [s1, s3]

score 1 · Answer 3 · answered Apr 16 '21 at 21:55

A solution that relies on defaultdict() for reshaping the data:

from collections import defaultdict
import pandas as pd

# Code that creates your input dataframe (replace column names as needed)
df = pd.DataFrame(
    {'school':['school1', 'school2', 'school3', 'school4'],
     'g1':['game1', 'game1', 'game2', None],
     'g2':['game2', None, 'game3', 'game2'],
     'g3':['game3', None, None, None],
    }
)

# convert df to dictionary
d = df.set_index('school').to_dict(orient='index')

# reshape the dictionary
def_d = defaultdict(list)
for k, v in d.items():
    for i in v.values():
        if i is not None:
            def_d[i].append(k)
d_rs = dict(def_d)

# prepare dictionary for converting back to dataframe
dict_for_df = {
    k: pd.Series(
        v + [None] * (len(max(d_rs.values(), key=lambda x: len(x))) - len(v))
    ) for k, v in d_rs.items()
}

# convert dictionary to dataframe
final_df = pd.DataFrame.from_dict(dict_for_df, orient='index')
}
#              0        1        2
# game1  school1  school2     None
# game2  school1  school3  school4
# game3  school1  school3     None

Pandas change the dataframe shape column values to row

3 Answers3