How to transpose a column into multiple rows for a column value in python?

Question

If I have a dataframe and it has col1 which is file name, col2 is values that I want transpose into file name with its values for example:

Input:
col1  col2
file1 text_0
file1 text_1
file1 text_2
file2 text_0
file2 text_1
file2 text_2
file2 text_3
file3 text_0

Output:
col1  col2   col3   col4   col5
file1 text_0 text_1 text_2 
file2 text_0 text_1 text_2 text_3
file3 text_0

Are you working with pandas DataFrame? Have you read [this](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html)? I think it does exactly what you want. — mrEvgenX, Dec 19 '19 at 05:22
@mrEvgenX df.pivot(index='col1',columns='col2') Value error Index contains duplicate entries, cannot reshape — , Dec 19 '19 at 05:33
Can you explain this a bit more? I’m not sure I understand the operation you’re trying to perform. — AMC, Dec 19 '19 at 05:46
@AMC reshape or Transpose values for the file. So filename then the values into the same row. — , Dec 19 '19 at 05:50
@s_col What for? Can you share more about the program as a whole? In any case, can’t you just `groupby`? — AMC, Dec 19 '19 at 05:57
@s_col Yes, except that instead of using `nth`, you could probably concat the resulting series to make the rows of a DataFrame. There are certainly a few ways of doing this. I’m a bit weary of creating so many columns in this way, however. When I’ve encountered similar situations in the past, the right solution was almost always to use a different data structure. — AMC, Dec 19 '19 at 07:01
Mines the best ;-) just a little humour since i added the first non pivot solution, lol. I hope you like it s_col — oppressionslayer, Dec 19 '19 at 07:26

score 1 · Answer 1 · answered Dec 19 '19 at 05:27

1

It seems like you have DataFrames, meaning you are using Pandas. Consider checking pandas.transpose or pandas.pivot, depending on what exactly do you need.

answered Dec 19 '19 at 05:27

Simon Osipov

404
5
18

df.pivot(index='col1',columns='col2') Value error Index contains duplicate entries, cannot reshape – Dec 19 '19 at 05:34

jezrael · Accepted Answer · 2019-12-19T07:36:16.133

First idea is use GroupBy.cumcount for counter of duplicated values of col1 for new columns names and reshape by Series.unstack:

df = (df.set_index(['col1',df.groupby('col1').cumcount()])['col2']
        .unstack(fill_value='')
        .reset_index())
df.columns = [f'col{x}' for x in range(1, len(df.columns) + 1)]
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0

Or create Series of list and avoid use apply(pd.Series), because slow, better is use DataFrame constructor:

s = df.groupby('col1')['col2'].apply(list)
df = pd.DataFrame(s.tolist(), index=s.index).reset_index().fillna('')
df.columns = [f'col{x}' for x in range(1, len(df.columns) + 1)]
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0

Alternative:

s = df.groupby('col1')['col2'].apply(list)

L = [[k] + v for k, v in s.items()]
df = pd.DataFrame(L).fillna('').rename(columns=lambda x: f'col{x+1}')
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0

Once I get the df with col1 with file1 col2 text_0 col3 text_1.... how can I reverse this process with melt or unstack or another method to go back to the original form of the data where col1 was files and col2 was values? — , Dec 20 '19 at 15:31
@s_col - Melt is good here - `dforig = df.melt('col1', var_name='col2').query("value != ''").drop('value', axis=1)` — jezrael, Dec 20 '19 at 15:34

oppressionslayer · Answer 3 · 2019-12-19T07:07:51.687

Try this:

new_df = df.pivot(columns='col1').droplevel(0,axis=1).rename_axis(columns='col1').apply(lambda x: pd.Series(x.dropna().values)).fillna('')
new_df.index = new_df.reset_index(drop=True).index+2
new_df = new_df.T.add_prefix('col_')

output:

        col_2   col_3   col_4   col_5
col1                                 
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0

or the new way you have it:

new_df = df.pivot(columns='col1').droplevel(0,axis=1).apply(lambda x: pd.Series(x.dropna().values)).fillna('')
new_df.index = new_df.index+2
new_df = new_df.T.add_prefix('col_')
new_df = new_df.rename_axis(columns='col1', index=None)

output:

col1    col_2   col_3   col_4   col_5
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0

score 0 · Answer 4 · answered Dec 19 '19 at 07:20

Since OP wants no pivot, here is a no pivot solution:

df = df.groupby('col1')['col2'].agg(list).apply(pd.Series).fillna('')  
df.columns = list(range(2,6))
df = df.add_prefix('col_')
df = df.rename_axis(columns='col1', index=None)

output:

col1    col_2   col_3   col_4   col_5
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0

score 0 · Answer 5 · answered Dec 19 '19 at 07:26

This should do the trick:

df2=df.groupby("col1").agg(lambda x: (dict((f"col{id+2}",val) for id,val in enumerate(list(x)))))
df2=df2["col2"].apply(pd.Series).reset_index()

Output:

    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2     NaN
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0     NaN     NaN     NaN

How to transpose a column into multiple rows for a column value in python?

5 Answers5