3

If I have a dataframe and it has col1 which is file name, col2 is values that I want transpose into file name with its values for example:

Input:
col1  col2
file1 text_0
file1 text_1
file1 text_2
file2 text_0
file2 text_1
file2 text_2
file2 text_3
file3 text_0

Output:
col1  col2   col3   col4   col5
file1 text_0 text_1 text_2 
file2 text_0 text_1 text_2 text_3
file3 text_0 
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
  • Are you working with pandas DataFrame? Have you read [this](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html)? I think it does exactly what you want. – mrEvgenX Dec 19 '19 at 05:22
  • @mrEvgenX df.pivot(index='col1',columns='col2') Value error Index contains duplicate entries, cannot reshape –  Dec 19 '19 at 05:33
  • Can you explain this a bit more? I’m not sure I understand the operation you’re trying to perform. – AMC Dec 19 '19 at 05:46
  • @AMC reshape or Transpose values for the file. So filename then the values into the same row. –  Dec 19 '19 at 05:50
  • @s_col What for? Can you share more about the program as a whole? In any case, can’t you just `groupby`? – AMC Dec 19 '19 at 05:57
  • @AMC df.groupby('col1')['col2'].nth(0) etc??? –  Dec 19 '19 at 06:02
  • 1
    @s_col Yes, except that instead of using `nth`, you could probably concat the resulting series to make the rows of a DataFrame. There are certainly a few ways of doing this. I’m a bit weary of creating so many columns in this way, however. When I’ve encountered similar situations in the past, the right solution was almost always to use a different data structure. – AMC Dec 19 '19 at 07:01
  • 1
    I added a no pivot solution. 2nd Answer. I hope this helps – oppressionslayer Dec 19 '19 at 07:21
  • Mines the best ;-) just a little humour since i added the first non pivot solution, lol. I hope you like it s_col – oppressionslayer Dec 19 '19 at 07:26

5 Answers5

1

It seems like you have DataFrames, meaning you are using Pandas. Consider checking pandas.transpose or pandas.pivot, depending on what exactly do you need.

Simon Osipov
  • 404
  • 5
  • 18
  • df.pivot(index='col1',columns='col2') Value error Index contains duplicate entries, cannot reshape –  Dec 19 '19 at 05:34
1

First idea is use GroupBy.cumcount for counter of duplicated values of col1 for new columns names and reshape by Series.unstack:

df = (df.set_index(['col1',df.groupby('col1').cumcount()])['col2']
        .unstack(fill_value='')
        .reset_index())
df.columns = [f'col{x}' for x in range(1, len(df.columns) + 1)]
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0                        

Or create Series of list and avoid use apply(pd.Series), because slow, better is use DataFrame constructor:

s = df.groupby('col1')['col2'].apply(list)
df = pd.DataFrame(s.tolist(), index=s.index).reset_index().fillna('')
df.columns = [f'col{x}' for x in range(1, len(df.columns) + 1)]
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0                        

Alternative:

s = df.groupby('col1')['col2'].apply(list)

L = [[k] + v for k, v in s.items()]
df = pd.DataFrame(L).fillna('').rename(columns=lambda x: f'col{x+1}')
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0                        
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    yes works perfect –  Dec 20 '19 at 04:52
  • Once I get the df with col1 with file1 col2 text_0 col3 text_1.... how can I reverse this process with melt or unstack or another method to go back to the original form of the data where col1 was files and col2 was values? –  Dec 20 '19 at 15:31
  • 1
    @s_col - Melt is good here - `dforig = df.melt('col1', var_name='col2').query("value != ''").drop('value', axis=1)` – jezrael Dec 20 '19 at 15:34
0

Try this:

new_df = df.pivot(columns='col1').droplevel(0,axis=1).rename_axis(columns='col1').apply(lambda x: pd.Series(x.dropna().values)).fillna('')
new_df.index = new_df.reset_index(drop=True).index+2
new_df = new_df.T.add_prefix('col_')


output:

        col_2   col_3   col_4   col_5
col1                                 
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0 

or the new way you have it:

new_df = df.pivot(columns='col1').droplevel(0,axis=1).apply(lambda x: pd.Series(x.dropna().values)).fillna('')
new_df.index = new_df.index+2
new_df = new_df.T.add_prefix('col_')
new_df = new_df.rename_axis(columns='col1', index=None) 

output:

col1    col_2   col_3   col_4   col_5
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0   
oppressionslayer
  • 6,942
  • 2
  • 7
  • 24
0

Since OP wants no pivot, here is a no pivot solution:

df = df.groupby('col1')['col2'].agg(list).apply(pd.Series).fillna('')  
df.columns = list(range(2,6))
df = df.add_prefix('col_')
df = df.rename_axis(columns='col1', index=None) 

output:

col1    col_2   col_3   col_4   col_5
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0                        
oppressionslayer
  • 6,942
  • 2
  • 7
  • 24
0

This should do the trick:

df2=df.groupby("col1").agg(lambda x: (dict((f"col{id+2}",val) for id,val in enumerate(list(x)))))
df2=df2["col2"].apply(pd.Series).reset_index()

Output:

    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2     NaN
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0     NaN     NaN     NaN
Grzegorz Skibinski
  • 12,624
  • 2
  • 11
  • 34