0

I have a pandas dataframe like so:

df = pd.DataFrame({'column': [[np.nan, np.nan, np.nan], [1, np.nan, np.nan], [2, 3, np.nan], [3, 2, 1]]})

column
0   [nan, nan, nan]
1   [1, nan, nan]
2   [2, 3, nan]
3   [3, 2, 1]

Note that there is never the same value twice in a row.

I wish to transform this single column into multiple columns named with the corresponding values. So I want to order the values and put them in the right column. The ones under column_1, twos under column_2 etc.

    column_1    column_2    column_3
0   NaN NaN NaN
1   1.0 NaN NaN
2   NaN 2.0 3.0
3   1.0 2.0 3.0

How to do this? I don't really know where to start to be honest.

T C Molenaar
  • 3,205
  • 1
  • 10
  • 26
  • Maybe `df = pd.DataFrame(df['column'].to_list(), columns=['column_1','column_2', 'column_3'])`..this question is probably a duplicate, but not sure with which question – AloneTogether Nov 17 '22 at 08:58
  • This does not give the desired output. Note that in the third row, the `2` and `3` need to be shifted to the right – T C Molenaar Nov 17 '22 at 08:59
  • What is the logic behind the shift? You seem to have forgotten that info? – AloneTogether Nov 17 '22 at 09:00
  • I want the `2` in `column_2` and the `3` in `column_3`. As there is no `1` in this row, there needs to be a `nan` there – T C Molenaar Nov 17 '22 at 09:01
  • 1
    Maybe... https://stackoverflow.com/questions/35491274/split-a-pandas-column-of-lists-into-multiple-columns ? – MDR Nov 17 '22 at 09:03
  • similar as above, this does not solve my question. What if in one of the rows, the values were the other way around? How can you sort them into the right column? – T C Molenaar Nov 17 '22 at 09:07
  • Should the values ​​match the names of the columns? – Bushmaster Nov 17 '22 at 09:09
  • 1
    You may want to edit the question with a few examples of different number orders and explain how they need to be handled. – MDR Nov 17 '22 at 09:09
  • @Clegane yes, the values in my dataframe are values 1,...,n and I want to transform them into a dataframe with columns with suffix _1,...,_n – T C Molenaar Nov 17 '22 at 09:17

2 Answers2

1

Use dict comprehension to compute pandas Series of columns:

import math
df = df.apply(lambda row: pd.Series(data={f"column_{v}": v for v in row["column"] if not math.isnan(v)}, dtype="float64"), axis=1)

[Out]:
   column_1  column_2  column_3
0       NaN       NaN       NaN
1       1.0       NaN       NaN
2       NaN       2.0       3.0
3       1.0       2.0       3.0
Azhar Khan
  • 3,829
  • 11
  • 26
  • 32
0

Using a pivot_table:

(df['column'].explode().reset_index()
 .dropna()
 .assign(col=lambda d: 'column_'+d['column'].astype(str))
 .pivot_table(index='index', columns='col', values='column',
              aggfunc='first', dropna=False)
 .reindex(df.index)
)

Output:

col  column_1  column_2  column_3
0         NaN       NaN       NaN
1         1.0       NaN       NaN
2         NaN       2.0       3.0
3         1.0       2.0       3.0
mozway
  • 194,879
  • 13
  • 39
  • 75