3

I got a long single column DataFrame as following table:

Column A
Cell 1
Cell 2
Cell 3
Cell 4
Cell 5
Cell 6
Cell 7
Cell 8

I want to split column A in order with specify row quantity and add to others new columns If I give 2 row quantity for each column

Column A Column B Column C Column D
Cell 1 Cell 3 Cell 5 Cell 7
Cell 2 Cell 4 Cell 6 Cell 8

split long a single column to new adding column by given row quantity.

Tom
  • 33
  • 3
  • 2
    Show a [mcve] of what you have tried so far. Read [ask]. – padaleiana Jan 07 '23 at 13:30
  • Welcome to Stack Overflow. Please take the [tour] to learn how Stack Overflow works and read [ask] on how to improve the quality of your question. Then [edit] your question to include more details. Maybe take a look at what a [mre] is. Also looking at [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391) would also help you. – imxitiz Jan 07 '23 at 13:36

3 Answers3

2

You can use the underlying numpy array to reshape in Fortran order (rows, then columns):

from string import ascii_uppercase

N = 2

out = (pd.DataFrame(df['Column A'].to_numpy().reshape(N, -1, order='F'))
        # the line below is optional, just to have the column names
         .rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
      )

Output:

  Column A Column B Column C Column D
0   Cell 1   Cell 3   Cell 5   Cell 7
1   Cell 2   Cell 4   Cell 6   Cell 8

If you want to handle N that are non multiples of len(df), you can add a reindex step to pad the DataFrame with NaNs:

N = 3

out = (pd.DataFrame(df['Column A'].reindex(range(int(np.ceil(len(df)/N)*N)))
                                  .to_numpy().reshape(N, -1, order='F'))
         .rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
      )

Output:

  Column A Column B Column C
0   Cell 1   Cell 4   Cell 7
1   Cell 2   Cell 5   Cell 8
2   Cell 3   Cell 6      NaN
mozway
  • 194,879
  • 13
  • 39
  • 75
1

You could divide the rows into groups and pivot the groups into columns.

Short Version

number_rows = 2
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df.index = pd.Series(range(len(df))) % number_rows
df = df.pivot(columns='cols', values='Column A')
cols     1.0     2.0     3.0     4.0
0     Cell 0  Cell 2  Cell 4  Cell 6
1     Cell 1  Cell 3  Cell 5  Cell 7

The code will also work if your number of rows to split by (number_rows) are not a multiple of length of your DataFrame, and missing values (np.nan) will be added.

Long Version with Explanation of Steps

Create demo data

import pandas as pd
import numpy as np
df = pd.DataFrame({'Column A': [f'Cell {i}' for i in range(4)]})
  Column A
0   Cell 0
1   Cell 1
2   Cell 2
3   Cell 3

Create columns for later columns and rows

number_rows = 3 # 3 instead of 2 to illustrate filling with missing values
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df['cols'] = 'Column '+df['cols'].astype(int).astype(str)
df['rows'] = pd.Series(range(len(df))) % number_rows
  Column A      cols  rows
0   Cell 0  Column 1     0
1   Cell 1  Column 1     1
2   Cell 2  Column 1     2
3   Cell 3  Column 2     0

Pivot table

df = df.pivot(columns='cols', index='rows', values='Column A')
cols Column 1 Column 2
rows                  
0      Cell 0   Cell 3
1      Cell 1      NaN
2      Cell 2      NaN

You can remove the column and index names with:

df.columns.name = df.index.name = None
Benjamin Ziepert
  • 1,345
  • 1
  • 15
  • 19
1

You should Create a new row, with the name of new columns corresponding to each value and also change the correpoding index. Finally you can use df.pivot() to pivot your dataframe to the new format.

n_rows = 2
df['new_col'] = "Column "
df['new_col']=df['new_col']+pd.Series(df.index%n_rows).ne(1).cumsum().astype(str)
df.index=df.index%n_rows
print(df.pivot(columns='new_col', values='Column A'))

new_col Column 1 Column 2 Column 3 Column 4
0         Cell 1   Cell 3   Cell 5   Cell 7
1         Cell 2   Cell 4   Cell 6   Cell 8
ali bakhtiari
  • 1,051
  • 4
  • 23