I hope you are all well. How can I divide this single column into n equal parts, as shown in the code below?

df = pd.DataFrame(np.random.randn(30))

I have done it this way.

First I split the data into chunks. There are 30 records, and with n = 3 rows per chunk this gives 10 chunks of 3 rows each. Then I concatenate the first three chunks side by side:

n = 3  # chunk row size
list_df = [df[i:i+n] for i in range(0, df.shape[0], n)]

# reset_index(drop=True) renumbers each chunk from 0 so the rows line up
result = pd.concat([list_df[0],
                    list_df[1].reset_index(drop=True),
                    list_df[2].reset_index(drop=True)],
                   axis=1, ignore_index=True, sort=False)

It works fine for the example, but what if I have to divide the data into more columns? How can I automate this instead of listing every chunk by hand? Thanks for the support.
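One way the concatenation above might be automated is to build the list of reset chunks in a comprehension instead of writing each one out. This is only a minimal sketch, assuming the goal is one output column per chunk; the names `chunks` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(30))

n = 3  # rows per chunk, as in the snippet above
chunks = [df.iloc[i:i + n] for i in range(0, len(df), n)]

# Reset each chunk's index so the chunks align row by row,
# then place all of them side by side as columns.
result = pd.concat(
    [chunk.reset_index(drop=True) for chunk in chunks],
    axis=1,
    ignore_index=True,
)
print(result.shape)  # (3, 10): 3 rows per chunk, 10 chunks
```

Changing `n` changes the number of rows per chunk, and the number of columns follows from the length of the data.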

  • `result = pd.DataFrame(df.to_numpy().reshape(-1, n))` (where `n` is the number of columns) as recommended by [this answer](https://stackoverflow.com/a/56690819/15497888) – Henry Ecker Dec 18 '21 at 01:28
  • Oh, oops, I posted my answer after you closed the question (because I had the tab open before you closed it and I didn't reload the page until a while after), _and_ as a bonus, I misunderstood the question. Nice answer for nothing... :p –  Dec 18 '21 at 01:45
  • Thank you, that was very easy, it will help me a lot – Geology Modelling by ADT Dec 18 '21 at 01:46
  • Hello, I made progress on this problem. The code helped me a lot, but I had to change n=3 to n=10 and then transpose with `result.T` to get the result I wanted, which takes much more processing time – Geology Modelling by ADT Dec 20 '21 at 05:21
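Regarding the reshape-then-transpose approach mentioned in the comments: a possible alternative is to ask NumPy to fill the result column by column with `order="F"`, which produces the same layout in one call. This is a sketch under the assumption that the number of rows is an exact multiple of the number of columns; `n_cols` and `values` are illustrative names, and no claim is made here about relative speed.

```python
import numpy as np
import pandas as pd

# Simple increasing values so the resulting layout is easy to check.
df = pd.DataFrame(np.arange(30.0))

n_cols = 3
values = df[0].to_numpy()

# order="F" fills column by column: the first 10 values go down
# column 0, the next 10 down column 1, and so on.
result = pd.DataFrame(values.reshape(-1, n_cols, order="F"))

# Same layout as reshaping row-wise and then transposing.
same = pd.DataFrame(values.reshape(n_cols, -1).T)
print(result.equals(same))  # True
```

If the length is not divisible by `n_cols`, `reshape` raises a `ValueError`, so this only fits the evenly divisible case.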

1 Answer

You can use np.array_split to divide a dataframe (or column) into N portions that are as close to equal in size as possible:

>>> df = pd.DataFrame(np.random.randn(30))
>>> df
           0
0   1.488524
1  -0.288061
... 26 rows omitted ... 
28 -0.052144
29 -0.024019

# Split the dataframe into 2 splits
>>> splits = np.array_split(df, 2)
>>> splits
[           0
 0   1.488524
 1  -0.288061
... 11 rows omitted ...
 13 -0.163669
 14 -0.047295,
 
            0
 15  0.703110
 16 -0.854104
... 11 rows omitted ...
 28 -0.052144
 29 -0.024019]

# Print the length of each split
>>> [len(split) for split in splits]
[15, 15]

# Make 30 splits and print their lengths
>>> splits = np.array_split(df, 30)
>>> [len(split) for split in splits]
[1,
 1,
... 26 rows omitted ...
 1,
 1]

# Make 29 splits
>>> splits = np.array_split(df, 29)
>>> [len(split) for split in splits]
[2,
 1,
... 25 rows omitted ...
 1,
 1]

Note that all of the above will work identically if you pass a column (Series) instead of a DataFrame, e.g. np.array_split(df['your_col'], 2) instead of np.array_split(df, 2).
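If the splits then need to sit side by side as columns, as in the question, one possible follow-up is sketched below. It assumes a single column named `0` and splits the underlying NumPy array rather than the DataFrame itself; `n_parts`, `parts`, and `wide` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(30))

n_parts = 4  # 30 is not divisible by 4, so the parts differ in length
parts = np.array_split(df[0].to_numpy(), n_parts)
print([len(p) for p in parts])  # [8, 8, 7, 7]

# Wrapping each part in a Series gives it a fresh 0-based index,
# so the parts align by position; shorter parts are padded with NaN.
wide = pd.concat([pd.Series(p) for p in parts], axis=1)
print(wide.shape)  # (8, 4)
```

The NaN padding only appears when the length is not an exact multiple of `n_parts`.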