python reshape every nth column

Question

I have just started with python and need some help. I have a dataframe which looks like "Input Data", What I want is stack by every nth column. In other words, I want a dataframe where every nth Column is appended below to first m rows

id	city	Col 1	Col 2	Col 3	Col 4	Col 5	Col 6	Col 7	Col 8	Col 9	Col 10
1	1	51	155	255	355	455	666	777	955	55	553
2	0	52	155	255	355	455	666	777	595	55	553
3	NAN	53	155	255	355	455	666	777	559	55	535
4	1	54	155	255	355	545	666	777	559	55	535
5	7	55	155	255	355	455	666	777	955	55	535

Required Output

id	city	Col 1	Col 2	Col 3	Col 4	Col 5
1	1	51	155	255	355	455
2	0	52	155	255	355	455
3	NAN	53	155	255	355	455
4	1	54	155	255	355	545
5	7	55	155	255	355	455
1	1	666	777	955	55	553
2	0	666	777	595	55	553
3	NAN	666	777	559	55	535
4	1	666	777	559	55	535
5	7	666	777	955	55	535

I am trying to do something opposite of this

Input & required Output

Try `pd.concat(df1, df2)` with `df1 = df[['id','city','Col1','Col2','Col3','Col4','Col5']]` and `df2` the same but Col6 - Col10 (assuming your original DataFrame is called `df`). — Dames, Jan 04 '21 at 00:05
Thanks @Dames for your reply. But I have total 10K columns (eg- 'Col1','Col2'... 'Col10089') and those needs to result 30 columns (eg- 'Col1','Col2'... 'Col30 )in Output df. It is misplacing the data — Pinaak, Jan 04 '21 at 00:09
There exists an option `pd.concat(df1, df2, copy=False)` it might work here. Otherwise you could try to get the numpy data by `df[df.columns[2:]].to_numpy()` and manipulate (e.g. reshape) that. You also need to manipulate `df[['id','city']]` to match the manipulation of the numpy object (duplicate every row X amount of times for reshaping). Afterwards turn the numpy object back into a DataFrame and add the id and city data nexto it. — Dames, Jan 04 '21 at 00:22

Asish M. · Accepted Answer · 2021-01-04T18:58:55.693

In [74]: column_list = [df.columns[k:k+5] for k in range(2, len(df.columns), 5)]

In [75]: column_list
Out[75]:
[Index(['Col 1', 'Col 2', 'Col 3', 'Col 4', 'Col 5'], dtype='object'),
 Index(['Col 6', 'Col 7', 'Col 8', 'Col 9', 'Col 10'], dtype='object')]

In [76]: dfs = [df[['id', 'city'] + columns.tolist()].rename(columns=dict(zip(columns, range(5)))) for columns in column_list]

In [77]: dfs
Out[77]:
[   id  city   0    1    2    3    4
 0   1   1.0  51  155  255  355  455
 1   2   0.0  52  155  255  355  455
 2   3   NaN  53  155  255  355  455
 3   4   1.0  54  155  255  355  545
 4   5   7.0  55  155  255  355  455,
    id  city    0    1    2   3    4
 0   1   1.0  666  777  955  55  553
 1   2   0.0  666  777  595  55  553
 2   3   NaN  666  777  559  55  535
 3   4   1.0  666  777  559  55  535
 4   5   7.0  666  777  955  55  535]

In [78]: pd.concat(dfs, ignore_index=True)
Out[78]:
   id  city    0    1    2    3    4
0   1   1.0   51  155  255  355  455
1   2   0.0   52  155  255  355  455
2   3   NaN   53  155  255  355  455
3   4   1.0   54  155  255  355  545
4   5   7.0   55  155  255  355  455
5   1   1.0  666  777  955   55  553
6   2   0.0  666  777  595   55  553
7   3   NaN  666  777  559   55  535
8   4   1.0  666  777  559   55  535
9   5   7.0  666  777  955   55  535

To explain :

First generate the required columns for each slice
pd.concat requires the column names of all the dataframes in the list to be the same, hence the renames in rename(columns=dict(zip(columns, range(5)))). We are just renaming the sliced columns to 0,1,2,3,4
Last step is to concat everything.

EDIT

Based on the comments by OP:

Sorry @Asish M. but how to add a column for dataset number in each dataset of dfs, eg- here we split our dataset into 2, so I need one column which says for first 1 to 5 ids - 'first' (or 1), then again for another 1 to 5 ids - 'second' (or 2) in the output. I hope it's making scenes

dfs = [df[['id', 'city'] + columns.tolist()].assign(split_group=idx).rename(columns=dict(zip(columns, range(5)))) for idx, columns in enumerate(column_list)]

df.assign(split_group=idx) creates a column 'split_group' with value = idx. You get the idx from enumerating the column_list

Sorry @Asish M. but how to add a column for dataset number in each dataset of dfs, eg- here we split our dataset into 2, so I need one column which says for first 1 to 5 ids - 'first' (or 1), then again for another 1 to 5 ids - 'second' (or 2) in the output. I hope it's making scenes. — Pinaak, Jan 04 '21 at 15:43

id	city	Col 1	Col 2	Col 3	Col 4	Col 5	Col 6	Col 7	Col 8	Col 9	Col 10
1	1	51	155	255	355	455	666	777	955	55	553
2	0	52	155	255	355	455	666	777	595	55	553
3	NAN	53	155	255	355	455	666	777	559	55	535
4	1	54	155	255	355	545	666	777	559	55	535
5	7	55	155	255	355	455	666	777	955	55	535

id	city	Col 1	Col 2	Col 3	Col 4	Col 5
1	1	51	155	255	355	455
2	0	52	155	255	355	455
3	NAN	53	155	255	355	455
4	1	54	155	255	355	545
5	7	55	155	255	355	455
1	1	666	777	955	55	553
2	0	666	777	595	55	553
3	NAN	666	777	559	55	535
4	1	666	777	559	55	535
5	7	666	777	955	55	535

id	city	Col 1	Col 2	Col 3	Col 4	Col 5	Col 6	Col 7	Col 8	Col 9	Col 10
1	1	51	155	255	355	455	666	777	955	55	553
2	0	52	155	255	355	455	666	777	595	55	553
3	NAN	53	155	255	355	455	666	777	559	55	535
4	1	54	155	255	355	545	666	777	559	55	535
5	7	55	155	255	355	455	666	777	955	55	535

id	city	Col 1	Col 2	Col 3	Col 4	Col 5
1	1	51	155	255	355	455
2	0	52	155	255	355	455
3	NAN	53	155	255	355	455
4	1	54	155	255	355	545
5	7	55	155	255	355	455
1	1	666	777	955	55	553
2	0	666	777	595	55	553
3	NAN	666	777	559	55	535
4	1	666	777	559	55	535
5	7	666	777	955	55	535

python reshape every nth column

1 Answers1

id	city	Col 1	Col 2	Col 3	Col 4	Col 5	Col 6	Col 7	Col 8	Col 9	Col 10
1	1	51	155	255	355	455	666	777	955	55	553
2	0	52	155	255	355	455	666	777	595	55	553
3	NAN	53	155	255	355	455	666	777	559	55	535
4	1	54	155	255	355	545	666	777	559	55	535
5	7	55	155	255	355	455	666	777	955	55	535

id	city	Col 1	Col 2	Col 3	Col 4	Col 5
1	1	51	155	255	355	455
2	0	52	155	255	355	455
3	NAN	53	155	255	355	455
4	1	54	155	255	355	545
5	7	55	155	255	355	455
1	1	666	777	955	55	553
2	0	666	777	595	55	553
3	NAN	666	777	559	55	535
4	1	666	777	559	55	535
5	7	666	777	955	55	535