2

I have just started with python and need some help. I have a dataframe which looks like "Input Data", What I want is stack by every nth column. In other words, I want a dataframe where every nth Column is appended below to first m rows

id city Col 1 Col 2 Col 3 Col 4 Col 5 Col 6 Col 7 Col 8 Col 9 Col 10
1 1 51 155 255 355 455 666 777 955 55 553
2 0 52 155 255 355 455 666 777 595 55 553
3 NAN 53 155 255 355 455 666 777 559 55 535
4 1 54 155 255 355 545 666 777 559 55 535
5 7 55 155 255 355 455 666 777 955 55 535

Required Output

id city Col 1 Col 2 Col 3 Col 4 Col 5
1 1 51 155 255 355 455
2 0 52 155 255 355 455
3 NAN 53 155 255 355 455
4 1 54 155 255 355 545
5 7 55 155 255 355 455
1 1 666 777 955 55 553
2 0 666 777 595 55 553
3 NAN 666 777 559 55 535
4 1 666 777 559 55 535
5 7 666 777 955 55 535

I am trying to do something opposite of this

Input & required Output

SunilG
  • 347
  • 1
  • 4
  • 10
Pinaak
  • 23
  • 4
  • Try `pd.concat(df1, df2)` with `df1 = df[['id','city','Col1','Col2','Col3','Col4','Col5']]` and `df2` the same but Col6 - Col10 (assuming your original DataFrame is called `df`). – Dames Jan 04 '21 at 00:05
  • Thanks @Dames for your reply. But I have total 10K columns (eg- 'Col1','Col2'... 'Col10089') and those needs to result 30 columns (eg- 'Col1','Col2'... 'Col30 )in Output df. It is misplacing the data – Pinaak Jan 04 '21 at 00:09
  • There exists an option `pd.concat(df1, df2, copy=False)` it might work here. Otherwise you could try to get the numpy data by `df[df.columns[2:]].to_numpy()` and manipulate (e.g. reshape) that. You also need to manipulate `df[['id','city']]` to match the manipulation of the numpy object (duplicate every row X amount of times for reshaping). Afterwards turn the numpy object back into a DataFrame and add the id and city data nexto it. – Dames Jan 04 '21 at 00:22

1 Answers1

1
In [74]: column_list = [df.columns[k:k+5] for k in range(2, len(df.columns), 5)]

In [75]: column_list
Out[75]:
[Index(['Col 1', 'Col 2', 'Col 3', 'Col 4', 'Col 5'], dtype='object'),
 Index(['Col 6', 'Col 7', 'Col 8', 'Col 9', 'Col 10'], dtype='object')]

In [76]: dfs = [df[['id', 'city'] + columns.tolist()].rename(columns=dict(zip(columns, range(5)))) for columns in column_list]

In [77]: dfs
Out[77]:
[   id  city   0    1    2    3    4
 0   1   1.0  51  155  255  355  455
 1   2   0.0  52  155  255  355  455
 2   3   NaN  53  155  255  355  455
 3   4   1.0  54  155  255  355  545
 4   5   7.0  55  155  255  355  455,
    id  city    0    1    2   3    4
 0   1   1.0  666  777  955  55  553
 1   2   0.0  666  777  595  55  553
 2   3   NaN  666  777  559  55  535
 3   4   1.0  666  777  559  55  535
 4   5   7.0  666  777  955  55  535]

In [78]: pd.concat(dfs, ignore_index=True)
Out[78]:
   id  city    0    1    2    3    4
0   1   1.0   51  155  255  355  455
1   2   0.0   52  155  255  355  455
2   3   NaN   53  155  255  355  455
3   4   1.0   54  155  255  355  545
4   5   7.0   55  155  255  355  455
5   1   1.0  666  777  955   55  553
6   2   0.0  666  777  595   55  553
7   3   NaN  666  777  559   55  535
8   4   1.0  666  777  559   55  535
9   5   7.0  666  777  955   55  535

To explain :

  1. First generate the required columns for each slice
  2. pd.concat requires the column names of all the dataframes in the list to be the same, hence the renames in rename(columns=dict(zip(columns, range(5)))). We are just renaming the sliced columns to 0,1,2,3,4
  3. Last step is to concat everything.

EDIT

Based on the comments by OP:

Sorry @Asish M. but how to add a column for dataset number in each dataset of dfs, eg- here we split our dataset into 2, so I need one column which says for first 1 to 5 ids - 'first' (or 1), then again for another 1 to 5 ids - 'second' (or 2) in the output. I hope it's making scenes

dfs = [df[['id', 'city'] + columns.tolist()].assign(split_group=idx).rename(columns=dict(zip(columns, range(5)))) for idx, columns in enumerate(column_list)]

df.assign(split_group=idx) creates a column 'split_group' with value = idx. You get the idx from enumerating the column_list

Asish M.
  • 2,588
  • 1
  • 16
  • 31
  • Many Thanks @Asish M. It's a great help indeed. – Pinaak Jan 04 '21 at 08:32
  • Sorry @Asish M. but how to add a column for dataset number in each dataset of dfs, eg- here we split our dataset into 2, so I need one column which says for first 1 to 5 ids - 'first' (or 1), then again for another 1 to 5 ids - 'second' (or 2) in the output. I hope it's making scenes. – Pinaak Jan 04 '21 at 15:43