I have a large input table with multiple columns, in the format below:

Col-A   Col-B       Col-C       Col-D   Col-E   Col-F
001     10          01/01/2020  123456  123123  123321
001     20          01/02/2020  123456  123111
002     10          01/03/2020  111000  111123

I'd like to write code that produces one row per value for each Col-A record, so that instead of the separate columns Col-D, Col-E and Col-F I have a single Col-D:

Col-A   Col-B       Col-C       Col-D   
001     10          01/01/2020  123456  
001     10          01/01/2020  123123  
001     10          01/01/2020  123321
001     20          01/02/2020  123456
001     20          01/02/2020  123111
002     10          01/03/2020  111000
002     10          01/03/2020  111123

Any ideas would be appreciated. Thanks, Nurbek

Tim Stack
Nurbek Kuantyrov

2 Answers


What about something like this:

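import pandas as pd
id_cols = ["Col-A", "Col-B", "Col-C"]
columns = ["Col-E", "Col-F"]  # ..., "Col-Z"
parts = [df[id_cols + ["Col-D"]]]
for col in columns:
    # rename so every value column stacks under Col-D
    parts.append(df[id_cols + [col]].rename(columns={col: "Col-D"}))
df2 = pd.concat(parts).dropna(subset=["Col-D"]).reset_index(drop=True)

Each extra column is selected together with the key columns, renamed to Col-D, and stacked as new rows under the original Col-A to Col-D selection.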
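For reference, here is a self-contained sketch of the same stacking idea, run on the sample rows from the question. It assumes the blank cells are read as missing values (None/NaN) and that the Col-D to Col-F values are strings; `id_cols`, `value_cols` and `parts` are illustrative names, not part of the original answer. It uses pd.concat and sorts on the original row labels so that the values of each input row stay together.

```python
import pandas as pd

# Sample data copied from the question; blank cells are assumed to be None/NaN.
df = pd.DataFrame({
    "Col-A": ["001", "001", "002"],
    "Col-B": [10, 20, 10],
    "Col-C": ["01/01/2020", "01/02/2020", "01/03/2020"],
    "Col-D": ["123456", "123456", "111000"],
    "Col-E": ["123123", "123111", "111123"],
    "Col-F": ["123321", None, None],
})

id_cols = ["Col-A", "Col-B", "Col-C"]
value_cols = ["Col-D", "Col-E", "Col-F"]

# One slice per value column, each renamed to Col-D so the slices stack vertically.
parts = [df[id_cols + [col]].rename(columns={col: "Col-D"}) for col in value_cols]

df2 = (
    pd.concat(parts)                # keeps the original row labels 0, 1, 2
    .dropna(subset=["Col-D"])       # drop the slots that had no value
    .sort_index(kind="mergesort")   # stable sort: regroup by input row, D/E/F order kept
    .reset_index(drop=True)
)
print(df2)
```

Without the sort_index step the result is grouped by source column (all Col-D values first, then Col-E, then Col-F) rather than by input row.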

Let's try

You can use pd.melt

import pandas as pd
newdf = pd.melt(
    df, 
    id_vars=['Col-A', 'Col-B', 'Col-C'],
    value_vars=['Col-D', 'Col-E', 'Col-F']
).dropna()

This replaces 'Col-D', 'Col-E' and 'Col-F' with two new columns, variable and value. The variable column records which original column each value came from, and dropna() removes the rows with missing values. To get the layout you want, drop the variable column and rename the value column to Col-D:

newdf = newdf.drop(['variable'], axis=1)
newdf = newdf.rename(columns={"value":"Col-D"})
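Put together on the sample rows from the question, the melt approach might look like the sketch below. It assumes the blank cells are missing values (None/NaN); `melt_to_col_d` is a hypothetical helper name. Passing `ignore_index=False` (available in pandas 1.1 and later) keeps the original row labels, which makes it possible to restore the row-by-row ordering shown in the question.

```python
import pandas as pd

# Sample data copied from the question; blank cells are assumed to be None/NaN.
df = pd.DataFrame({
    "Col-A": ["001", "001", "002"],
    "Col-B": [10, 20, 10],
    "Col-C": ["01/01/2020", "01/02/2020", "01/03/2020"],
    "Col-D": ["123456", "123456", "111000"],
    "Col-E": ["123123", "123111", "111123"],
    "Col-F": ["123321", None, None],
})


def melt_to_col_d(frame):
    """Hypothetical helper: stack Col-D, Col-E and Col-F into a single Col-D."""
    long = pd.melt(
        frame,
        id_vars=["Col-A", "Col-B", "Col-C"],
        value_vars=["Col-D", "Col-E", "Col-F"],
        ignore_index=False,  # keep original row labels so rows can be regrouped
    ).dropna(subset=["value"])
    # Stable sort on the original labels: each input row's values end up
    # together, still in Col-D, Col-E, Col-F order.
    long = long.sort_index(kind="mergesort")
    return (
        long.drop(columns="variable")
        .rename(columns={"value": "Col-D"})
        .reset_index(drop=True)
    )


newdf = melt_to_col_d(df)
print(newdf)
```

If the empty cells in the real data are empty strings rather than NaN, dropna will not remove them; they would need to be converted to missing values first.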
najeem