4

I have the foll. dataframe:

AA          AB          AC                  AD              Col_1       Col_2     Col_3    
Northeast   Argentina   Northeast Argentina South America   Corrientes  Misiones        
Northern    Argentina   Northern  Argentina South America   Chaco       Formosa   Santiago Del 

I want to convert it to:

AA          AB          AC                  AD              Col
Northeast   Argentina   Northeast Argentina South America   Corrientes
Northeast   Argentina   Northeast Argentina South America   Misiones        
Northern    Argentina   Northern  Argentina South America   Chaco
Northern    Argentina   Northern  Argentina South America   Formosa
Northern    Argentina   Northern  Argentina South America   Santiago Del 

i.e. I want to preserve the first 4 columns but assign each of the remaining column values into a separate row. Is there a way to accomplish this without using a for loop?

user308827
  • 21,227
  • 87
  • 254
  • 417

2 Answers2

3

You can try this:

df = df.melt(id_vars=['AA','AB','AC','AD']) 
df.dropna(inplace=True)
df.drop(columns='variable', inplace=True) 
df = df.sort_values('AA').reset_index(drop=True)
df.rename(columns={'value':'Col'}, inplace=True)

          AA         AB                   AC             AD           Col
0  Northeast  Argentina  Northeast Argentina  South America    Corrientes
1  Northeast  Argentina  Northeast Argentina  South America      Misiones
2   Northern  Argentina   Northern Argentina  South America         Chaco
3   Northern  Argentina   Northern Argentina  South America       Formosa
4   Northern  Argentina   Northern Argentina  South America  Santiago Del

oppressionslayer
  • 6,942
  • 2
  • 7
  • 24
  • You can improve answer with not use `inplace=True` - check [this](https://www.dataschool.io/future-of-pandas/#inplace) – jezrael Dec 21 '19 at 05:30
  • No, I am not pandas dev, but more times see this opinion and agree with it. I think it is more recommendation, check also [this](https://stackoverflow.com/questions/43893457/understanding-inplace-true/59242208#59242208) – jezrael Dec 21 '19 at 05:35
  • 1
    Thanks, this helps me. I appreciate you taking the time to help me become a better pandas expert. – oppressionslayer Dec 21 '19 at 05:38
1

Try using:

df['Col'] = df[['Col_1', 'Col_2', 'Col_3']].values.tolist()
df = df.set_index(df.columns.drop('Col').tolist())['Col'].apply(pd.Series).stack().reset_index().rename(columns={0: 'Col'}).drop(['level_7', 'Col_1', 'Col_2', 'Col_3'], axis=1)
print(df)

Output:

          AA         AB                   AC             AD           Col
0  Northeast  Argentina  Northeast Argentina  South America    Corrientes
1  Northeast  Argentina  Northeast Argentina  South America      Misiones
2   Northern  Argentina   Northern Argentina  South America         Chaco
3   Northern  Argentina   Northern Argentina  South America       Formosa
4   Northern  Argentina   Northern Argentina  South America  Santiago Del
U13-Forward
  • 69,221
  • 14
  • 89
  • 114