0

I have a df with multiple authors associated with one title and location:

title | location | author 1 | author 2 | author 3
---------------------------------------------------
A     |  US      |  jon smit| johnny   | brad
B     |  Asia    | Kenny lee| None     | None
C     |  Europe  | gutentag | bonjour  | None

And I want output to ignore any None values and look like:

title | location | author   | 
-----------------------------
A     |  US      |  jon smit|
A     |  US      | johnny   | 
A     |  US      | brad     |
B     |  Asia    | Kenny lee| 
C     |  Europe  | gutentag | 
C     |  Europe  | bonjour  | 

Any help would be appreciated!

Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
Brian Guan
  • 193
  • 2
  • 12

1 Answers1

4

Use df.melt with df.replace to replace None values to NaN and df.dropna to drop NaN.

Lastly, use df.sort_values at last to sort the rows on column title:

In [1414]: import numpy as np

In [1415]: x = df.melt(id_vars=['title', 'location'],  value_name='author')[['title', 'location', 'author']].replace('None', np.nan).dropna().sort_values('title')

In [1416]: x
Out[1416]: 
  title location     author
0     A       US   jon smit
3     A       US     johnny
6     A       US       brad
1     B     Asia  Kenny lee
2     C   Europe   gutentag
5     C   Europe    bonjour

OR: If your None values are Nonetype and not strings, you don't need replace.

x = d.melt(id_vars=["title", "location"], value_name="author")[
    ["title", "location", "author"]
].dropna()
Ch3steR
  • 20,090
  • 4
  • 28
  • 58
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58