
I have a dataframe like this:

office       school      hospital            garage  lab  occupation
blah
                         blahblah
             sometext 

Basically, I want to fill in the occupation column by going through each row and copying the free text from whichever of the other columns is populated. Only one of the columns (office, school, hospital, garage and lab) is populated per row. Here's how the above example should look in the end:

office       school      hospital        garage  lab  occupation
blah                                                  blah
                         blahblah                     blahblah
             sometext                                 sometext

I am aware of the np.where() method, but I don't think it can handle this many columns (which is why I put "more than 2 columns" in the title).

Thanks!

user8322222

2 Answers


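If there is only one entry per row:

df['occupation'] = df[df.columns[:-1]].fillna('').astype(str).sum(axis=1)

This selects every column except the last one, converts the values to strings, and then concatenates the strings across each row. If the empty cells are NaN rather than empty strings, the fillna('') step keeps them from being converted to the text "nan".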
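A minimal runnable sketch of this concatenation approach, using sample data modelled on the question. It assumes the empty cells are NaN, which is why a fillna('') step is included before the string conversion; if the blanks are already empty strings, that step does nothing.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "office": ["blah", np.nan, np.nan],
    "school": [np.nan, np.nan, "sometext"],
    "hospital": [np.nan, "blahblah", np.nan],
    "garage": [np.nan, np.nan, np.nan],
    "lab": [np.nan, np.nan, np.nan],
    "occupation": [np.nan, np.nan, np.nan],
})

# Every column except the last one (occupation).
source_cols = df.columns[:-1]

# Blank out NaN, convert to strings, and concatenate across each row.
# With only one populated column per row, the result is that column's text.
df["occupation"] = df[source_cols].fillna("").astype(str).sum(axis=1)

print(df["occupation"].tolist())
```

Note that if a row ever had more than one populated column, the texts would be joined together rather than one being picked.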

Zulfiqaar
  • Thanks for this. I ended up using another solution by rzskhr, but I will probably use this down the line as I get more proficient in Python, as it's an elegant one-line solution. :) – user8322222 Feb 14 '19 at 17:09
  • Thanks! You may want to refer to this thread, as it has quite a lot of more generalisable solutions which could be handy later: https://stackoverflow.com/questions/50004529/get-first-non-null-value-per-row – Zulfiqaar Feb 14 '19 at 17:30

You can use fillna, filling from the other columns in the order you want to prioritize them. Here I have prioritized them as office, school, hospital, garage, lab:

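# Start from the highest-priority column, then fill remaining gaps in order.
# Assigning the result back avoids chained inplace=True, which newer pandas
# versions warn about and which may not update df.
df['occupation'] = df['office']
df['occupation'] = df['occupation'].fillna(df['school'])
df['occupation'] = df['occupation'].fillna(df['hospital'])
df['occupation'] = df['occupation'].fillna(df['garage'])
df['occupation'] = df['occupation'].fillna(df['lab'])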

This assumes you are using a pandas DataFrame and that the empty cells are NaN, since fillna only fills missing values.
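The same priority-ordered fill can be sketched as a loop, which may be handy if the list of columns grows. The sample data and the names priority and occupation are only illustrative, and empty cells are again assumed to be NaN.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "office": ["blah", np.nan, np.nan],
    "school": [np.nan, np.nan, "sometext"],
    "hospital": [np.nan, "blahblah", np.nan],
    "garage": [np.nan, np.nan, np.nan],
    "lab": [np.nan, np.nan, np.nan],
})

# Columns in the order they should be preferred.
priority = ["office", "school", "hospital", "garage", "lab"]

# Start with the first column, then fill any remaining gaps from the rest.
occupation = df[priority[0]]
for col in priority[1:]:
    occupation = occupation.fillna(df[col])

df["occupation"] = occupation
print(df["occupation"].tolist())
```

Unlike string concatenation, this keeps only the first non-missing value per row, so a row with two populated columns would take the higher-priority one.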

rzskhr