
I want to use df.apply on a DataFrame to create a new column.

The df represents hierarchical biological classifications.

[Image: DataFrame for biological data]

For each row, I want to show the relevant classification depending on which columns contain data. I have written a function that should do this:

def condition(data):
    for i in range(len(data)):
        if data.G[i] and data.Taxonomy[i]:
            return(data.G[i] + " " +data.Taxonomy[i])
        elif data.G[i] and not data.Taxonomy[i]:
            return(data.G[i])
        elif not data.G[i] and not data.Taxonomy[i]:
            return(data.F[i])
        elif data.O[i] and not data.G[i] and not data.Taxonomy[i]:
            return(data.O[i])
        elif data.C[i] and not data.O[i] and not data.G[i] and not data.Taxonomy[i]:
            return(data.C[i])
        elif data.P[i] and not data.O[i] and not data.G[i] and not data.Taxonomy[i]:
            return(data.P[i])
        elif data.k[i] and not data.P[i] and not data.O[i] and not data.G[i] and not data.Taxonomy[i]:
            return(data.k[i]) 

I tried to apply this function to the DataFrame to add a column holding the result of condition():

data['name']=data.apply(lambda x: condition(data), axis = 1)

This is the output I receive:

[Image: Output after df.apply]

The same result is repeated in every row instead of the function being applied per row.

How can I apply this function so it gives the desired output?

jhealp

1 Answer


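With axis=1, apply passes each row to the lambda as x, so you must call condition on x, not on the whole DataFrame data. Passing data gives the function the entire frame on every call, which is why every row gets the same result: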

data['name']=data.apply(lambda x: condition(x), axis = 1)

instead of

data['name']=data.apply(lambda x: condition(data), axis = 1)
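
Note that condition as written in the question loops over range(len(data)) and indexes with [i], which only makes sense for a whole DataFrame. Once a single row is passed in, the function should read the row's values directly. Below is a minimal sketch of a row-wise version. The sample rows are made up for illustration, and it assumes missing ranks are stored as empty strings (NaN would count as truthy and would need pd.isna checks instead). It also falls back to the next higher rank whenever the genus is missing, which may differ slightly from what the question intends.

```python
import pandas as pd

# Hypothetical sample data; column names are taken from the question,
# and missing ranks are assumed to be empty strings.
data = pd.DataFrame({
    "k": ["Bacteria", "Bacteria", "Bacteria"],
    "P": ["Firmicutes", "Firmicutes", "Proteobacteria"],
    "C": ["Bacilli", "Bacilli", ""],
    "O": ["Lactobacillales", "Lactobacillales", ""],
    "F": ["Lactobacillaceae", "Lactobacillaceae", ""],
    "G": ["Lactobacillus", "", ""],
    "Taxonomy": ["casei", "", ""],
})

def condition(row):
    """Return the most specific classification available in one row."""
    # Genus and species both present: combine them.
    if row["G"] and row["Taxonomy"]:
        return row["G"] + " " + row["Taxonomy"]
    # Otherwise walk from genus up to kingdom and take the first non-empty rank.
    for rank in ["G", "F", "O", "C", "P", "k"]:
        if row[rank]:
            return row[rank]
    return ""

# axis=1 calls condition once per row, passing that row as a Series.
data["name"] = data.apply(condition, axis=1)
print(data["name"].tolist())
```

Since condition takes the row as its only argument, it can be passed to apply directly without a lambda.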
IoaTzimas