I want to use df.apply to build a new DataFrame column.
The df represents hierarchical biological classifications.
For each row, I want to show the relevant classification, depending on which columns actually contain data. I have written a function that should do this:
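Whether a cell "has data" depends on how missing ranks are stored, and the question does not say. If they are NaN rather than empty strings, a plain truth test such as `if data.G[i]:` will not work, because NaN is truthy in Python. A minimal sketch of a hypothetical helper, has_value, that treats None, NaN, and blank strings as missing:

```python
import pandas as pd


def has_value(value):
    """Hypothetical helper: True only for a usable classification value.

    Blank or whitespace-only strings count as missing, and so does
    anything pandas considers missing (None, NaN, pd.NA).
    """
    if isinstance(value, str):
        return bool(value.strip())
    return not pd.isna(value)


# NaN is truthy in plain Python, so `if cell:` would treat it as present.
print(bool(float("nan")))       # True
print(has_value(float("nan")))  # False
print(has_value(""))            # False
print(has_value("Homo"))        # True
```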
```python
def condition(data):
    for i in range(len(data)):
        if data.G[i] and data.Taxonomy[i]:
            return(data.G[i] + " " + data.Taxonomy[i])
        elif data.G[i] and not data.Taxonomy[i]:
            return(data.G[i])
        elif not data.G[i] and not data.Taxonomy[i]:
            return(data.F[i])
        elif data.O[i] and not data.G[i] and not data.Taxonomy[i]:
            return(data.O[i])
        elif data.C[i] and not data.O[i] and not data.G[i] and not data.Taxonomy[i]:
            return(data.C[i])
        elif data.P[i] and not data.O[i] and not data.G[i] and not data.Taxonomy[i]:
            return(data.P[i])
        elif data.k[i] and not data.P[i] and not data.O[i] and not data.G[i] and not data.Taxonomy[i]:
            return(data.k[i])
```
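Independent of how it is called, the loop in this function almost never gets past the first row: every branch ends in return, so the function exits during the first iteration (i = 0) as soon as any branch matches. A stripped-down illustration of that pattern, using a hypothetical first_match function rather than the function above:

```python
def first_match(values):
    """Hypothetical stand-in showing a `return` inside every branch of a loop."""
    for i in range(len(values)):
        # Both branches return, so the loop body runs only for i == 0;
        # values[1], values[2], ... are never examined.
        if values[i]:
            return values[i]
        else:
            return "empty"


print(first_match(["Homo", "Canis"]))  # Homo
print(first_match(["", "Canis"]))      # empty  (Canis is never reached)
```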
I tried applying this function to the DataFrame to create an additional column that holds the result of condition() for each row:
```python
data['name'] = data.apply(lambda x: condition(data), axis=1)
```
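In this call, apply passes each row to the lambda as x, but the lambda never uses x; it hands the whole data frame to condition on every call, so every call computes the same thing. A small sketch with a hypothetical one-column frame and a hypothetical helper, first_genus, that mimics this:

```python
import pandas as pd

df = pd.DataFrame({"G": ["Homo", "Canis", "Felis"]})


def first_genus(frame):
    # Hypothetical helper: receives the WHOLE DataFrame and reads its first row.
    return frame["G"].iloc[0]


# The row argument x is ignored, so every call sees all of df.
wrong = df.apply(lambda x: first_genus(df), axis=1)

# Using the row that apply passes in gives a different input per call.
right = df.apply(lambda row: row["G"], axis=1)

print(wrong.tolist())  # ['Homo', 'Homo', 'Homo']
print(right.tolist())  # ['Homo', 'Canis', 'Felis']
```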
Instead of the function being applied per row, the new column repeats the same outcome on every row.
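A minimal sketch of a per-row version, assuming the intended precedence is G plus Taxonomy, then G, F, O, C, P, and finally k (read off the order of the branches above), and that missing values are empty strings or NaN. The function takes a single row (a Series), so it can be passed straight to apply with axis=1. Checking the ranks in order also sidesteps a problem in the original branches: the third elif already catches every row that lacks both G and Taxonomy, so the O, C, P, and k branches can never run. The helper has_value and the sample frame are hypothetical.

```python
import pandas as pd


def has_value(value):
    """Hypothetical helper: blank strings and None/NaN count as missing."""
    if isinstance(value, str):
        return bool(value.strip())
    return not pd.isna(value)


def condition(row):
    """Return the most specific classification available in ONE row."""
    if has_value(row["G"]):
        if has_value(row["Taxonomy"]):
            return f"{row['G']} {row['Taxonomy']}"
        return row["G"]
    # Fall back through the higher ranks, most specific first (assumed order).
    for rank in ["F", "O", "C", "P", "k"]:
        if has_value(row[rank]):
            return row[rank]
    return None  # no rank filled in at all


# Hypothetical sample data; empty strings mark missing ranks.
data = pd.DataFrame({
    "k": ["Animalia", "Animalia", "Animalia"],
    "P": ["Chordata", "Chordata", "Chordata"],
    "C": ["Mammalia", "Mammalia", "Mammalia"],
    "O": ["Primates", "Carnivora", ""],
    "F": ["Hominidae", "", ""],
    "G": ["Homo", "", ""],
    "Taxonomy": ["sapiens", "", ""],
})

# Pass the function itself: apply calls it once per row with that row.
data["name"] = data.apply(condition, axis=1)
print(data["name"].tolist())  # ['Homo sapiens', 'Carnivora', 'Mammalia']
```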
How can I apply this function so it gives the desired output?
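If apply turns out to be slow on a large frame, a vectorized alternative is possible. This is a swapped-in technique rather than a fix to the function, sketched under the same assumptions (empty strings mark missing ranks, precedence G+Taxonomy, G, F, O, C, P, k) with hypothetical sample data: blanks become NaN, the rank columns are ordered most specific first, and bfill(axis=1) pulls the first non-missing value into the first column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; empty strings mark missing ranks.
data = pd.DataFrame({
    "k": ["Animalia", "Animalia", "Plantae"],
    "P": ["Chordata", "Chordata", ""],
    "C": ["Mammalia", "Mammalia", ""],
    "O": ["Primates", "Carnivora", ""],
    "F": ["Hominidae", "", ""],
    "G": ["Homo", "", ""],
    "Taxonomy": ["sapiens", "", ""],
})

# Rank columns, most specific first, with blanks turned into NaN.
ranks = data[["G", "F", "O", "C", "P", "k"]].replace("", np.nan)

# "Genus species" only where both G and Taxonomy are filled in; NaN elsewhere.
species = (data["G"] + " " + data["Taxonomy"]).where(
    ranks["G"].notna() & data["Taxonomy"].ne("")
)

# Backfill across columns, then take the first column: the most specific rank.
most_specific = ranks.bfill(axis=1).iloc[:, 0]

data["name"] = species.fillna(most_specific)
print(data["name"].tolist())  # ['Homo sapiens', 'Carnivora', 'Plantae']
```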