I'm trying to conditionally assign a value to a column using pandas assign.

I tried using pandas assign to create a new column that is labelled SV when the value in the sv_length column is >= 50 and InDel when it is < 50:

df3=df2.assign(InDel_SV='InDel' if df2.sv_length < 50 else 'SV')

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Other examples use np.where. Why do I have to use numpy? Shouldn't this simple function be part of pandas?

https://chrisalbon.com/python/data_wrangling/pandas_create_column_using_conditional/

Alex Nesta

1 Answer

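A conditional expression needs a single True or False, but df2.sv_length < 50 is a whole Series of booleans, hence the error. You can keep the same syntax by evaluating it once per element with apply: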

df3 = df2.assign(
    InDel_SV=df2.sv_length.apply(lambda x: 'InDel' if x < 50 else 'SV'))
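
To make the difference concrete, here is a minimal sketch on a made-up frame (the sample values are invented for illustration; only the column name sv_length comes from the question). It shows that the comparison produces a boolean Series, that `if` refuses to reduce it to one value, and that apply sidesteps this by testing one value at a time.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df2 = pd.DataFrame({"sv_length": [10, 49, 50, 300]})

mask = df2.sv_length < 50  # element-wise comparison -> boolean Series

try:
    # `if` needs one bool, but `mask` holds one bool per row
    label = "InDel" if mask else "SV"
except ValueError as exc:
    error_message = str(exc)

# apply calls the lambda once per value, so each `x < 50` is a plain bool
df3 = df2.assign(
    InDel_SV=df2.sv_length.apply(lambda x: "InDel" if x < 50 else "SV"))
```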

However, for performance you should prefer a vectorised function: apply calls the lambda once per element, which makes it a slow convenience function. The idiomatic pandas way of doing this is with numpy.where:

import numpy as np
df3 = df2.assign(InDel_SV=np.where(df2.sv_length < 50, 'InDel', 'SV'))
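
As a self-contained sketch of the vectorised approach, the snippet below runs np.where on invented sample data. It also shows one possible pandas-only variant, which is my own assumption about what might suit the asker and not part of the original answer: mapping the boolean mask to labels with Series.map.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df2 = pd.DataFrame({"sv_length": [10, 49, 50, 300]})

# Vectorised: the condition is evaluated for the whole column at once
df3 = df2.assign(InDel_SV=np.where(df2.sv_length < 50, "InDel", "SV"))

# pandas-only alternative (an assumption, not from the answer):
# translate each True/False in the mask into its label
df4 = df2.assign(
    InDel_SV=(df2.sv_length < 50).map({True: "InDel", False: "SV"}))
```

Both frames end up with the same labels. np.where is the more common choice when there are exactly two outcomes.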
cs95