I have a large (>30M rows) dataframe and I need to create a bunch of columns based on conditions and values of other columns. I have done this before with the apply and map methods, but they are very inefficient and slow on a dataframe this large. I am looking for alternatives that are faster and more scalable.
Here is the head of the dataframe:
2019_date | Carrier | Service_y | ship_from_location
2019-12-17 | USPS | PM | ECFC
and the code I tried:
    def cut_off(row):
        if row['2019_date'] >= '2019-12-17' and row['Carrier'] == 'USPS' and row['Service_y'] == 'FCPS':
            return 'disable'
        if row['2019_date'] >= '2019-12-19' and row['Carrier'] == 'USPS' and row['Service_y'] == 'PM':
            return 'disable'
        if row['2019_date'] >= '2019-12-19' and row['ship_from_location'] == 'ECFC' and row['Carrier'] == 'UDSL':
            return 'disable'
        if row['2019_date'] >= '2019-12-19' and row['ship_from_location'] == 'MWFC' and row['Carrier'] == 'EMSY':
            return 'disable'
        if row['2019_date'] >= '2019-12-19' and row['ship_from_location'] == 'ECFC' and row['Carrier'] == 'LASG':
            return 'disable'
        if row['2019_date'] >= '2019-12-19' and row['ship_from_location'] == 'beauty_659' and row['Carrier'] == 'LASG':
            return 'disable'
        if row['2019_date'] >= '2019-12-19' and row['ship_from_location'] == 'RDR_699' and row['Carrier'] == 'LASG':
            return 'disable'
        if row['2019_date'] >= '2019-12-22' and row['ship_from_location'] == 'ECFC' and row['Carrier'] == 'CDDT':
            return 'disable'
        if row['2019_date'] >= '2019-12-22' and row['ship_from_location'] == 'beauty_659' and row['Carrier'] == 'CDDT':
            return 'disable'
        if row['2019_date'] >= '2019-12-22' and row['ship_from_location'] == 'RDR_699' and row['Carrier'] == 'CDDT':
            return 'disable'
        if row['Normalized_Service'] in ['3D', '1D', '2D'] and row['ship_from_location'] == 'beauty_659' and row['Carrier'] != 'UPSN':
            return 'disable'
        if row['Normalized_Service'] in ['3D', '1D', '2D'] and row['ship_from_location'] == 'beauty_489' and row['Carrier'] != 'UPSN':
            return 'disable'
        else:
            return 'eligible'

    dataframe['eligibility'] = dataframe.apply(lambda row: cut_off(row), axis=1)
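
For comparison, here is a rough sketch of how the same rules might be written with whole-column boolean masks and numpy.where instead of a row-wise apply. The function name cut_off_vectorized and the origin_rules list are made up for illustration. It assumes the date column holds ISO-formatted strings (so they compare correctly as text), that the compared columns have no missing values, and that a Normalized_Service column exists as in the function above. It has not been timed on the full data.

```python
import numpy as np
import pandas as pd


def cut_off_vectorized(df):
    """Label every row 'disable' or 'eligible' using column-wise masks."""
    date = df['2019_date']            # assumed ISO strings, e.g. '2019-12-17'
    carrier = df['Carrier']
    origin = df['ship_from_location']

    # USPS rules depend on the service level rather than the origin.
    disable = (carrier == 'USPS') & (
        ((date >= '2019-12-17') & (df['Service_y'] == 'FCPS'))
        | ((date >= '2019-12-19') & (df['Service_y'] == 'PM'))
    )

    # Hypothetical rule table: (cutoff date, carrier, affected origins).
    origin_rules = [
        ('2019-12-19', 'UDSL', ['ECFC']),
        ('2019-12-19', 'EMSY', ['MWFC']),
        ('2019-12-19', 'LASG', ['ECFC', 'beauty_659', 'RDR_699']),
        ('2019-12-22', 'CDDT', ['ECFC', 'beauty_659', 'RDR_699']),
    ]
    for cutoff, name, origins in origin_rules:
        disable |= (date >= cutoff) & (carrier == name) & origin.isin(origins)

    # 1D/2D/3D services from the beauty locations on any carrier except UPSN.
    disable |= (
        df['Normalized_Service'].isin(['3D', '1D', '2D'])
        & origin.isin(['beauty_659', 'beauty_489'])
        & (carrier != 'UPSN')
    )

    return pd.Series(np.where(disable, 'disable', 'eligible'), index=df.index)
```

It would be called as dataframe['eligibility'] = cut_off_vectorized(dataframe). Because every rule returns 'disable', the order of the checks does not matter here, which is what lets the masks be OR-ed together.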