I'm doing data wrangling in Python, using the dfply package.
I want to create a new variable "a06" from the column 'FC06' of the dataset data_a, so that:
- a06 = 1 if FC06[i] starts with the character "1" (e.g. FC06[i] = 173)
- a06 = 2 if FC06[i] starts with the character "2"
- a06 = NaN if FC06[i] = NaN
For instance, with the input:
df = pd.DataFrame({'FC06':[173,170,220,float('nan'),110,230,float('nan')]})
I want to get the output:
df1 = pd.DataFrame({'a06':[1,1,2,float('nan'),1,2,float('nan')]})
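For reference, here is a minimal sketch of that rule in plain pandas and NumPy, outside any dfply pipeline. It assumes FC06 is stored as floats with NaN for missing values, as in the sample input above; the variable name first_char is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'FC06': [173, 170, 220, float('nan'), 110, 230, float('nan')]})

# First character of each value as text. NaN becomes the string 'nan',
# so its first character 'n' matches neither '1' nor '2'.
first_char = df['FC06'].astype(str).str[0]

# np.select takes the first condition that holds and falls back to NaN otherwise.
df['a06'] = np.select([first_char == '1', first_char == '2'], [1, 2], default=np.nan)
```

Because NaN is a float, the resulting a06 column is of float type (1.0, 2.0, NaN).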
In R, this would be obtained with:
data_a %>% mutate(a06 = ifelse(substr(FC06,1,1)=="1",1,ifelse(substr(FC06,1,1)=="2",2,NaN)))
but I can't find how to do this in Python.
I managed a first version with just two outcomes, NaN or 1:
data_a >> mutate(a06=if_else(X['FC06'].apply(pd.isnull),float('nan'),1))
but I can't find how to differentiate the result according to the first character of FC06.
I tried things like:
(data_a >> mutate(a06=if_else(X['FC06'].apply(pd.isnull),float('nan'),if_else(X['FC06'].apply(str)[0]=='1',1,2))))
but without success: [0] doesn't work there to get the first character, and/or str() can't be used with apply (neither can str.startswith('1')).
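To illustrate the [0] problem on a plain pandas Series (a sketch only; I am assuming the same behavior applies once dfply evaluates the expression): indexing a Series with [0] looks up the row labelled 0, whereas the .str accessor works on each element.

```python
import pandas as pd

s = pd.Series([173, 170, 220, float('nan')])

as_text = s.apply(str)                       # Series of strings, e.g. '173.0' and 'nan'
whole_value = as_text[0]                     # label lookup: the entire first element
first_chars = as_text.str[0]                 # element-wise: first character of every row
starts_with_1 = as_text.str.startswith('1')  # element-wise boolean mask
```

So apply(str) itself runs fine here; it is the [0] and the startswith call that need to go through .str to act row by row.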
Does anybody know how to handle such cases?
Or is there another package that can do this in Python?
Thank you!