1

I have a dataset in Excel that I would like to replicate.

enter image description here

My python code looks like:

data_frames = [df_mainstore, df_store_A, df_store_B]
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=["Id_number"], how='outer'), data_frames)
print(df_merged)

Since I merge several dataframes (can vary in column number and name) it would be tedious too write out all the columns which is done in this example:

isY = lambda x:int(x=='Y')
countEmail= lambda row: isY(row['Store Contact A']) + isY(row['Store B Contact'])
df['Contact Email'] = df.apply(countEmail,axis=1)

I also struggle with the expression: isY = lambda x:int(x=='@')

How can I add the "Contact has Email" column in a similar way I would do in Excel?

Wizhi
  • 6,424
  • 4
  • 25
  • 47

2 Answers2

1

you can use filter to select columns with Contact in it, then use str.contains with the right pattern for email address and finally you want any per row so:

#data sample
df_merged = pd.DataFrame({'id': [0,1,2,3], 
                          'Store A': list('abcd'),
                          'Store Contact A':['aa@bb.cc', '', 'e', 'f'], 
                          'Store B': list('ghij'),
                          'Store B Contact':['kk@ll.m', '', 'nn@ooo.pp', '']})

# define the pattern as in the link
pat = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
# create the column as wanted
df_merged['Contact has Email'] = df_merged.filter(like='Contact')\
                                          .apply(lambda x: x.str.contains(pat))\
                                          .any(1)

print (df_merged)
   id Store A Store Contact A Store B Store B Contact  Contact has Email
0   0       a        aa@bb.cc       g         kk@ll.m               True
1   1       b                       h                              False
2   2       c               e       i       nn@ooo.pp               True
3   3       d               f       j                              False
Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • 1
    Thank you very much!!!!!!!! Really appreciated. To make it work 100% I needed to adjust that all the word "Contact" start with a capital `C`. Furthermore I used the regex: `[a-zA-Z0-9-_.]+@[a-zA-Z0-9-_.]+` to make it somewhat more dynamic. Really happy for all the help and input :)! – Wizhi May 07 '20 at 18:47
  • 1
    @Wizhi if you not always have capital `C`, you can also use `filter(regex='Contact|contact')` for example, but there are more flexible way I'm sure. Glad it helps anyway :) – Ben.T May 07 '20 at 18:56
  • Might I ask, If I would like to return all the emails instead of True/False value, how should I proceed then? or is it more appropriate to ask a new question :)? – Wizhi May 09 '20 at 14:31
  • 1
    @Wizhi you can try `df_merged['Store Contact A'].str.extract(pat)` where pat is slightly different `pat = r"([a-zA-Z0-9-_.]+@[a-zA-Z0-9-_.]+)"` note the `()` around the regex you did but if you want a working solution, maybe a new question would be better – Ben.T May 09 '20 at 14:55
  • 1
    Thanks for your reply!! I will try to play around with your suggestion. If I don't solve it I will ask a new question. Thanks again!! :)! – Wizhi May 09 '20 at 15:03
  • 1
    @Wizhi not sure how you want the result, but this `df_merged.filter(like='Contact').apply(lambda x: x.str.extract(pat)[0]).agg(list, axis=1)` gives a list of all the email per row – Ben.T May 09 '20 at 16:16
  • 1
    That works really good, exactly what I needed!!, much better than my workaround solution!!! Thanks again :)!! – Wizhi May 09 '20 at 16:49
0

you can use the pandas.Series.str.contains

df_merged['Contact has Email'] = df_merged['Store Contact A'].str.contains('@', na=False)|df_merged['Store B Contact'].str.contains('@', na=False)

Shijith
  • 4,602
  • 2
  • 20
  • 34