I have the following dataframe:
import pandas as pd
URL_WITH_EMAILS_DF = pd.DataFrame(data=[{'main_url': 'http://keilstruplund.dk', 'emails': ['ole.norlin@mail.dk', 'ole.gregersen@hk.dk', 'prima-rent@youseepost.dk', 'jb@rentind.dk', 'frisoren01@gmail.com','stigterndrup@gmail.com', 'psn@psn.dk', 'samuel@malerfirmaet-lykkebo.dk', 'jan@mundt-reklame.dk', 'nordsjalland@phonixtag.dk', 'jp@rudersdalmaleren.dk', 'vvs@hestetangen.dk', 'steenkragelund@mail.tele.dk', 'kasserer@keilstruplund.dk']},
{'main_url': 'http://kirsebaergaarden.com', 'emails': ['info@kirsebaergaarden.com','ghost1054@yahoo.com']},
{'main_url': 'http://koglernes.dk', 'emails': ['info@koglernes.dk']},
{'main_url': 'http://kongehojensbornehave.dk', 'emails': []}
])
However, in each row I want to keep only those entries of the 'emails' list whose domain (the part after '@') is the same as that row's 'main_url' with the "http://" prefix removed. The result should be the following dataframe:
URL_WITH_EMAILS_DF = pd.DataFrame(data=[{'main_url': 'http://keilstruplund.dk', 'emails': ['kasserer@keilstruplund.dk']},
{'main_url': 'http://kirsebaergaarden.com', 'emails': ['info@kirsebaergaarden.com']},
{'main_url': 'http://koglernes.dk', 'emails': ['info@koglernes.dk']},
{'main_url': 'http://kongehojensbornehave.dk', 'emails': []}
])
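A minimal per-row sketch of the filter described above, assuming every `main_url` starts with `http://` or `https://` and that the comparison is an exact, case-sensitive string match (no `www.` stripping or subdomain handling). The function name `filter_emails_by_domain` is hypothetical.

```python
import pandas as pd


def filter_emails_by_domain(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only emails whose domain (text after the last '@') equals main_url without its scheme."""
    out = df.copy()
    # Strip the scheme; assumes URLs look like 'http://domain' or 'https://domain'.
    domains = out["main_url"].str.replace(r"^https?://", "", regex=True)
    # Rebuild each row's list, keeping only emails on the row's own domain.
    out["emails"] = [
        [email for email in emails if email.rsplit("@", 1)[-1] == domain]
        for emails, domain in zip(out["emails"], domains)
    ]
    return out
```

Applied to `URL_WITH_EMAILS_DF`, this should produce the expected dataframe shown above. The work is a plain Python loop over all emails, so its cost grows linearly with the total number of addresses and it creates no intermediate reshaped frame.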
Any hints or approaches would be appreciated, bearing in mind that I need to apply this transformation to millions of rows.
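For millions of rows, one alternative worth benchmarking against a plain list comprehension is to flatten the lists with `Series.explode`, extract domains with pandas string methods, and regroup the survivors with `groupby(level=0)`. This is a sketch under the assumptions that the dataframe index is unique, that `emails` holds lists of strings, and that URLs start with `http://` or `https://`; `filter_emails_exploded` is a hypothetical name. Whether it is actually faster depends on the data, so timing both approaches on a sample is advisable.

```python
import pandas as pd


def filter_emails_exploded(df: pd.DataFrame) -> pd.DataFrame:
    """Keep emails whose domain matches main_url, using explode + groupby.

    Assumes a unique index and list-of-string values in 'emails'.
    """
    # Domain for each row; assumes an http:// or https:// prefix.
    domains = df["main_url"].str.replace(r"^https?://", "", regex=True)

    # One row per email, repeating the original index label; empty lists become NaN.
    exploded = df["emails"].explode()

    # Domain part of each email (NaN stays NaN).
    email_domain = exploded.str.rsplit("@", n=1).str[-1]

    # Pair each email with its row's domain via the repeated index labels.
    row_domain = domains.reindex(exploded.index)
    keep = email_domain.to_numpy() == row_domain.to_numpy()

    # Collect surviving emails back into one list per original row.
    kept_lists = exploded[keep].groupby(level=0).agg(list)

    out = df.copy()
    # Rows with no surviving emails come back as NaN from reindex; use [] instead.
    out["emails"] = [
        lst if isinstance(lst, list) else []
        for lst in kept_lists.reindex(df.index)
    ]
    return out
```

The design trades the per-row Python loop for vectorized string operations on one long flattened Series, at the cost of building that Series and regrouping it afterwards.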