I have a pandas.Series with sentences like this:
0 mi sobrino carlos bajó conmigo el lunes
1 juan antonio es un tio guay
2 voy al cine con ramón
3 pepe el panadero siempre se porta bien conmigo
4 martha me hace feliz todos los días
On the other hand, I have a list of names and surnames like this:
l = ['juan', 'antonio', 'esther', 'josefa', 'mariano', 'cristina', 'carlos']
I want to match sentences from the series to the names in the list. The real data is much bigger than this example, so I assumed that an element-wise comparison between the series and the list would not be efficient. Instead, I created one big string containing all the names in the list, like this:
'|'.join(l)
I then tried to create a boolean mask that would later let me select the sentences that contain any of the names in the list, like this:
series.apply(lambda x: x in '|'.join(l))
but it returns:
0 False
1 False
2 False
3 False
4 False
which is clearly not ok.
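As far as I can tell, the all-False result comes from the direction of the `in` test: on strings, `a in b` checks whether `a` is a substring of `b`, so the lambda asks whether the whole sentence occurs inside the joined names. A minimal sketch in plain Python (the variable names `names`, `joined` and `sentence` are only for illustration):

```python
# Names from the question, joined into one '|'-separated string.
names = ['juan', 'antonio', 'esther', 'josefa', 'mariano', 'cristina', 'carlos']
joined = '|'.join(names)

sentence = 'juan antonio es un tio guay'

# `a in b` on strings is a substring test, so this asks whether the whole
# sentence appears inside the joined names. It does not.
sentence_in_joined = sentence in joined

# The reverse direction is what finds a name inside the sentence.
name_in_sentence = any(name in sentence for name in names)

print(sentence_in_joined, name_in_sentence)  # False True
```

Note that the reversed check is still a plain substring test, so it would also fire on a longer word that merely contains a name.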
I also tried using str.contains(), but it doesn't behave as I expect, because this method checks whether any substring of the sentences in the series matches a name in the list, and this is not what I need (i.e. I need an exact match).
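To make "exact match" concrete, here is a sketch of the behaviour I am after, assuming it means a whole-word match. It wraps the joined names in `\b` word boundaries and passes the pattern to `str.contains`. The names `series`, `names`, `pattern` and `mask` are my own, and I have not checked how this performs on the real data.

```python
import re

import pandas as pd

series = pd.Series([
    'mi sobrino carlos bajó conmigo el lunes',
    'juan antonio es un tio guay',
    'voy al cine con ramón',
    'pepe el panadero siempre se porta bien conmigo',
    'martha me hace feliz todos los días',
])
names = ['juan', 'antonio', 'esther', 'josefa', 'mariano', 'cristina', 'carlos']

# \b restricts each alternative to whole words; re.escape guards against
# names that contain regex metacharacters. The non-capturing group (?:...)
# keeps pandas from warning about match groups in the pattern.
pattern = r'\b(?:' + '|'.join(re.escape(n) for n in names) + r')\b'

# Boolean mask: True where a sentence contains at least one whole-word name.
mask = series.str.contains(pattern, regex=True)
matched = series[mask]

print(mask.tolist())  # [True, True, False, False, False]
```

With this pattern, a name such as 'juan' would not match inside a longer word, which is the distinction I am trying to get.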
Could you please point me in the right direction here?
Thank you very much in advance