I work with the following regex function:
import re

def datesearcher(comment):
    matches = re.findall(
        """(\d{2}\.Jan.\s\d{4}\sMitarbeiter\s)|(\d{2}\.Feb.\s\d{4}\sMitarbeiter\s)|(\d{2}\.März\s\d{4}\sMitarbeiter\s)
|(\d{2}\.Apr.\s\d{4}\sMitarbeiter\s)|(\d{2}\.Mai\s\d{4}\sMitarbeiter\s)|(\d{2}\.Juni\s\d{4}\sMitarbeiter\s)
|(\d{2}\.Juli\s\d{4}\sMitarbeiter\s)|(\d{2}\.Aug.\s\d{4}\sMitarbeiter\s)|(\d{2}\.Sep.\s\d{4}\sMitarbeiter\s)
|(\d{2}\.Okt.\s\d{4}\sMitarbeiter\s)|(\d{2}\.Nov.\s\d{4}\sMitarbeiter\s)|(\d{2}\.Dez.\s\d{4}\sMitarbeiter\s)""", comment
    )
    return matches
Basically, I am trying to find dates in a string that are always followed by the same word. An example would be (please excuse the German):
examplestring = "some text at the beginning 18.Jan 2017 Mitarbeiter some more text following or even more and more and more"
This should return:
[(18.Jan 2017,,,,,,,,,,,)]
Afterwards, I want to apply it to a column of a pandas DataFrame:
df["date"] = df["texts"].apply(datesearcher)
The function only returns [], even though I tested the pattern on https://regex101.com/. Can anyone help? Thank you!
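
Two details of the pattern above seem worth checking. In `Jan.\s`, the unescaped `.` requires one extra character between "Jan" and the whitespace, so "18.Jan 2017" (no trailing dot) cannot match. The line breaks inside the triple-quoted string also become part of the pattern unless `re.VERBOSE` is passed. The following is only a minimal sketch of a more compact variant, assuming the trailing dot after the month abbreviation is optional and that only the date itself should be captured; the names `MONTHS`, `DATE_PATTERN` and `datesearcher_sketch` are made up for illustration.

```python
import re

# Assumption: the month may appear with or without a trailing dot,
# e.g. "18.Jan 2017" (as in the example) or "18.Jan. 2017".
MONTHS = "Jan|Feb|März|Apr|Mai|Juni|Juli|Aug|Sep|Okt|Nov|Dez"

# Raw f-string: backslashes reach the regex engine unchanged,
# and doubled braces produce the literal {2} / {4} quantifiers.
DATE_PATTERN = re.compile(
    rf"(\d{{2}}\.(?:{MONTHS})\.?\s\d{{4}})\sMitarbeiter\s"
)

def datesearcher_sketch(comment):
    # With a single capture group, findall returns a flat list of strings.
    return DATE_PATTERN.findall(comment)

examplestring = "some text at the beginning 18.Jan 2017 Mitarbeiter some more text following or even more and more and more"
print(datesearcher_sketch(examplestring))  # ['18.Jan 2017']
```

Note that this returns a flat list of date strings instead of tuples with empty entries for the months that did not match. Under the same assumptions it could be applied to the column with `df["texts"].apply(datesearcher_sketch)`.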