I asked a similar question yesterday ("Keep elements with pattern in pandas series without converting them to list"), and now I am faced with the opposite problem.
I have a pandas dataframe:
import pandas as pd
df = pd.DataFrame(["Air type:1, Space kind:2, water, wood", "berries, something at the start:4, Space blu:3, somethingelse"], columns = ['A'])
and I want to pick all elements that don't have a ":" in them. What I tried is the following regex which seems to be working:
df['new'] = df.A.str.findall(r'(^|\s)([^:,]+)(,|$)')
                                                               A                                     new
0                          Air type:1, Space kind:2, water, wood            [( , water, ,), ( , wood, )]
1  berries, something at the start:4, Space blu:3, somethingelse  [(, berries, ,), ( , somethingelse, )]
If I understand this correctly, findall matched the whole pattern and, because it contains 3 capture groups (the parts I have in parentheses), returned each match as a tuple of those 3 groups, with all tuples for a row wrapped in a list. Is there a way to avoid this and simply return only the middle group? That is, for the first row: water, wood and for the second row: berries, somethingelse
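To make the tuple behaviour concrete, here is a minimal sketch using Python's re module directly, since pandas documents Series.str.findall as equivalent to applying re.findall to each element. The second pattern, where the outer groups are made non-capturing with (?:...), is only my guess at what might return the middle part alone; the names rows, capturing and middle_only are made up for illustration.

```python
import re

# The two strings from column A of the dataframe above.
rows = [
    "Air type:1, Space kind:2, water, wood",
    "berries, something at the start:4, Space blu:3, somethingelse",
]

# Three capturing groups: findall returns one 3-tuple per match.
capturing = re.compile(r'(^|\s)([^:,]+)(,|$)')

# Assumption: turning the outer groups into non-capturing groups (?:...)
# leaves a single capturing group, so findall returns plain strings.
middle_only = re.compile(r'(?:^|\s)([^:,]+)(?:,|$)')

tuples_per_row = [capturing.findall(row) for row in rows]
middle_per_row = [middle_only.findall(row) for row in rows]

print(tuples_per_row[0])  # [(' ', 'water', ','), (' ', 'wood', '')]
print(middle_per_row)     # [['water', 'wood'], ['berries', 'somethingelse']]
```

If this holds, the same pattern passed to df.A.str.findall should presumably give a list of strings per row instead of a list of tuples, but I have not checked that on anything beyond these two rows.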
I also tried the opposite approach:
df.A.str.replace(r'[^\s,][^:,]+:[^:,]+', '', regex=True).str.replace(r'\s*,', '', regex=True)
which comes close to what I want (only the commas between the remaining elements are missing), but I am wondering whether I am missing something that would make my life easier.
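For reference, this is a small sketch of what the two replacement steps do to each string, written with re.sub as a stand-in for the regex-based str.replace (an assumption for illustration; strip_colon_items is a hypothetical helper name, not part of pandas).

```python
import re

rows = [
    "Air type:1, Space kind:2, water, wood",
    "berries, something at the start:4, Space blu:3, somethingelse",
]

def strip_colon_items(text):
    """Apply the two replacements from the attempt above, in order."""
    # Step 1: remove every "name:value" item, leaving its separators behind.
    without_pairs = re.sub(r'[^\s,][^:,]+:[^:,]+', '', text)
    # Step 2: remove every comma together with the whitespace before it.
    without_commas = re.sub(r'\s*,', '', without_pairs)
    return without_pairs, without_commas

results = [strip_colon_items(row) for row in rows]

print(results[0])  # (', , water, wood', ' water wood')
print(results[1])  # ('berries, , , somethingelse', 'berries somethingelse')
```

The second step is where the commas between the surviving elements get lost, because it removes every comma and not only the leftover ones.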