Using Multiple Wildcards in Python Pandas

Question

Thanks for helping out. Greatly appreciated. I have looked through S.O. and couldn't quite get the answer I was hoping for.

I have a data frame with columns that I would like to sum. I want to include columns based on one wildcard, but also exclude some of them based on another wildcard.

My columns include "dose_1", "dose_2", "dose_3", ..., "new_dose", "infusion_dose_1", "infusion_dose_2", and many more like them.

I understand that if I want to sum using a wildcard, I can do:

df['new_column'] = df.filter(regex = 'dose').sum(axis = 1)
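To make the problem concrete, here is a small sketch with hypothetical sample values (only the column names come from the question). `DataFrame.filter(regex=...)` keeps every label the pattern matches anywhere, so the infusion columns are summed too:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    "dose_1": [1, 2],
    "dose_2": [3, 4],
    "new_dose": [5, 6],
    "infusion_dose_1": [100, 200],
})

# filter(regex="dose") keeps every column whose label contains "dose",
# including the infusion column we actually want to leave out
selected = df.filter(regex="dose")
print(list(selected.columns))  # ['dose_1', 'dose_2', 'new_dose', 'infusion_dose_1']

df["new_column"] = selected.sum(axis=1)
print(df["new_column"].tolist())  # [109, 212]
```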

But what if I want to exclude columns that contain the string "infusion"?
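For comparison, it is possible to stay with `filter` and write a single regex that both includes and excludes, using a negative lookahead. This is a sketch on hypothetical sample data, and it is exactly the kind of pattern that becomes hard to read as more exclusions are added:

```python
import pandas as pd

# Hypothetical sample data using column names from the question
df = pd.DataFrame({
    "dose_1": [1, 2],
    "dose_2": [3, 4],
    "new_dose": [5, 6],
    "infusion_dose_1": [100, 200],
    "foobar": [7, 8],
})

# ^(?!.*infusion) : from the start of the label, "infusion" must not appear anywhere
# .*dose          : ... and "dose" must appear somewhere
pattern = r"^(?!.*infusion).*dose"
selected = df.filter(regex=pattern)
print(list(selected.columns))  # ['dose_1', 'dose_2', 'new_dose']

df["new_column"] = selected.sum(axis=1)
print(df["new_column"].tolist())  # [9, 12]
```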

Appreciate it!

score 0 · Answer 1 · answered Mar 19 '20 at 18:56

Regex is probably the wrong tool for this job. Excluding based on a match is overly complicated (see Regular expression to match a line that doesn't contain a word). Just use a list comprehension to select the labels:

import pandas as pd

df = pd.DataFrame(columns=["dose_1", "dose_2", "dose_3", "new_dose",
                           "infusion_dose_1", "infusion_dose_2", "foobar"])

# keep labels that contain "dose" but not "infusion"
cols = [x for x in df.columns if 'dose' in x and 'infusion' not in x]
#['dose_1', 'dose_2', 'dose_3', 'new_dose']

df['new_column'] = df[cols].sum(axis=1)
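The same include/exclude rule can also be written as a vectorized boolean mask over the column labels with `str.contains`. A minimal sketch on hypothetical sample data; `regex=False` makes both tests plain substring checks, matching the `in` tests in the list comprehension above:

```python
import pandas as pd

# Hypothetical sample data using column names from the question
df = pd.DataFrame({
    "dose_1": [1, 2],
    "dose_2": [3, 4],
    "new_dose": [5, 6],
    "infusion_dose_1": [100, 200],
    "foobar": [7, 8],
})

# Boolean mask over the labels: contains "dose" AND does NOT contain "infusion"
mask = (df.columns.str.contains("dose", regex=False)
        & ~df.columns.str.contains("infusion", regex=False))

selected = df.loc[:, mask]
print(list(selected.columns))  # ['dose_1', 'dose_2', 'new_dose']

df["new_column"] = selected.sum(axis=1)
print(df["new_column"].tolist())  # [9, 12]
```

Adding another exclusion is just one more `& ~df.columns.str.contains(...)` term, which stays readable where a combined lookahead regex would not.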
