I'm trying to subset a pandas data frame (retrieve a set of rows): I want to use DataFrame.filter with a regex string to identify the columns of interest, and then subset the rows based on the values in those columns.
For example, this is my mock data frame:
id  status  status_drug_use  drugA  drugA_use     drugB  drugB_use
0   1       analgesic        0      None          1      hypertensive
1   0       analgesic        1      analgesic     1      hypertensive
2   0       analgesic        1      hypertensive  0      None
3   1       analgesic        0      None          1      analgesic
I would like all rows in which the value in drugA_use or drugB_use matches the value in status_drug_use. In the example, this would return two rows:
id  status  status_drug_use  drugA  drugA_use     drugB  drugB_use
1   0       analgesic        1      analgesic     1      hypertensive
3   1       analgesic        0      None          1      analgesic
There are a few column name conventions to stick with:
- status_drug_use is always there.
- The matching columns (drugA_use and drugB_use) always follow the template <ANYTHING>_use.
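A minimal sketch of one way this could work, assuming the mock data frame above and that None marks a missing value. Because status_drug_use itself ends in _use, the sketch drops it after the regex selection. The variable names use_cols, matches and result are illustrative only.

```python
import pandas as pd

# Mock data frame from the example (None marks a missing drug use).
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "status": [1, 0, 0, 1],
    "status_drug_use": ["analgesic"] * 4,
    "drugA": [0, 1, 1, 0],
    "drugA_use": [None, "analgesic", "hypertensive", None],
    "drugB": [1, 1, 0, 1],
    "drugB_use": ["hypertensive", "hypertensive", None, "analgesic"],
})

# Select every column whose name ends in "_use", then drop the reference
# column, which also matches that pattern.
use_cols = df.filter(regex=r"_use$").drop(columns="status_drug_use")

# Compare each selected column with status_drug_use, row by row
# (axis=0 aligns the Series on the row index).
matches = use_cols.eq(df["status_drug_use"], axis=0)

# Keep the rows where at least one of the selected columns matches.
result = df[matches.any(axis=1)]
print(result)
```

With the mock data this should keep the rows with id 1 and 3. Missing values never compare equal, so a None in a _use column does not count as a match.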
Alteration
There is a second scenario, in which I would like to compare a user-defined string, e.g. analgesic, against the two columns drugA_use and drugB_use. This is different from using the content of status_drug_use.
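For this second scenario, a sketch along the same lines could compare the selected columns against a scalar instead of a column. The helper name rows_matching and its target parameter are hypothetical, introduced only for illustration, and the data frame is the same mock example.

```python
import pandas as pd

# Mock data frame from the example (None marks a missing drug use).
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "status": [1, 0, 0, 1],
    "status_drug_use": ["analgesic"] * 4,
    "drugA": [0, 1, 1, 0],
    "drugA_use": [None, "analgesic", "hypertensive", None],
    "drugB": [1, 1, 0, 1],
    "drugB_use": ["hypertensive", "hypertensive", None, "analgesic"],
})


def rows_matching(frame, target):
    """Hypothetical helper: rows where any <ANYTHING>_use column equals target."""
    # Pick the *_use columns, excluding status_drug_use, which is not
    # part of this comparison.
    use_cols = frame.filter(regex=r"_use$").drop(columns="status_drug_use")
    # Comparing against a scalar broadcasts it over every cell.
    return frame[use_cols.eq(target).any(axis=1)]


analgesic_rows = rows_matching(df, "analgesic")
hypertensive_rows = rows_matching(df, "hypertensive")
print(analgesic_rows)
```

Here status_drug_use plays no role in the comparison; it is only excluded from the regex selection.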