I am a doctor looking at surgical activity in a pandas DataFrame with 450,000 records and 129 columns. Each row describes a patient admission to hospital, and the columns include the operation codes and diagnosis codes for that patient.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 452883 entries, 0 to 452882
Columns: 129 entries, fyear to opertn_24Type
dtypes: float64(5), int64(14), object(110)
memory usage: 445.7+ MB
There are 24 operation columns for each row. I want to search the operation columns (1-24) for the pituitary surgery codes "B041" and "B012", to identify all patients who have had surgery for a pituitary tumor.
I am a total Python beginner. I tried using iloc to select the range of operation columns (1-24), which start at position 72 in the list of columns, but couldn't get it to work.
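For reference, this is roughly what I attempted, shown here on a toy frame (the 72:96 slice in the real data comes from my manual count, so it may be off):

```python
import pandas as pd

# Tiny stand-in for the real 129-column frame
df = pd.DataFrame({
    "fyear": [2019, 2020, 2021],
    "OPERTN_01": ["B041", "X123", "B012"],
    "OPERTN_02": ["Y200", "B012", "Z999"],
})

# Slice the operation columns by integer position with iloc.
# In the real data this would be roughly df.iloc[:, 72:96].
ops = df.iloc[:, 1:3]
print(ops)
```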
I am quite happy searching for an individual value, e.g. "B041", in a single column using
df["OPERTN_01"] == "B041"
but would ideally like to search all 24 surgical columns at once, more efficiently.
I tried searching the whole dataframe using
test = df[df.isin(["B041", "B012"])]
but that just returns the entire dataframe, with NaN everywhere except the matching cells.
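Here is a tiny reproduction of that behaviour with made-up columns, in case it helps:

```python
import pandas as pd

# Minimal reproduction (toy columns, not the real data)
df = pd.DataFrame({
    "OPERTN_01": ["B041", "X123"],
    "OPERTN_02": ["Y200", "B012"],
})

# df.isin(...) gives an elementwise boolean mask of the same shape...
mask = df.isin(["B041", "B012"])

# ...and indexing with that mask keeps the matching cells but replaces
# everything else with NaN, so every row survives, mostly full of NaN.
test = df[mask]
print(test)
```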
So I have a few questions.
- How do I find the integer position (the iloc number) of a column by name in a large dataframe of 129 columns? I just listed and counted them to find the first surgical column ("OPERTN_01") at position 72; there must be an easier way.
- What's the best way to slice a dataframe to select records with multiple values from multiple columns?
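From reading around, I suspect something like the following is the direction (toy data again; `get_loc` and `any(axis=1)` are my guesses at the right tools), but I don't know whether it's idiomatic:

```python
import pandas as pd

# Toy frame; in the real data OPERTN_01 sits at position 72 and there
# are 24 operation columns, not 2.
df = pd.DataFrame({
    "fyear": [2019, 2020, 2021],
    "OPERTN_01": ["B041", "X123", "A100"],
    "OPERTN_02": ["Y200", "B012", "Z999"],
})

# Question 1: look up a column's integer position by name
start = df.columns.get_loc("OPERTN_01")
op_cols = df.columns[start:start + 2]  # start + 24 in the real data

# Question 2: keep rows where ANY operation column holds one of the codes
pituitary = df[df[op_cols].isin(["B041", "B012"]).any(axis=1)]
print(pituitary)
```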