How to apply python pandas formula to categorise individuals based on their haemoglobin samples

Question

I have the xls data ( Master Sheet) attached in the [![sample xlsx data][1]][1]screenshot here.

As a beginner in Python and Data science , I am trying to read the sheets in xlsx file as below:

Load pandas

import pandas as pd

# Read CSV file into DataFrame df
sheet1, sheet2 = None, None
with pd.ExcelFile("blood_data.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Master sheet')
    sheet2 = pd.read_excel(reader, sheet_name='Another Sheet')


# Show dataframe
print(sheet1)

I am getting all the 110 rows and columns in the output, but I want to filter based on the condition which is shown in green in the screenshot. Hb Male 13-17 and Bilirubinn 0.3-1.2 . If the values are not in range I have to print the name of the candidate. Please help me in applying a multi-column-based filter using panda to achieve my result.

Sample input file: https://github.com/wittymindstech/medical-data-analysis/blob/main/blood_data.xlsx

output: MEHRAWAN, RAKHI etc

Please share sample dataframes (input and desired output) in a way that makes instantiation easy. I'm not typing the text from the screenshot into my console. [start here](https://stackoverflow.com/a/20159305/3620003) — timgeb, Jan 07 '22 at 11:33
SO has quite a few questions about filtering a Pandas dataframe based on multiple criteria. See https://stackoverflow.com/questions/62363144/how-to-filter-a-two-columns-with-specific-values-in-pandas for a start. — rajah9, Jan 07 '22 at 13:36
Does this answer your question? [How to filter a two columns with specific values in pandas](https://stackoverflow.com/questions/62363144/how-to-filter-a-two-columns-with-specific-values-in-pandas) — rajah9, Jan 07 '22 at 13:37

score 0 · Answer 1 · answered Jan 07 '22 at 13:55

You can try something like this -

import pandas as pd

# Read CSV file into DataFrame df
sheet1, sheet2 = None, None
with pd.ExcelFile("blood_data.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Master sheet')

indexes = sheet1[ (sheet1['Gender'] == 'm') & ( sheet1['Hb (g/dl)'] <= 17) &  ( sheet1['Hb (g/dl)'] >= 13) & (sheet1['BILIRUBIN (mg/dL)'] >= 0.3) & (sheet1['BILIRUBIN (mg/dL)'] <= 1.2) ].index
final = sheet1.loc[ sheet1.index.isin(indexes), : ]
print(list(final['NAME ']))

please accept and upvote my answer if it helped – Tomer S Jan 07 '22 at 18:50 — Tomer S, Jan 07 '22 at 18:50

How to apply python pandas formula to categorise individuals based on their haemoglobin samples

Load pandas

1 Answers1