I'm relatively new to Python and I've run into an issue I can't seem to solve. I have imported an Excel sheet into Python. It's full of timestamps and corresponding glucose values.

I got my code to display timestamp-glucose pairs for a specific timeframe with the purpose of being able to analyze separate chunks from the data. So now I can just specify that I only want the data from the afternoon or the morning or X day before I hit 'Run'.

I want to run some basic calculations. I want to be able to enter the time range and then get the average glucose ONLY for the specified time period, but I'm struggling. I get the average glucose for the whole dataset by simply having this line:

print(df['Historic Glucose mmol/L'].mean())

But when it comes to getting averages for the specified time period I'm not sure how to do it. I looked into questions on here but couldn't find similar ones. Additionally, I have looked at possible numpy functions but I don't think they would help. If anyone has any suggestions I'd be grateful. Below is a chunk of code that gets me the specified timeframes:

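from datetime import datetime
dataSubSection = df  # df is the DataFrame loaded from the Excel sheet
# Start and end of the time window to analyze
sDate = datetime(2019, 11, 21, 17, 17, 0)
eDate = datetime(2019, 11, 22, 0, 0, 0)
# The same bounds formatted as strings
start_date = sDate.strftime('%Y-%m-%d %H:%M:%S')
end_date = eDate.strftime('%Y-%m-%d %H:%M:%S')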
AMC
  • You appear to be using pandas, is that correct? Have you seen https://pandas.pydata.org/docs/user_guide/timeseries.html and [Select DataFrame rows between two dates](https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates)? – AMC Apr 05 '20 at 20:41
  • @AMC Thank you for this, I found some useful information in your second link and managed to solve the issue! – Stefani Dimitrova Apr 06 '20 at 15:14
  • I’m voting to close this question because the OP has resolved the issue, as stated in the comments. – Trenton McKinney May 01 '20 at 21:59

1 Answer

Which package are you using to import the dataset?

Given the information provided, this is how I would approach it.

I would use pandas and load the sheet with pandas.read_excel() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html).

Once the dataset is in pandas, I would select the rows of interest and calculate the mean over those rows only (see "How to get mean of rows selected with another column's values in pandas").
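
As a rough sketch of that idea, something like the following might work. It builds a small toy DataFrame in place of the imported sheet, so the values are made up for illustration. The timestamp column name "Timestamp" is my assumption (the question doesn't give it); the glucose column name and the two dates are taken from the question. The rows are selected with a boolean mask and the mean is taken over the selected rows only.

```python
from datetime import datetime

import pandas as pd

# Toy data standing in for the imported sheet (values are illustrative).
# "Timestamp" is an assumed column name; the glucose column name is from
# the question.
df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            [
                "2019-11-21 12:00:00",
                "2019-11-21 17:30:00",
                "2019-11-21 21:00:00",
                "2019-11-22 08:00:00",
            ]
        ),
        "Historic Glucose mmol/L": [5.0, 6.0, 8.0, 10.0],
    }
)

sDate = datetime(2019, 11, 21, 17, 17, 0)
eDate = datetime(2019, 11, 22, 0, 0, 0)

# Boolean mask: True for rows whose timestamp lies within [sDate, eDate].
in_range = (df["Timestamp"] >= sDate) & (df["Timestamp"] <= eDate)

# Mean of the glucose column over the selected rows only.
range_mean = df.loc[in_range, "Historic Glucose mmol/L"].mean()
print(range_mean)
```

If the timestamps come in as strings rather than datetimes, converting the column with pd.to_datetime() first should make the comparison behave as expected.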

If your dataset is large, or you expect it to grow over time, I think it would be worth the effort to start looking into PySpark.

Good luck!

user3693309
  • Hi, I am indeed using pandas. I managed to solve the problem with the comment above, but I ended up looking into pyspark and I think it'll come in handy with the rest of my work, so thank you for pointing it out to me. – Stefani Dimitrova Apr 06 '20 at 15:15