2

New to this forum and to coding in general, so I apologize if this is a repeat question, will delete if so!

I'm currently working with pandas in python and attempting to find a mean value within one of my data frame columns.

I've created my dataframe, and called it 'data': data=pd.DataFrame().

The first column is a date-time column and I've set it up as follows, converting it to date-time:

data.iloc[:,0]= pd.to_datetime(data.iloc[:,0], format="%y/%m/%d %H:%M:%S") #yy/mm/dd hh:mm:ss

I've been able to find the mean for an entire column using the .mean() function for my data column of interest:

data["new6"].mean()

However, I'm trying to find the mean for this column but only for a specific date-time range (and the date-time data is in a different column).

Is this possible to do?

Another option is to create a smaller dataframe as a subset of the current dataframe. Here I've named the new dataframe 'data_d.t.specified', and would only include columns within the date-time range desired.

data_d.t.specified = data.drop(data[(0 >= index value here) | 0 <= index value here])

I've tried this using a date-time format but I've had issues with using strings here, so opted for using the index value of the date-time of interest. However, I can't seem to find out the index value for the specific date-time that I want. I've been using the following code, where the date-time column is column 0, but the only thing that it prints is [] .

print(data[data[0]=="20/08/23 13:05:02"].index.values)

Is this an issue with what I'm calling the date-time column? Or have I not converted it to date-time properly? Any help would be greatly appreciated!

Sofia
  • 31
  • 2
  • Maybe try converting your single string in the `==` comparison to a `Datetime` object. Also, I highly recommend reading this [post](https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates) – Jacob K Jan 15 '21 at 02:44

1 Answers1

1

Since you made it a datetime column you can easily use .loc to focus on a date range:

df.loc[(df['Dates'] >= datetime(2020, 7, 21) & df['Dates'] <= datetime(2021, 1, 14)), 'col_to_mean'].mean()

where:

  • 'Dates' is the name of the column with the dates to .loc on
  • datetime(2020, 7, 21) is a date you want to split on in Year, Month, Day format. You can add time easily. See datetime docs.
  • 'col_to_mean' is the name of the column you want the mean of

Just adjust the logic to be whatever date range you want

noah
  • 2,616
  • 13
  • 27