Trying to select data slices in pandas using time stamps as the locations

Question

I start by creating a data frame from my input CSV file and get the right format, but now I need to perform calculations on the V, I, and P columns. I want to split the data using the time stamps.

For example, I want the mean of V, I, and P for all the values between Test loop 0 and Test loop 1. I know I can do this using iloc, but I am trying to write a script that will work for different log files that might have a different number of entries.

[Image: data frame output (not included)]

Please let me know if you need any more information, any help/input is appreciated.

[Please don't post images of code (or links to them)](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question) — jezrael, Jun 12 '18 at 12:11

jezrael · Accepted Answer · 2018-06-12T12:31:27.040


I think you need to take the first 19 characters of the Time column (the timestamp truncated to whole seconds) and aggregate with mean:

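# numeric_only=True skips string columns such as Time (pandas >= 2.0 raises a TypeError without it)
df = df.groupby(df['Time'].str[:19]).mean(numeric_only=True)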

If you need to remove rows with NaNs first:

df = df.dropna()
df = df.groupby(df['Time'].str[:19]).mean(numeric_only=True)
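A minimal runnable sketch of the per-second grouping above, on a small made-up frame. The `Time` layout, the `PS` column and all values are assumptions, since the real data was only posted as an image. It passes `numeric_only=True` because pandas 2.0 and later raise a `TypeError` when `mean` meets string columns such as `Time`.

```python
import numpy as np
import pandas as pd

# Hypothetical log (assumed layout): Time strings with sub-second precision,
# a string PS column, and numeric V, I, P readings.
df = pd.DataFrame({
    "Time": [
        "2018-06-12 12:00:01.100",
        "2018-06-12 12:00:01.600",
        "2018-06-12 12:00:02.100",
        "2018-06-12 12:00:02.600",
    ],
    "PS": ["PSX1:VDD"] * 4,
    "V": [1.0, 3.0, 5.0, np.nan],
    "I": [0.1, 0.3, 0.5, 0.7],
    "P": [0.1, 0.9, 2.5, 4.9],
})

# The first 19 characters are "YYYY-MM-DD HH:MM:SS", so each group is one second.
seconds = df["Time"].str[:19]

# numeric_only=True averages only V, I, P and skips the string columns.
per_second = df.groupby(seconds).mean(numeric_only=True)

# Variant from the answer: drop rows containing NaN before grouping.
clean = df.dropna()
per_second_clean = clean.groupby(clean["Time"].str[:19]).mean(numeric_only=True)

print(per_second)
print(per_second_clean)
```

Note that `mean` skips NaN within a group by default, so the `dropna` step only matters if a row with a missing value should not contribute its other readings either.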

answered Jun 12 '18 at 12:08 by jezrael, edited Jun 12 '18 at 12:31

This worked, thank you so much! I didn't know I couldn't post images of code; I'm sorry about that! – shady mccoy Jun 12 '18 at 12:14
Try copying the text and indenting each code line with `4 spaces`; if there is a problem, I can help you. – jezrael Jun 12 '18 at 12:15
Depending on the desired output: my solution gets the mean between test loop i and test loop i+1, whereas @jezrael's solution groups by time, one group per second. – Pierre Gourseaud Jun 12 '18 at 12:29
@jezrael I have another question for you: say I wanted the column name to be the power supply and then, underneath it, show V, I, P. I'm not sure if that is possible in pandas. – shady mccoy Jun 12 '18 at 14:57
@MohitModi - Not sure I understand. Do you mean the `PS` column? Or are you thinking of a [pivot](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe)? – jezrael Jun 12 '18 at 15:00
@jezrael Let me step back a little: I want to make a column that says PSX1:VDD and another that says PSX2:VDD2. Underneath each of those columns I want 3 columns with V, I, P. Is that possible, or can you not create columns underneath a column? – shady mccoy Jun 12 '18 at 15:08
It is called a `MultiIndex`, and it is possible. But I am still not sure whether it can be generated from your data. – jezrael Jun 12 '18 at 15:13
But if you want to mix column names with data (strings with numbers), it is possible but not recommended. Pandas works best if each column holds a single type of data. – jezrael Jun 12 '18 at 15:14
Great thanks. I'll look into it and try to implement it, if it's not possible it's not a big deal. – shady mccoy Jun 12 '18 at 15:15
@MohitModi - Yes, try it. If there is still a problem, try creating a new question here. A small friendly tip: [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – jezrael Jun 12 '18 at 15:16
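For the follow-up about nested column headers, here is a minimal sketch of `MultiIndex` columns built with `DataFrame.pivot`. It assumes a long-format frame with a `PS` column naming the supply and a hypothetical `Loop` column as the row key; the supply names come from the comments, all values are made up.

```python
import pandas as pd

# Hypothetical long-format data: one row per (loop, power supply) reading.
df = pd.DataFrame({
    "Loop": [0, 0, 1, 1],
    "PS": ["PSX1:VDD", "PSX2:VDD2", "PSX1:VDD", "PSX2:VDD2"],
    "V": [1.0, 2.0, 1.1, 2.1],
    "I": [0.5, 0.6, 0.5, 0.7],
    "P": [0.5, 1.2, 0.55, 1.47],
})

# pivot with a list of values gives two column levels: (quantity, PS).
wide = df.pivot(index="Loop", columns="PS", values=["V", "I", "P"])

# Put the supply on top, then fix the order so each supply shows V, I, P.
wide = wide.swaplevel(0, 1, axis=1)
cols = pd.MultiIndex.from_product(
    [["PSX1:VDD", "PSX2:VDD2"], ["V", "I", "P"]], names=["PS", "quantity"]
)
wide = wide.reindex(columns=cols)

print(wide)
print(wide["PSX1:VDD"])  # selecting the top level returns just V, I, P
```

Keeping the supply name in the header rather than in the cells follows the advice above: every column still holds a single numeric type.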

Pierre Gourseaud · Answer 2 · 2018-06-13T06:41:11.670

If you want to have the mean for the lines between your 'Test loop' lines:

First, you need to extract the limits of your time windows:

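# pandas Series have no .contains method; string matching lives under the .str accessor
serie_time_limits = df[df['Time'].str.contains('Test loop', na=False)]['Time'].str[:19]
# .copy() avoids a SettingWithCopyWarning when Time is overwritten below
df_data = df[~df['Time'].str.contains('Test loop', na=False)].copy()
df_data['Time'] = df_data['Time'].str[:19]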
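Since the title asks about using time stamps as locations, here is a minimal sketch that parses the same 19-character prefixes into datetimes and slices with `.loc` on a `DatetimeIndex`. The marker format (`"<timestamp> Test loop N"`) and all values are assumptions. Unlike the `<` comparison in this answer, a `.loc` label slice includes its end point, so a data row stamped in the same second as the next marker would be included.

```python
import pandas as pd

# Hypothetical log: marker rows carry "Test loop N" after the timestamp.
df = pd.DataFrame({
    "Time": [
        "2018-06-12 12:00:00 Test loop 0",
        "2018-06-12 12:00:01.250",
        "2018-06-12 12:00:02.750",
        "2018-06-12 12:00:05 Test loop 1",
        "2018-06-12 12:00:06.500",
    ],
    "V": [None, 1.0, 3.0, None, 5.0],
    "I": [None, 0.1, 0.3, None, 0.5],
    "P": [None, 0.1, 0.9, None, 2.5],
})

fmt = "%Y-%m-%d %H:%M:%S"
is_marker = df["Time"].str.contains("Test loop", na=False)

# Loop start times as Timestamps.
limits = pd.to_datetime(df.loc[is_marker, "Time"].str[:19], format=fmt)

# Data rows indexed by their timestamp (truncated to whole seconds).
data = df.loc[~is_marker].copy()
data.index = pd.to_datetime(data["Time"].str[:19], format=fmt)

# Label-based slice between the first two markers (both ends inclusive).
window0 = data.loc[limits.iloc[0]:limits.iloc[1], ["V", "I", "P"]]
print(window0.mean())
```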

Then, you can get the mean for each test loop:

means = []
for i in range(len(serie_time_limits)):
    if i==len(serie_time_limits)-1:
        df_window = df_data[(df_data['Time']>=serie_time_limits[i])
    else:
        df_window = df_data[(df_data['Time']>=serie_time_limits[i]) & (df_data['Time']<serie_time_limits[i+1])]
    means.append(df_window[['V', 'I', 'P']].mean())
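A vectorized alternative to the loop, swapping in a different technique: a cumulative sum over the marker rows labels every row with the loop it follows, and a single `groupby` then averages each loop. This sketch assumes the log rows are already in chronological order (it uses row order, not timestamp comparisons) and uses made-up data; rows before the first marker would get label 0 instead of being dropped.

```python
import pandas as pd

# Hypothetical log: marker rows contain "Test loop N"; the others hold readings.
df = pd.DataFrame({
    "Time": [
        "2018-06-12 12:00:00 Test loop 0",
        "2018-06-12 12:00:01.250",
        "2018-06-12 12:00:02.750",
        "2018-06-12 12:00:05 Test loop 1",
        "2018-06-12 12:00:06.500",
        "2018-06-12 12:00:07.000",
    ],
    "V": [None, 1.0, 3.0, None, 5.0, 7.0],
    "I": [None, 0.1, 0.3, None, 0.5, 0.7],
    "P": [None, 0.1, 0.9, None, 2.5, 4.9],
})

is_marker = df["Time"].str.contains("Test loop", na=False)

# True counts as 1, so the running sum increments at every marker row:
# rows after "Test loop 0" get 1, rows after "Test loop 1" get 2, and so on.
loop_id = is_marker.cumsum().rename("loop")

# Average only the data rows, one group per loop.
means = df.loc[~is_marker, ["V", "I", "P"]].groupby(loop_id[~is_marker]).mean()
print(means)
```

The result is one row per loop, which also works for log files with any number of entries or loops.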
