12

I have a CSV file with measurements, and I want to create a new CSV file with the hourly averages and standard deviations, but only for certain columns.

Example:

csv1:

YY-MO-DD HH-MI-SS_SSS    |     Acceleration  |        Lumx     |    Pressure
2015-12-07 20:51:06:608  |        22.7       |        32.3     |     10
2015-12-07 20:51:07:609  |        22.5       |        47.7     |     15

to csv2 (only for the pressure and acceleration):

YY-MO-DD HH-MI-SS_SSS    | Acceleration avg  |   Pressure avg
2015-12-07 20:00:00:000  |        22.6       |        12.5
2015-12-07 21:00:00:000  |        ....       |        ....

Now I have an idea (thanks to the people on this site) of how to calculate the averages, but I'm having trouble creating a new, smaller dataframe that contains the calculations for only a few columns.

Thanks !!!

ValientProcess

3 Answers

20

You can make a smaller DataFrame like below:

csv2 = csv1[['Acceleration', 'Pressure']].copy()

Then you can work with csv2, which has only the columns you want. (You said you already have an idea for the average calculation.)
FYI, .copy() guards against SettingWithCopyWarning if you later modify csv2; it can be omitted if you are sure about view versus copy.
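
For illustration, here is a minimal sketch of how the subsetting could be combined with an hourly mean and standard deviation. It assumes the timestamp column is named `timestamp` and follows the format of the sample rows in the question; the `60min` bin size and the flattened column names are choices made for this example, not something stated in the question.

```python
import pandas as pd

# Toy data copied from the two sample rows in the question.
# The column name "timestamp" is an assumption for this sketch.
csv1 = pd.DataFrame({
    "timestamp": ["2015-12-07 20:51:06:608", "2015-12-07 20:51:07:609"],
    "Acceleration": [22.7, 22.5],
    "Lumx": [32.3, 47.7],
    "Pressure": [10, 15],
})

# Parse the timestamps; the format string is inferred from the sample rows.
csv1["timestamp"] = pd.to_datetime(csv1["timestamp"], format="%Y-%m-%d %H:%M:%S:%f")

# Keep only the wanted columns, bucket the rows into 60-minute bins,
# and compute both statistics for each remaining column.
csv2 = (
    csv1.set_index("timestamp")[["Acceleration", "Pressure"]]
    .resample("60min")
    .agg(["mean", "std"])
)

# Flatten the two-level column index into names like "Acceleration mean".
csv2.columns = [f"{col} {stat}" for col, stat in csv2.columns]
```

From here, `csv2.to_csv(...)` would write the smaller table to a new file.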

NerdOnTour
su79eu7k
  • But how can I copy straight from the dataframe after the calculations? – ValientProcess Apr 09 '16 at 14:50
  • @RonMaor You can assign the result like `csv2 = pd.DataFrame(); csv2['Acceleration Avg'] = csv1['Acceleration'].apply(your_function)` and so on. – su79eu7k Apr 09 '16 at 16:53
6
csv2 = csv1.loc[:, ['Acceleration', 'Pressure']]
  • .loc[] helps keep the subsetting operation explicit and consistent.

  • With a list of column labels, .loc[] returns a copy, so changes to csv2 do not modify the original dataframe.

(for further discussion and great examples of the different view vs. copy alternatives please see: Pandas: Knowing when an operation affects the original dataframe)
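
A small check of that behaviour, as a sketch using the sample values from the question (the overwritten value 999 is arbitrary and only there to make the change visible):

```python
import pandas as pd

# Toy frame with the sample values from the question.
csv1 = pd.DataFrame({
    "Acceleration": [22.7, 22.5],
    "Lumx": [32.3, 47.7],
    "Pressure": [10, 15],
})

# Select two columns by label list; this yields a separate object.
csv2 = csv1.loc[:, ["Acceleration", "Pressure"]]

# Overwrite a value in the subset only.
csv2.loc[0, "Pressure"] = 999

# The original frame still holds the old value.
original_unchanged = csv1.loc[0, "Pressure"] == 10
```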

leerssej
1

Your average method can go in place of "method_to_obtain_avg" and then you can obtain a subset as below:

csv2 = csv1.method_to_obtain_avg()[["Acceleration", "Pressure"]]
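
As one possible stand-in for "method_to_obtain_avg" (which is only a placeholder name), the sketch below groups rows by the hour of a DatetimeIndex and takes the mean of every column before selecting the two of interest. It assumes the timestamps have already been parsed into the index; the data are the two sample rows from the question.

```python
import pandas as pd

# Sample rows from the question, with timestamps already parsed into the index.
idx = pd.to_datetime(["2015-12-07 20:51:06.608", "2015-12-07 20:51:07.609"])
csv1 = pd.DataFrame(
    {"Acceleration": [22.7, 22.5], "Lumx": [32.3, 47.7], "Pressure": [10, 15]},
    index=idx,
)

# Aggregate first: round each timestamp down to the start of its hour
# and average every column within that hour.
hourly = csv1.groupby(csv1.index.floor("60min")).mean()

# Then subset: keep only the columns wanted in the output.
csv2 = hourly[["Acceleration", "Pressure"]]
```

Aggregating first computes statistics for columns that are then discarded, so for wide files it may be cheaper to select the columns before aggregating.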