Pandas: Creating new data frame from only certain columns

Question

I have a CSV file with measurements, and I want to create a new CSV file with the hourly averages and standard deviations, but only for certain columns.

Example:

csv1:

YY-MO-DD HH-MI-SS_SSS    |     Acceleration  |        Lumx     |    Pressure
2015-12-07 20:51:06:608  |        22.7       |        32.3     |     10
2015-12-07 20:51:07:609  |        22.5       |        47.7     |     15

to csv2 (only for the pressure and acceleration):

YY-MO-DD HH-MI-SS_SSS    | Acceleration avg  |   Pressure avg
2015-12-07 20:00:00:000  |        22.6       |        12.5
2015-12-07 21:00:00:000  |        ....       |        ....

Now I have an idea (thanks to the people on this site) of how to calculate the averages, but I'm having trouble creating a new, smaller dataframe that contains the calculations for only a few columns.

Thanks !!!

score 20 · Accepted Answer · edited Dec 15 '21 at 01:07


You can make a smaller DataFrame like below:

csv2 = csv1[['Acceleration', 'Pressure']].copy()

Then you can handle csv2, which only has the columns you want. (You said you have an idea about avg calculation.)
FYI, .copy() makes csv2 explicitly independent of csv1; it could be omitted if you are sure about view versus copy behavior.
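
As a minimal sketch of the selection above, assuming csv1 is already loaded as a DataFrame (the toy values are taken from the question's example rows), the following shows that the copy can be changed without affecting the original:

```python
import pandas as pd

# Toy stand-in for csv1, using the column names from the question.
csv1 = pd.DataFrame({
    "Acceleration": [22.7, 22.5],
    "Lumx": [32.3, 47.7],
    "Pressure": [10, 15],
})

# Select two columns; .copy() makes csv2 independent of csv1.
csv2 = csv1[["Acceleration", "Pressure"]].copy()

# Changing csv2 leaves csv1 untouched.
csv2.loc[0, "Pressure"] = 99
print(csv1.loc[0, "Pressure"])  # still the original value
```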

edited Dec 15 '21 at 01:07

NerdOnTour


answered Apr 09 '16 at 14:38

su79eu7k


But how can I copy straight from the dataframe after the calculations? – ValientProcess Apr 09 '16 at 14:50
@RonMaor You can assign the result like `csv2=pd.DataFrame(); csv2['Acceleration Avg'] = csv1[['Acceleration', 'Pressure']].copy().apply(your_function)` and so on. – su79eu7k Apr 09 '16 at 16:53
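
One possible way to go straight from csv1 to the hourly statistics, sketched under a few assumptions: the timestamp column is named "timestamp" here (a made-up name, since the question's header is different), the timestamps use the colon-separated milliseconds shown in the question, and resample() stands in for whatever averaging method you already have.

```python
import pandas as pd

# Toy data from the question; "timestamp" is an assumed column name.
csv1 = pd.DataFrame({
    "timestamp": ["2015-12-07 20:51:06:608", "2015-12-07 20:51:07:609"],
    "Acceleration": [22.7, 22.5],
    "Lumx": [32.3, 47.7],
    "Pressure": [10, 15],
})
# Parse the colon-separated milliseconds explicitly.
csv1["timestamp"] = pd.to_datetime(csv1["timestamp"], format="%Y-%m-%d %H:%M:%S:%f")

# Keep only the wanted columns, then aggregate into one-hour bins.
hourly = (
    csv1.set_index("timestamp")[["Acceleration", "Pressure"]]
    .resample("60min")
    .agg(["mean", "std"])
)

# Flatten the two-level column names, e.g. ("Pressure", "mean") -> "Pressure mean".
hourly.columns = [f"{col} {stat}" for col, stat in hourly.columns]

# hourly.to_csv("csv2.csv")  # hypothetical output filename
print(hourly)
```

Selecting the columns before resampling means the statistics are only computed for the columns you actually need.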

leerssej · Answer 2 · 2019-03-06T05:12:28.830

6

csv2 = csv1.loc[:, ['Acceleration', 'Pressure']]

.loc[] helps keep the subsetting operation explicit and consistent.
When given a list of column labels, .loc[] returns a new object rather than a view, so changes to csv2 do not modify the original dataframe.

(for further discussion and great examples of the different view vs. copy alternatives please see: Pandas: Knowing when an operation affects the original dataframe)
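
A short sketch of what "explicit" can look like in practice, assuming the toy data from the question: .loc[] takes the row selection and the column selection in one call, so the same pattern covers "all rows" and "only some rows". The pressure threshold of 12 is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Toy stand-in for csv1, using the column names from the question.
csv1 = pd.DataFrame({
    "Acceleration": [22.7, 22.5],
    "Lumx": [32.3, 47.7],
    "Pressure": [10, 15],
})

# All rows (":"), two columns.
csv2 = csv1.loc[:, ["Acceleration", "Pressure"]]

# Same columns, but only rows where Pressure exceeds an arbitrary threshold.
high_pressure = csv1.loc[csv1["Pressure"] > 12, ["Acceleration", "Pressure"]]

print(csv2.shape, high_pressure.shape)
```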

edited Mar 06 '19 at 05:12

answered Mar 03 '19 at 01:53

leerssej


I use this style too but the selected answer works as well. – O.rka Mar 06 '19 at 05:17

score 1 · Answer 3 · answered Aug 02 '22 at 11:02


Your average method can go in place of "method_to_obtain_avg" and then you can obtain a subset as below:

csv2 = csv1.method_to_obtain_avg()[["Acceleration", "Pressure"]]
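
For illustration only, here is that compute-then-select order with resample().mean() standing in for the placeholder method_to_obtain_avg (an assumption, since the answer does not name a real method), and with the timestamps assumed to be already parsed into the index:

```python
import pandas as pd

# Toy data from the question, with parsed timestamps as the index.
csv1 = pd.DataFrame(
    {"Acceleration": [22.7, 22.5], "Lumx": [32.3, 47.7], "Pressure": [10, 15]},
    index=pd.to_datetime(["2015-12-07 20:51:06.608", "2015-12-07 20:51:07.609"]),
)

# Average every column per hour, then keep only the two of interest.
csv2 = csv1.resample("60min").mean()[["Acceleration", "Pressure"]]
print(csv2)
```

This computes the average for every column, including the ones that are discarded afterwards, which may matter for wide dataframes.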

answered Aug 02 '22 at 11:02

Shariq Anwar

