Python (Pandas) - Working with numeric data but adding non numeric data back

Question

I have a CSV file that looks like this:

Build,Avg,Min,Max
BuildA,56.190,39.123,60.1039
BuildX,57.11,40.102,60.200
BuildZER,55.1134,35.129404123,60.20121

I want to get the average, min, max of each column and have each of these stats be as a new row. I exclude the non numeric column (the build column) and then run the statistics. I accomplish this by doing:

df = pd.read_csv('fakedata.csv')
columns = []
builds = []

for column in df.columns:
    if(df[column].dtype == 'float64'):
        columns.append(column)
    else:
        builds.append(column)

save = df[builds]
df = df[columns]

print(df)

df.loc['Min']= df.min()
df.loc['Average']= df.mean()
df.loc['Max']= df.max()

If I were then to write this data to a CSV it would look like:

,Avg,Min,Max
0,56.19,39.123,60.1039
1,57.11,40.102,60.2
2,55.1134,35.129404123,60.20121
Min,55.1134,35.129404123,60.1039
Average,55.8817,37.3709520615,60.1522525
Max,57.11,40.102,60.20121

Which is close to what I want but I want the Build column to be column one again and have the build names exist on top of the Min, Average, Max. Basically this:

Builds,Avg,Min,Max
BuildA,56.19,39.123,60.1039
BuildX,57.11,40.102,60.2
BuildZER,55.1134,35.129404123,60.20121
Min,55.1134,35.129404123,60.1039
Average,55.8817,37.3709520615,60.1522525
Max,57.11,40.102,60.20121

I have attempted to accomplish this by doing:

df.insert(0,'builds', save)
with open('fakedata.csv', 'w') as f:
    df.to_csv(f)

But this gives me this CSV:

,builds,Avg,Min,Max
0,Build1,56.19,39.123,60.1039
1,Build2,57.11,40.102,60.2
2,Build3,55.1134,35.129404123,60.20121
Min,,55.1134,35.129404123,60.1039
Average,,55.8817,37.3709520615,60.1522525
Max,,57.11,40.102,60.20121

How can I fix this?

This is very close but it results in a csv where the build names are present but now the Min, Average, Max are absent. I think the easiest way to fix this would be to add "Min", "Average", and "Max" to the save dataframe. I can't find an easy way to do this however — Abdall, Jul 05 '17 at 15:08

score 1 · Accepted Answer · answered Jul 05 '17 at 15:01

1

IIUC:

df_out = pd.concat([df.set_index('Build'),df.set_index('Build').agg(['max','min','mean'])]).rename(index={'max':'Max','min':'Min','mean':'Average'}).reset_index()

Output:

      index      Avg        Min       Max
0    BuildA  56.1900  39.123000  60.10390
1    BuildX  57.1100  40.102000  60.20000
2  BuildZER  55.1134  35.129404  60.20121
3       Max  57.1100  40.102000  60.20121
4       Min  55.1134  35.129404  60.10390
5   Average  56.1378  38.118135  60.16837

answered Jul 05 '17 at 15:01

Scott Boston

147,308
15
139
187

This would be the desired result but adding this line results in me getting a Key Error on "Build." I believe this is because to ensure I am only working with numeric data I do df = df[columns] where columns is equal to: for column in df.columns: if(df[column].dtype == 'float64'): columns.append(column) reference above for better formatting – Abdall Jul 05 '17 at 15:07
There is a [`select_dtypes`](http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.select_dtypes.html#pandas-dataframe-select-dtypes) that will "Return a subset of a DataFrame including/excluding columns based on their dtype." – Scott Boston Jul 05 '17 at 15:10
I realize that your answer was actually just a one line substitute to everything I was doing. – Abdall Jul 05 '17 at 15:16
Your answer works but if you have extra time if you could explain what the code is doing I would greatly appreciate it. I am specifically confused as to where the min, max, avg are being computed and how the concat works. – Abdall Jul 05 '17 at 15:17
1

Okay, so I have the original dataframe df, I set index to move the build in to index next, I use df.agg to perform multiple aggregations returns a dataframe with the names of the aggregrations in the index, so then I use concat to combine these two dataframes one on top of the other. Next I rename index labels to get your desired result and reset_index to move that index back in to columns. – Scott Boston Jul 05 '17 at 15:20
You can take parts of that code out an run it bit by bit to see what each section does. – Scott Boston Jul 05 '17 at 15:21

Python (Pandas) - Working with numeric data but adding non numeric data back

1 Answers1

Linked