I need to split a large Excel file into thirds, then find the quartiles of each third. I'm new to Python and trying to learn it for work, so any help is appreciated. Here is my current code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel(r'C:\Users\1\Desktop\Python\Project\Project1.xlsx')
test = df['Data']
test2 = df.Data.describe()
print(test2)

But I don't know how to run describe() on only the first third of the data set, for example. Any help is appreciated.

MattDMo
  • Based on your selection rule for df, you can slice the data frame into the required data frames (d1, d2, d3, etc.) and then apply the function to each slice. What are your selection rules? – simpleApp Apr 18 '21 at 02:53
  • I would like to select the data by value. For example, any values over 20,000 are one section, any values over 10,000 are their own section, and anything above 0 is its own section. Then I would run the describe function on those 3 separate data sets. @simpleApp – LPhysics Apr 19 '21 at 01:31
  • Please refer to https://stackoverflow.com/questions/33742588/pandas-split-dataframe-by-column-value and, for the "and" condition, to https://stackoverflow.com/questions/13611065/efficient-way-to-apply-multiple-filters-to-pandas-dataframe-or-series . If this helps, please let us know. – simpleApp Apr 19 '21 at 02:06

1 Answer

You could try this:

limit = round(df.shape[0] / 3)  # Total number of rows (shape[0], not shape[1]) divided by three and rounded

df1 = df['Data'][:limit]  # first third
df2 = df['Data'][limit:limit*2]  # second third
df3 = df['Data'][limit*2:]  # last third

print(df1.describe())
print(df2.describe())
print(df3.describe())
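
As an alternative, here is a minimal sketch covering both readings of the question: a split by row position into three nearly equal parts, and a split by value as described in the comments. The sample values, the bin edges (0, 10,000, 20,000) and the labels "low", "mid" and "high" are assumptions for illustration only; in practice `data` would be `df['Data']` from the Excel file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['Data']; the real values come from the Excel file.
data = pd.Series(
    [500, 25000, 12000, 3000, 18000, 40000, 9000, 15000, 100, 22000],
    name="Data",
)

# Split by position: divide the row positions into three nearly equal chunks.
# np.array_split also works when the length is not divisible by 3.
positions = np.array_split(np.arange(len(data)), 3)
thirds = [data.iloc[idx] for idx in positions]
for part in thirds:
    print(part.describe())

# Split by value: assumed bin edges taken from the comments.
# Intervals are right-closed: (0, 10000], (10000, 20000], (20000, inf).
# Values of 0 or below fall outside every bin and are dropped by groupby.
bins = [0, 10_000, 20_000, np.inf]
labels = ["low", "mid", "high"]  # hypothetical labels
groups = pd.cut(data, bins=bins, labels=labels)
summary = data.groupby(groups, observed=True).describe()
print(summary)
```

The quartiles appear in the 25%, 50% and 75% entries of each describe() result. pd.cut was chosen here so that the three value ranges are summarised in a single groupby call instead of three separate boolean filters.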
Laurent