0

I have an excel files with four sheets (2016_data, 2017_data, 2018_data and 2019_data). Column C of all these sheets is the weight of a product. I want to use an array such as below that includes the bin breakpoints. breakpoints_1=[1, 300, 1000]

NOTE: The break points array size is not known. Sometimes it could have 10 points, and sometimes just 2 points. (Example: breakpoints_2= [1, 4, 7, 523,650, 1700]

Now I want to classify the data of each of above sheets based on the above array.

If we use breakpoints_1, then the results will be:

2016_data_1=df_2016[(df_2016["weight"]>=1) & (df_2016["weight"]<300)]
2016_data_2=df_2016[(df_2016["weight"]>=300) & (df_2016["weight"]<1000)]

2017_data_1=df_2017[(df_2017["weight"]>=1) & (df_2017["weight"]<300)]
2017_data_2=df_2017[(df_2017["weight"]>=300) & (df_2017["weight"]<1000)]

2018_data_1=df_2018[(df_2018["weight"]>=1) & (df_2018["weight"]<300)]
2018_data_2=df_2018[(df_2018["weight"]>=300) & (df_2018["weight"]<1000)]

2019_data_1=df_2019[(df_2019["weight"]>=1) & (df_2019["weight"]<300)]
2019_data_2=df_2019[(df_2019["weight"]>=300) & (df_2019["weight"]<1000)]
Scott Boston
  • 147,308
  • 15
  • 139
  • 187

1 Answers1

0

Do you actually need to have so many dataframes ? I would suggest you to add a column for the category you want to assign them.

Let's use a sample dataframe and see how it works:

df = pd.DataFrame({'weight': np.random.randint(0, 1500, 150)})

This piece of code will return you 'A' if weight is between 1 and 300, 'B' if between 300 and 1000, else 'C'. The bins argument contains your binning values, right is set to False so that the interval exclude the right bound, and labels contains your labels after applying pandas.cut

pd.cut(df.weight, bins=[1, 300, 1000, np.inf], right=False, labels=['A', 'B', 'C'])
    weight
0   A
1   B
2   A
3   B
4   B

So if you add a weight_category column, you could do conditional computations later on.

df['weight_category'] = pd.cut(df.weight, 
                         bins=[1, 300, 1000, np.inf], 
                         right=False, 
                         labels=['A', 'B', 'C'])
df
    weight  weight_category
0   1404    C
1   885     B
2   612     B
3   90      A
4   1092    C

Hugolmn
  • 1,530
  • 1
  • 7
  • 20
  • Thanks. But I need multiple dataframes. – Pouyan Ebrahimi Jun 22 '20 at 19:42
  • Then you can combine the solution above with a loop as described [here](https://stackoverflow.com/questions/30635145/create-multiple-dataframes-in-loop). You will have to filter based on the assigned category to create your several dataframes – Hugolmn Jun 22 '20 at 19:46