
I am a scientific researcher and a beginner in Python.

I am trying to make a new column in the following dataframe:

                            x      y      z   bat      gradient
date                                                       
2022-04-15 10:17:14.721  0.125  0.016  1.032  NaN    0.0320
2022-04-15 10:17:39.721  0.125 -0.016  1.032  NaN    0.0000
2022-04-15 10:18:04.721  0.125  0.016  1.032  NaN    0.0000
2022-04-15 10:18:29.721  0.125 -0.016  1.032  NaN    0.0000
2022-04-15 10:18:54.721  0.125  0.016  1.032  NaN    0.0160
                       ...    ...    ...  ...       ...
2022-05-02 17:03:04.721 -0.750 -0.016  0.710  NaN    0.7855
2022-05-02 17:03:29.721 -0.750 -0.016  0.710  NaN    1.4420
2022-05-02 17:03:54.721  0.719 -0.302 -0.419  NaN    0.8690
2022-05-02 17:04:19.721 -0.625 -0.048 -0.871  NaN    1.1965
2022-05-02 17:04:44.721 -0.969  0.016 -0.032  NaN    1.2470

I also have these limits/intervals (the whiskers from a boxplot):

limit_start_A = 0.15
limit_end_A = 0.20

limit_start_B = 0.20
limit_end_B = 0.40

limit_start_C = 0.40
limit_end_C = 0.90

limit_start_D = 0.90
limit_end_D = 1.1

I would like to make a new column named "result" based on the values in the "gradient" column. When the gradient falls within the interval from limit_start_B to limit_end_B, the row should get the letter "B" in the new "result" column (and likewise for A, C and D).
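For comparison, the rule as stated can be written literally with the question's own variables using `numpy.select`. This is a different technique from the `pd.cut` approach in the answer, shown only as a sketch on a few gradient values copied from the table (plus 0.25, a made-up value). It assumes each interval includes its upper limit but not its lower one: the limits share endpoints (limit_end_A == limit_start_B, and so on), so one side of each interval has to be exclusive.

```python
import numpy as np
import pandas as pd

# Limits copied from the question
limit_start_A, limit_end_A = 0.15, 0.20
limit_start_B, limit_end_B = 0.20, 0.40
limit_start_C, limit_end_C = 0.40, 0.90
limit_start_D, limit_end_D = 0.90, 1.1

# Small sample: gradients from the question's table plus a made-up 0.25
df = pd.DataFrame({'gradient': [0.0320, 0.7855, 1.4420, 0.8690, 0.25]})

g = df['gradient']
# Assumption: intervals are (start, end], i.e. the upper limit is included
conditions = [
    (g > limit_start_A) & (g <= limit_end_A),
    (g > limit_start_B) & (g <= limit_end_B),
    (g > limit_start_C) & (g <= limit_end_C),
    (g > limit_start_D) & (g <= limit_end_D),
]

# np.select takes the label of the first True condition; rows matching none get the default
df['result'] = np.select(conditions, ['A', 'B', 'C', 'D'], default=None)
print(df)
```

This works, but every new interval means another pair of variables and another condition, which is why a list of edges plus `pd.cut` scales better.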

Sunderam Dubey
SimonDL


Don't use so many variables; put the bin edges in a list instead and use pandas.cut:

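import pandas as pd

limits = [0.15, 0.20, 0.40, 0.90, 1.1]   # bin edges: (0.15, 0.20], (0.20, 0.40], ...
labels = ['A', 'B', 'C', 'D']            # one label per bin: len(labels) == len(limits) - 1

# gradients outside (0.15, 1.1] get NaN
df['result'] = pd.cut(df['gradient'], bins=limits, labels=labels)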

output:

                             x      y      z  bat  gradient result
date                                                              
2022-04-15 10:17:14.721  0.125  0.016  1.032  NaN    0.0320    NaN
2022-04-15 10:17:39.721  0.125 -0.016  1.032  NaN    0.0000    NaN
2022-04-15 10:18:04.721  0.125  0.016  1.032  NaN    0.0000    NaN
2022-04-15 10:18:29.721  0.125 -0.016  1.032  NaN    0.0000    NaN
2022-04-15 10:18:54.721  0.125  0.016  1.032  NaN    0.0160    NaN
2022-05-02 17:03:04.721 -0.750 -0.016  0.710  NaN    0.7855      C
2022-05-02 17:03:29.721 -0.750 -0.016  0.710  NaN    1.4420    NaN
2022-05-02 17:03:54.721  0.719 -0.302 -0.419  NaN    0.8690      C
2022-05-02 17:04:19.721 -0.625 -0.048 -0.871  NaN    1.1965    NaN
2022-05-02 17:04:44.721 -0.969  0.016 -0.032  NaN    1.2470    NaN
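The NaN rows above are gradients outside (0.15, 1.1]. By default `pd.cut` closes each bin on the right, so a value sitting exactly on a shared limit such as 0.20 goes to the lower bin ("A", not "B"). A small sketch with made-up gradient values placed on or near the edges, comparing the default with `include_lowest=True` and `right=False`, and showing one way to replace the NaN with a placeholder label (the name 'out of range' is illustrative only):

```python
import pandas as pd

limits = [0.15, 0.20, 0.40, 0.90, 1.1]
labels = ['A', 'B', 'C', 'D']

# Hypothetical gradient values placed on or near the bin edges
df = pd.DataFrame({'gradient': [0.15, 0.20, 0.2001, 0.90, 1.1, 1.2]})

# Default right=True: bins are (0.15, 0.20], (0.20, 0.40], (0.40, 0.90], (0.90, 1.1]
df['right_closed'] = pd.cut(df['gradient'], bins=limits, labels=labels)

# include_lowest=True additionally puts the lowest edge (0.15) into the first bin
df['incl_lowest'] = pd.cut(df['gradient'], bins=limits, labels=labels,
                           include_lowest=True)

# right=False gives left-closed bins: [0.15, 0.20), [0.20, 0.40), ...
df['left_closed'] = pd.cut(df['gradient'], bins=limits, labels=labels,
                           right=False)

# The result is categorical, so the fill value must be added as a category first.
df['right_closed_filled'] = (df['right_closed']
                             .cat.add_categories('out of range')
                             .fillna('out of range'))

print(df)
```

Which convention is right depends on how the boxplot limits are meant to be read; `right=True` (the default) matches the answer's output.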
mozway
  • Thanks for your response, I will try this approach. My question has been closed as already answered, but I don't think the linked question answers mine. I respect the administrators of this website, but I do not understand why the two questions are considered similar. – SimonDL May 13 '22 at 09:15
  • @SimonDL note that this works only because your ranges are contiguous (each one starts where the previous one ends). If they are not, you need to construct the bins differently (this is possible). – mozway May 13 '22 at 09:28
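A sketch of the non-contiguous case mentioned in the comment, assuming hypothetical ranges with a gap between 0.40 and 0.50. Passing a `pd.IntervalIndex` as `bins` makes `pd.cut` use exactly those intervals (they must not overlap), and values in the gap or outside every interval become NaN. `labels` is ignored when `bins` is an `IntervalIndex`, so the letters are applied afterwards with `rename_categories`:

```python
import pandas as pd

# Hypothetical non-contiguous ranges: nothing covers (0.40, 0.50]
intervals = pd.IntervalIndex.from_tuples(
    [(0.15, 0.20), (0.20, 0.40), (0.50, 0.90), (0.90, 1.1)], closed='right'
)
labels = ['A', 'B', 'C', 'D']  # one letter per interval, in the same order

# Made-up gradients, including one in the gap (0.45) and one above all ranges (1.3)
df = pd.DataFrame({'gradient': [0.18, 0.30, 0.45, 0.60, 1.0, 1.3]})

# The categories of the result are the intervals themselves, in the order given
binned = pd.cut(df['gradient'], bins=intervals)

# Replace each interval category with its letter; NaN rows stay NaN
df['result'] = binned.cat.rename_categories(labels)
print(df)
```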