Pandas - Assign string values based on multiple ranges

Question

I have created a small function to assign a string value to a column based on ranges from another column, e.g. 3.2 == '0-6m', 7 == '6-12m'. But I am getting this error: TypeError: 'float' object is not subscriptable

Dataframe

  StartingHeight
         4.0
         3.2
         8.0
        32.0
        12.0
        18.3

Expected output:

   StartingHeight height_factor
         4.0          0-6m
         3.2          0-6m
         8.0         6-12m
        32.0          >30m
        12.0         6-12m
        18.3        18-24m

Code:

def height_bands(hbcol):
    """Apply string value based on float value, e.g. 6.2 == '6-12m'
        hb_values = ['0-6m', '6-12m', '12-18m', '18-24m', '24-30m', '>30m']"""

    if (hbcol['StartingHeight'] >= 0) | (hbcol['StartingHeight'] < 6.1):
        return '0-6m'
    elif (hbcol['StartingHeight'] >= 6.1) | (hbcol['StartingHeight'] < 12):
        return '6-12m'
    elif (hbcol['StartingHeight'] >= 12) | (hbcol['StartingHeight'] < 18):
        return '12-18m'
    elif (hbcol['StartingHeight'] >= 18) | (hbcol['StartingHeight'] < 24):
        return '18-25m'
    else:
        return '>30m'


df1['height_factor'] = df1.apply(lambda x: height_bands(x['StartingHeight']), axis=1)

Thanks for your help!

You call the function as `height_bands(x['StartingHeight'])`, so you've already selected the column. `hbcol` is therefore a float, hence the error when calling `hbcol['StartingHeight']`, which is the equivalent of either `x['StartingHeight']['StartingHeight']` or `4.0['StartingHeight']`, depending on how you want to look at it. — Henry Ecker, Sep 10 '21 at 22:21
As an aside to this specific error, I believe you're looking for [Binning a column with Python Pandas](https://stackoverflow.com/q/45273731/15497888) `df1['height_factor'] = pd.cut(df1['StartingHeight'], bins=[0, 6.1, 12, 18, 24, np.inf], labels=['0-6m', '6-12m', '12-18m', '18-25m', '>30m'], right=False)` — Henry Ecker, Sep 10 '21 at 22:23
You're also going to want `and` not `|` because your conditions currently will include the entire number line since __all__ numbers are _either_ more than 0 _or_ less than 6.1 — Henry Ecker, Sep 10 '21 at 22:29
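The two points from the comments above (the function receives a float rather than a row, and each range check needs `and` rather than `|`) can be combined into a corrected function. This is only a minimal sketch: it assumes the six bands listed in the question's docstring with lower-inclusive cut-offs at 6, 12, 18, 24 and 30 (the question's code uses 6.1 and omits '24-30m'), and it rebuilds `df1` from the sample data in the question.

```python
import pandas as pd


def height_bands(height):
    """Return the band label for a single float height."""
    # `height` is already a scalar, so it is compared directly (no indexing).
    if 0 <= height < 6:
        return '0-6m'
    elif 6 <= height < 12:
        return '6-12m'
    elif 12 <= height < 18:
        return '12-18m'
    elif 18 <= height < 24:
        return '18-24m'
    elif 24 <= height < 30:
        return '24-30m'
    else:
        # Assumed catch-all: 30 and above (negative or NaN values also land here).
        return '>30m'


# Sample data rebuilt from the question.
df1 = pd.DataFrame({'StartingHeight': [4.0, 3.2, 8.0, 32.0, 12.0, 18.3]})

# Apply to the column itself, so the function receives one float at a time.
df1['height_factor'] = df1['StartingHeight'].apply(height_bands)
print(df1)
```

With lower-inclusive bounds, 12.0 is labelled '12-18m'. If 12.0 should fall in '6-12m', as in the expected output, the comparisons would need to be upper-inclusive instead (for example `6 < height <= 12`).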

Corralien · Answer 1 · 2021-09-10T22:37:55.420

You can use pd.cut:

import numpy as np
import pandas as pd

df['height_factor'] = pd.cut(df['StartingHeight'],
                             bins=[0, 6, 12, 18, 24, 30, np.inf],
                             labels=['0-6m', '6-12m', '12-18m',
                                     '18-24m', '24-30m', '>30m'],
                             right=False)

Output:

>>> df
   StartingHeight height_factor
0             4.0          0-6m
1             3.2          0-6m
2             8.0         6-12m
3            32.0          >30m
4            12.0        12-18m
5            18.3        18-24m

Fixed by @HenryEcker

edited Sep 10 '21 at 22:37

answered Sep 10 '21 at 22:30

Corralien

If you're going to use `cut`, use `np.inf` rather than an arbitrary upper bound of `999999`. OP's function is also lower-bound inclusive, so you need `right=False`. – Henry Ecker Sep 10 '21 at 22:31
Thank you for your advice. I was going to correct the `right=False` but I didn't think about `np.inf`. – Corralien Sep 10 '21 at 22:35
No problem. It's a useful trick for `cut`. – Henry Ecker Sep 10 '21 at 22:35
@HenryEcker. I'm really sorry!!! I just read the comments. I hadn't seen that you had already proposed a solution with `pd.cut`. If you want I can remove my answer. Just say it! – Corralien Sep 10 '21 at 22:41
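The effect of `right=False` and of the `np.inf` upper edge discussed in these comments can be seen on a few boundary values. This is a small sketch under assumptions: the bins and labels are taken from the answer, while the three test heights (6.0, 12.0, 45.0) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical heights sitting on bin edges, plus one far above the last edge.
heights = pd.Series([6.0, 12.0, 45.0])

bins = [0, 6, 12, 18, 24, 30, np.inf]
labels = ['0-6m', '6-12m', '12-18m', '18-24m', '24-30m', '>30m']

# right=False: intervals are [low, high), so an edge value goes to the upper band.
left_closed = pd.cut(heights, bins=bins, labels=labels, right=False)

# right=True (the default): intervals are (low, high], so an edge value stays in the lower band.
right_closed = pd.cut(heights, bins=bins, labels=labels, right=True)

print(pd.DataFrame({'height': heights,
                    'right=False': left_closed,
                    'right=True': right_closed}))
```

Because the last edge is `np.inf`, any height of 30 or more is labelled '>30m' without having to guess a large enough upper bound.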
