Trying to convert pandas df series of floats to one of four categorical values based on their respective locations in the series' quartiles

Question

I'm trying to write a function that goes through a pandas df series full of floats and converts each one into one of four string categories based on where it falls in the series' range. Values in each of the range's quartiles would be converted to low, low_mid, high_mid, or high respectively. I've tried a number of approaches but keep getting various error messages. The latest attempt and its error message are below. I'd appreciate it if someone could take a peek and toss out any ideas/fixes. Thanks!

def makeseriescategorical(x):
    for i in x:
        if i < 59863.0:
            str(i)
            i.replace(i, "low")
        elif i > 59862.0 and i < 86855.0:
            str(i)
            i.replace(i, "low_mid")
        elif i > 86854.0 and i < 125250.0:
            str(i)
            i.replace(i, "high_mid")
        elif i > 125249.0 and i < 332801:
            str(i)
            i.replace(i, "high")

The error message I got on this last attempt was: AttributeError: 'numpy.float64' object has no attribute 'replace'

I've tried various other ways to make it a string, such as astype, but I keep getting errors. I'm new to coding, so there's a strong chance I'm making a dumb mistake, but I'd appreciate any help anyone can give me. Cheers.

How would you categorize `59862.5`? – MaxU - stand with Ukraine Feb 06 '17 at 19:21
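For reference, here is a minimal sketch of how the question's own thresholds would treat that value. It assumes the if/elif chain is kept and each branch returns a label instead of calling replace; the helper name `categorize` is hypothetical. Branches are tested in order, so 59862.5 matches the first test even though it also falls inside the second range.

```python
def categorize(value):
    """Hypothetical scalar helper mirroring the thresholds in the question."""
    # Branches are tested top to bottom, so the first match wins.
    if value < 59863.0:
        return "low"
    elif 59862.0 < value < 86855.0:
        return "low_mid"
    elif 86854.0 < value < 125250.0:
        return "high_mid"
    elif 125249.0 < value < 332801:
        return "high"
    # Values at or above 332801 match no branch.
    return None


print(categorize(59862.5))  # low
```

Using a single cut point per boundary (59863.0, 86855.0, 125250.0) would give the same labels without the overlapping ranges.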

MaxU - stand with Ukraine · Accepted Answer · 2017-02-06T19:25:11.420

6

I'd use the vectorized pd.cut() method:

In [51]: df = pd.DataFrame(np.random.randint(0, 332801, 10), columns=['val'])

In [52]: df
Out[52]:
      val
0  230852
1  140030
2  231657
3   73146
4  240890
5  328660
6  194801
7  240684
8   44439
9   35558

In [53]: bins = [-np.inf, 59863.0, 86855.0, 125250.0, 332801]

In [54]: labels=['low','low_mid','high_mid','high']

In [55]: df['category'] = pd.cut(df.val, bins=bins, labels=labels)

In [56]: df
Out[56]:
      val category
0  230852     high
1  140030     high
2  231657     high
3   73146  low_mid
4  240890     high
5  328660     high
6  194801     high
7  240684     high
8   44439      low
9   35558      low

In [57]: df.dtypes
Out[57]:
val            int32
category    category
dtype: object
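Since the question asks for quartile-based categories, a variant that may be worth considering is pd.qcut(), which derives the bin edges from the data's own quantiles instead of hard-coded thresholds. The following is a minimal sketch rather than part of the session above; the sample values and the series name `val` are illustrative assumptions.

```python
import pandas as pd

# Illustrative sample data (an assumption, not the asker's real series).
s = pd.Series(
    [35558.0, 44439.0, 73146.0, 140030.0,
     194801.0, 230852.0, 240684.0, 328660.0],
    name="val",
)

labels = ["low", "low_mid", "high_mid", "high"]

# q=4 splits the series at its own 25th, 50th and 75th percentiles,
# so each label covers roughly a quarter of the rows.
category = pd.qcut(s, q=4, labels=labels)

print(pd.DataFrame({"val": s, "category": category}))
```

With pd.cut() the thresholds stay fixed no matter what data arrives; with pd.qcut() they move with the data, so which one fits depends on whether the quartile boundaries should be recomputed for each series.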

edited Feb 06 '17 at 19:25

answered Feb 06 '17 at 19:19

MaxU - stand with Ukraine

Thanks MaxU! That worked great. And I've learned about a new method that I'm sure I'll be using quite a bit :). – user2943472 Feb 06 '17 at 19:32
Just did. Sorry, that was my first Stack Overflow question. Have a good day. – user2943472 Feb 06 '17 at 19:44
