I'm trying to write a function that goes through a pandas Series of floats and converts each value into one of four string categories based on where it falls in the series' range. Every value would become low, low_mid, high_mid, or high, depending on which quartile of the range it falls in. I've tried a number of approaches but keep getting various error messages. The latest attempt and its error message are below. I'd appreciate it if someone could take a peek and toss out any ideas/fixes. Thanks!

def makeseriescategorical(x):
    for i in x:
        if i < 59863.0:
            str(i)
            i.replace(i, "low")
        elif i > 59862.0 and i < 86855.0:
            str(i)
            i.replace(i, "low_mid")
        elif i > 86854.0 and i < 125250.0:
            str(i)
            i.replace(i, "high_mid")
        elif i > 125249.0 and i < 332801:
            str(i)
            i.replace(i, "high")

The error message I got on this last attempt was: AttributeError: 'numpy.float64' object has no attribute 'replace'

I've tried various other ways to make it a string, such as astype, but I keep getting errors. I'm new to coding, so I'm sure there's a strong chance I'm making a dumb mistake, but I'd appreciate any help anyone can give me. Cheers.

1 Answer

I'd use the vectorized pd.cut() method (with numpy imported as np and pandas as pd):

In [51]: df = pd.DataFrame(np.random.randint(0, 332801, 10), columns=['val'])

In [52]: df
Out[52]:
      val
0  230852
1  140030
2  231657
3   73146
4  240890
5  328660
6  194801
7  240684
8   44439
9   35558

In [53]: bins = [-np.inf, 59863.0, 86855.0, 125250.0, 332801]

In [54]: labels=['low','low_mid','high_mid','high']

In [55]: df['category'] = pd.cut(df.val, bins=bins, labels=labels)

In [56]: df
Out[56]:
      val category
0  230852     high
1  140030     high
2  231657     high
3   73146  low_mid
4  240890     high
5  328660     high
6  194801     high
7  240684     high
8   44439      low
9   35558      low

In [57]: df.dtypes
Out[57]:
val            int32
category    category
dtype: object
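
For reference, the original loop fails because iterating over a Series yields numpy.float64 scalars, which have no replace method, and str(i) returns a new string without changing i or the Series. Below is a minimal sketch of the same pd.cut() idea wrapped in a function. The names make_series_categorical and make_series_quartiles, the LABELS constant, and the sample values are made up for illustration. Two choices here are assumptions rather than part of the answer above: right=False is used so the bins are closed on the left, matching the `i < 59863.0` style comparisons in the question (with the default right=True, a value of exactly 59863.0 would land in low), and the top bin is left open-ended with np.inf instead of stopping at 332801. If the cut points are meant to be the series' own quartiles, pd.qcut() can compute them instead of hard-coding the edges.

```python
import numpy as np
import pandas as pd

LABELS = ["low", "low_mid", "high_mid", "high"]


def make_series_categorical(s, edges=(59863.0, 86855.0, 125250.0)):
    """Bin a numeric Series into four labelled categories using fixed cut points."""
    # Assumption: open-ended outer bins, so no value falls outside the range.
    bins = [-np.inf, *edges, np.inf]
    # right=False gives left-closed bins [a, b), like `i < 59863.0` in the question.
    return pd.cut(s, bins=bins, labels=LABELS, right=False)


def make_series_quartiles(s):
    """Bin a numeric Series by its own quartiles instead of hard-coded edges."""
    return pd.qcut(s, q=4, labels=LABELS)


s = pd.Series([35558.0, 59863.0, 73146.0, 100000.0, 140030.0, 328660.0])
print(make_series_categorical(s).tolist())
# ['low', 'low_mid', 'low_mid', 'high_mid', 'high', 'high']
print(make_series_quartiles(s).tolist())
```

Both functions return a Series of dtype category, so the result can be assigned straight to a new column, e.g. df['category'] = make_series_categorical(df.val).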
MaxU - stand with Ukraine