I am having trouble recoding part of a column. Depending on the value in a cell, that value should be replaced by another one according to a set of "ranges". Here is the code:

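from collections import Counter

import numpy as np
import pandas as pd

# Collect the codes that occur fewer than threshold_count times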
threshold_count = 250 
count_diag = Counter(df['code'])
small_codes_itens = [k for k, count in count_diag.items() if count < threshold_count]

# Keep only the codes with fewer than 250 occurrences, truncated to their first three characters
small_diagcodes = df['code'][df['code'].isin(small_codes_itens)].str.slice(start=0, stop=3, step=1)
small_diagcodes = small_diagcodes[~small_diagcodes.str.contains("[a-zA-Z]").fillna(False)]
small_diagcodes.fillna(value='1500', inplace=True)
small_diagcodes = small_diagcodes.astype(int)

ranges = [(1, 140), (140, 240), (240, 280), (280, 290), (290, 320), (320, 390), 
               (390, 460), (460, 520), (520, 580), (580, 630), (630, 680), (680, 710),
               (710, 740), (740, 760), (760, 780), (780, 800), (800, 1000)]

# Recode each value as the index of the range it falls in
for num, cat_range in enumerate(ranges):
    small_diagcodes = np.where(small_diagcodes.between(cat_range[0],cat_range[1]), num, small_diagcodes)

However, I get an error that I cannot fix. I know it comes from the 'for' loop: isinstance(small_diagcodes, pd.Series) is True before the loop and False after it. It looks as if the Series is converted straight into an array.

AttributeError: 'numpy.ndarray' object has no attribute 'between'

Can anyone help me, please? For example, by replacing the loop with something else?

bonaqua
  • Presumably you have mistaken a NumPy array for a Pandas dataframe. – mkrieger1 Apr 21 '20 at 21:25
  • Please [provide a reproducible copy of the DataFrame with `to_clipboard`](https://stackoverflow.com/questions/52413246/provide-a-reproducible-copy-of-the-dataframe-with-to-clipboard/52413247#52413247) – Trenton McKinney Apr 21 '20 at 21:26
  • The problem is that isinstance(small_diagcodes, pd.Series) is True before the loop and False after it. It looks as if the Series is converted into something that is not a Series. – bonaqua Apr 21 '20 at 21:30
  • You might benefit from this question https://stackoverflow.com/questions/45273731/binning-column-with-python-pandas – Trenton McKinney Apr 21 '20 at 21:31
  • It didn't help, but thanks anyway! – bonaqua Apr 22 '20 at 00:50

1 Answer

Instead of the between statement try:

np.logical_and(small_diagcodes > cat_range[0], small_diagcodes < cat_range[1])
kjul
  • But that only gives me True or False. I want the number: for example, if the value is between 1 and 140 the number is 0, if it is between 140 and 240 the number is 1, and so on – bonaqua Apr 21 '20 at 22:44
  • I tried it and it also returns an array. I don't know why this happens. I want to keep it as a pd.Series, but after the loop it comes back as an array – bonaqua Apr 21 '20 at 23:20
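
For reference, the behaviour described in the comments comes from np.where, which returns a NumPy array even when it is given a Series, so .between is no longer available once the result has been assigned back. One possible way to get the category numbers while keeping a Series is pd.cut. The sketch below is only an illustration under a few assumptions: the sample values and index in `codes` are made up, each range is treated as half-open [low, high) (the original .between call is inclusive on both ends), and values outside every range, such as the 1500 filler, come back as NaN rather than staying unchanged.

```python
import numpy as np
import pandas as pd

# Ranges copied from the question.
ranges = [(1, 140), (140, 240), (240, 280), (280, 290), (290, 320), (320, 390),
          (390, 460), (460, 520), (520, 580), (580, 630), (630, 680), (680, 710),
          (710, 740), (740, 760), (760, 780), (780, 800), (800, 1000)]

# The ranges are contiguous, so they collapse into one list of bin edges.
edges = [low for low, _ in ranges] + [ranges[-1][1]]

# Hypothetical sample standing in for small_diagcodes after .astype(int).
codes = pd.Series([1, 139, 140, 250, 428, 999, 1500],
                  index=[10, 11, 12, 13, 14, 15, 16])

# np.where returns a plain ndarray, which is why the Series methods vanish.
as_array = np.where(codes.between(1, 139), 0, codes)
print(type(as_array))

# pd.cut keeps the Series and its index.
# right=False makes every bin [low, high); labels=False returns the bin
# position (0, 1, 2, ...) instead of an Interval object.
recoded = pd.cut(codes, bins=edges, right=False, labels=False)
print(type(recoded))
print(recoded)
```

If the out-of-range values should keep a marker instead of NaN, something like recoded.fillna(-1).astype(int) could follow; whether -1 is a sensible marker depends on how the column is used later.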