0

I have a dataframe which contains code ranges that I need to break out into individual rows per codes in those ranges. Some of the codes contain letters which adds another layer of complexity.

      code_range_start code_range_end  ccs  \
0                61000          61055    1   
1                61105          61106    1   
2                61108          61130    1   
3                61150          61156    1   
4                61250          61315    1   
5                61320          61323    1   
6                61340          61340    1   
7                61345          61440    1   
8                61470          61490    1   
9                61510          61516    1   
10               61518          61530    1   
11               61534          61545    1   
12               61556          61557    1   
13               61570          61576    1   
14               61582          61596    1   
15               61598          61608    1   
16               61615          61616    1   
17               61712          61735    1   
18               61880          61880    1   
19               62161          62164    1   
20               0169T          0169T    2   
21               61107          61107    2   
22               61210          61210    2   
23               62160          62160    2   
24               62180          62258    2   
25               0202T          0202T    3   
26               0274T          0275T    3

The dataset being used is a CSV of Medical CPT codes associated to a single medical grouper code (CCS ID) and a description of that code:

https://www.hcup-us.ahrq.gov/toolssoftware/ccs_svcsproc/ccssvcproc.jsp

I've identified the rows where the ranges have or do not have letters. The ones without letters I was able to complete by referencing the following:

Add rows to dataframe each day between a range of two columns

My code looks like this:

df_noletters_full = pd.concat([pd.Series(r.ccs,
                           range(r.code_range_start, r.code_range_end + 1)) 
                           for r in df_noletters.itertuples()]) \
        .reset_index()
df_noletters_full.columns = ['code','ccs']

For the ranges containing letters, I am able to determine if the letter is at the beginning or end, remove the letter and extract it to a new 'letter' column to be appended back after the ranges are generated.

The issue I am having is that the above code doesn't like a second column when I try to do the same thing on the numeric ranges with the letter extracted as a second column. When I run:

df_letters_starts_with_full = pd.concat([pd.Series(r.ccs, r.letter,
                                                   range(r.code_range_start_digits_only, r.code_range_end_digits_only + 1))
                                                    for r in df_letters_starts_with.itertuples()]) \
    .reset_index()

I get the following error:

TypeError: Index(...) must be called with a collection of some kind, 'C' was passed

I actually had the same error occur for the numeric ranges also when I attempted to retain the 'ccs' column which contains the group id and the 'ccs description' column. I removed the description column to get it to work and joined to a separate dataframe after on 'ccs' id.

I need to retain the 'ccs' column to be joined later and the 'letter' column to be appended back to the code after. The current code just doesn't like that second column, I assume it is because it is a series, but I'm not sure if I can do something to retain it?

My only thought was to concatenate the 'ccs' ID and the 'letter' columns into one, run the code using that single column, and the parse them back out into two columns after. I'm sure that would work fine, but I figured there has to be a cleaner way to do this?

For additional reference, this website has a good tutorial working with the same file, but it is all in R:

http://healthydatascience.com/ahrq_ccs_cpt.html

Any tips on how I might alter the existing code to keep the column or is my best bet to concatenate the two into one column and then break them back out after?

Thanks!

EnTwiZle
  • 23
  • 4
  • What is your expected output from these data? – Scott Boston Oct 31 '18 at 16:45
  • Using row 26 from above should produce two rows in a new dataframe that are for 'code' 0274 and 0275. It would have a column for 'letter' which would contain the extracted 'T' string. It would also have a column for 'ccs' which would be 3. – EnTwiZle Oct 31 '18 at 19:32

0 Answers0