I have a dataframe which contains code ranges that I need to break out into individual rows per codes in those ranges. Some of the codes contain letters which adds another layer of complexity.
code_range_start code_range_end ccs \
0 61000 61055 1
1 61105 61106 1
2 61108 61130 1
3 61150 61156 1
4 61250 61315 1
5 61320 61323 1
6 61340 61340 1
7 61345 61440 1
8 61470 61490 1
9 61510 61516 1
10 61518 61530 1
11 61534 61545 1
12 61556 61557 1
13 61570 61576 1
14 61582 61596 1
15 61598 61608 1
16 61615 61616 1
17 61712 61735 1
18 61880 61880 1
19 62161 62164 1
20 0169T 0169T 2
21 61107 61107 2
22 61210 61210 2
23 62160 62160 2
24 62180 62258 2
25 0202T 0202T 3
26 0274T 0275T 3
The dataset being used is a CSV of Medical CPT codes associated to a single medical grouper code (CCS ID) and a description of that code:
https://www.hcup-us.ahrq.gov/toolssoftware/ccs_svcsproc/ccssvcproc.jsp
I've identified the rows where the ranges have or do not have letters. The ones without letters I was able to complete by referencing the following:
Add rows to dataframe each day between a range of two columns
My code looks like this:
df_noletters_full = pd.concat([pd.Series(r.ccs,
range(r.code_range_start, r.code_range_end + 1))
for r in df_noletters.itertuples()]) \
.reset_index()
df_noletters_full.columns = ['code','ccs']
For the ranges containing letters, I am able to determine if the letter is at the beginning or end, remove the letter and extract it to a new 'letter' column to be appended back after the ranges are generated.
The issue I am having is that the above code doesn't like a second column when I try to do the same thing on the numeric ranges with the letter extracted as a second column. When I run:
df_letters_starts_with_full = pd.concat([pd.Series(r.ccs, r.letter,
range(r.code_range_start_digits_only, r.code_range_end_digits_only + 1))
for r in df_letters_starts_with.itertuples()]) \
.reset_index()
I get the following error:
TypeError: Index(...) must be called with a collection of some kind, 'C' was passed
I actually had the same error occur for the numeric ranges also when I attempted to retain the 'ccs' column which contains the group id and the 'ccs description' column. I removed the description column to get it to work and joined to a separate dataframe after on 'ccs' id.
I need to retain the 'ccs' column to be joined later and the 'letter' column to be appended back to the code after. The current code just doesn't like that second column, I assume it is because it is a series, but I'm not sure if I can do something to retain it?
My only thought was to concatenate the 'ccs' ID and the 'letter' columns into one, run the code using that single column, and the parse them back out into two columns after. I'm sure that would work fine, but I figured there has to be a cleaner way to do this?
For additional reference, this website has a good tutorial working with the same file, but it is all in R:
http://healthydatascience.com/ahrq_ccs_cpt.html
Any tips on how I might alter the existing code to keep the column or is my best bet to concatenate the two into one column and then break them back out after?
Thanks!