0

So I have a rather large file that is broken down like this:

Claim CPT Code TOTAL_ALLOWED CPT_CODE NEW_PRICE ALLOWED_DIFFERENCE
6675647 90887 120 90887 153 difference

The thing is, for my data set, the existing already paid data is 47K lines long, yet the CPT codes we are paying are 20 codes only. How would use Pandas/Numpy to have python look at the CPT code, find its match, and compare the TOTAL_ALLOWED with the NEW_PRICE to determine what is ultimately owed.

I think I have it with this, but I'm having an issue with having Python iterate through my list:

    df['price_difference'] = np.where(df['LINE_TOTAL_ALLOWED'] == ((df['NEW_PRICE'])*15)), 0,  df['LINE_TOTAL_ALLOWED'] - ((df['NEW_PRICE']*15))```

but so far, its giving me an error that the rows don't match. 

Any help is appreciated! 
Hadar
  • 658
  • 4
  • 17
  • Hi, welcome to StackOverflow. Please take the [tour](https://stackoverflow.com/tour) and learn [How to Ask](https://stackoverflow.com/help/how-to-ask). In order to get help, you will need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). If your question include a pandas dataframe, please provide a [reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). As it is right now, your code is not reproducible. It is not even valid python. – alec_djinn Oct 14 '22 at 08:08

2 Answers2

0

There is a small formatting error. Try this:

df['price_difference'] = np.where(df['LINE_TOTAL_ALLOWED'] == ((df['NEW_PRICE']*15)), 0,  df['LINE_TOTAL_ALLOWED'] - ((df['NEW_PRICE']*15)))

Bushmaster
  • 4,196
  • 3
  • 8
  • 28
  • Thanks for catching that, however, it is still telling me that the rows for the new CPT code and prices are not matching with the other 47K. Any ideas how to iterate through the 47K lines, have Pandas look at the CPT codes, if it finds a match, verify it against the new price? – C GVerg Oct 14 '22 at 16:23
  • Okey. There is a field called **df['LINE_TOTAL_ALLOWED']** where did this field come from? Is the data you are working on different from the data you gave as an example? if you provide a sample data we can solve the problem much faster. – Bushmaster Oct 14 '22 at 16:59
  • Here is the data set that is pertinent. https://ibb.co/cyCzVgy What I need to do is for every line that has a matching CPT code, I need to subtract the new price from the LINE_TOTAL_ALLOWED to determine what is still owed. I think my script isn't working because there are 47K lines to match against only 20 NEW_PRICE data elements. – C GVerg Oct 14 '22 at 17:21
  • if i understood correctly, the two columns on the right (proc_code_new, new_price) I would save it as a different excel. Then I would take the remaining data as df1, the data we saved as a different excel as df2 and use: `final=df1.merge(df2,how='left' lef_on='PROC_CODE', right_on='PROC_CODE_NEW')` . Finally, I would use the np.where function you used in the question. – Bushmaster Oct 14 '22 at 21:47
0

I did what Clegane mentioned:

final = df1.merge(df3,  how='left' , left_on = 'CLAIM_ID' , right_on= 'QUANTITY') 
df2 = df1.drop_duplicates(keep = 'first')

Then I dropped the duplicates. I first did this on only 20 lines of excel, then after I made sure it worked, I let it loose on my 945000 line .xlsx. Everything worked, and everything lined up. It was daunting...