I have a data frame with 99 diagnosis columns (dx1-dx99), 99 procedure columns (px1-px99), and one column named mort:

dx1 dx2 dx3 .   dx99    px1 px2 .   px99    mort
E10 I12 E10 N18 R18     0FY 0TY 0DN 0DN      1
E10 I12 I31 E44 N17     0FY 0TY 0FT 5A1      0
E10 I12 N17 T86 T86     0TY 0FY 0DT          0
I12 E10 N18 A04         0TY 0FY 0DT 0T7      1
E10 I12 E10 N18 Z99     0TY 0FY              0
E10 N18 Z76             0FY 0TY 04Q 0D1      1
E10 N18 Z99 N25 E78     0TY 0FY 0WP          0

I want to keep the values in dx1-dx99 and px1-px99 only in rows where mort = 1, and set them to zero in all other rows. After that, I want to count how often each remaining code occurs.

I tried this:

dx = df.loc[:,'dx1':'dx99']
X1pr = df.loc[:,'px1':'px99']
dx = dx.fillna(0)    
X1p = X1pr.fillna(0)
death = df.loc[:,'mort']
df1 = pd.concat([dx, X1p, death], axis=1)

N = len(df1.columns)
keep = df1.iloc[:,-(N-1):].isin(["1"]).values

df1.iloc[:,:N-1] = df1.iloc[:,:N-1].where(keep, 0)
X1d = df1[df1.columns[0:N-1]]

mat = X1d.as_matrix(columns=None)
values, counts = np.unique(mat.astype(str), return_counts=True)
matrix = []
for v,c in zip(values, counts):
    matrix.append( [v,c])

icd9_counted_d = pd.DataFrame(matrix, columns = ['ICD_code', 'DEATHS'])

I am getting nothing in the DEATHS column. Any ideas?

Sanoj

1 Answer

If I understand correctly, you want to blank out every code column in the rows where mort is not 1:

In [31]: x.loc[x.mort != 1, x.columns != 'mort'] = ''

In [32]: x
Out[32]:
   dx1  dx2  dx3  dx4 dx99  px1  px2  px3 px99  mort
0  E10  I12  E10  N18  R18  0FY  0TY  0DN  0DN     1
1                                                  0
2                                                  0
3  I12  E10  N18  A04  NaN  0TY  0FY  0DT  0T7     1
4                                                  0
5  E10  N18  Z76  NaN  NaN  0FY  0TY  04Q  0D1     1
6                                                  0
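The question also asks for the frequency of the remaining codes. A minimal sketch of that step follows, assuming a small toy frame `x` shaped like the sample and NaN for missing codes; the names `code_cols`, `values`, `counts`, and `icd_counted` are illustrative and not from the original post. Rather than blanking rows first, it selects the `mort == 1` rows directly and counts their codes.

```python
import numpy as np
import pandas as pd

# Toy frame modelled on the sample rows (only a few columns; NaN marks a missing code).
x = pd.DataFrame({
    "dx1": ["E10", "E10", "I12", "E10"],
    "dx2": ["I12", "I12", "E10", "N18"],
    "px1": ["0FY", "0FY", "0TY", "0FY"],
    "px2": ["0TY", "0TY", np.nan, "0TY"],
    "mort": [1, 0, 1, 1],
})

# Every column except mort holds a diagnosis or procedure code.
code_cols = x.columns[x.columns != "mort"]

# Take only the rows where mort == 1 and flatten the codes into one 1-D array.
values = x.loc[x["mort"] == 1, code_cols].to_numpy().ravel()

# Drop missing entries, then count how often each code occurs.
counts = pd.Series(values).dropna().value_counts()

# Shape the result like the table the question was aiming for.
icd_counted = counts.rename_axis("ICD_code").reset_index(name="DEATHS")
print(icd_counted)
```

Selecting the rows up front avoids having to filter a placeholder value (0 or an empty string) out of the counts afterwards.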
MaxU - stand with Ukraine
  • Even after I converted mort to a numeric type, it doesn't work for me. After this operation, every column other than mort is empty. – Sanoj Apr 10 '17 at 21:35
  • @Sanoj, this is how I understood your question. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and update your question accordingly. – MaxU - stand with Ukraine Apr 11 '17 at 08:36
  • I appreciate your answer. In my case `mort` came in with dtype `object`, so I assumed the condition `x.mort != 1` was failing. I therefore converted `x.mort` to a numeric type using `convert_objects`, and I can see that it now has a numeric dtype. The condition `x.mort != 1` still fails: rows 0, 3, and 5 do not keep their codes as in your example above. Everything comes out empty. – Sanoj Apr 11 '17 at 15:41
  • @Sanoj, if `mort` is of `object` dtype, you can simply use `x.mort != '1'` as the condition; it's not a big deal. But the question is whether the output in my answer is your **desired** data set or not. – MaxU - stand with Ukraine Apr 11 '17 at 15:48
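
One way to investigate the dtype issue raised in these comments is to inspect the raw `mort` values and coerce them explicitly before comparing. The sketch below assumes `mort` holds strings with stray whitespace, which is only a guess at the cause since the real data was never posted; the toy frame and the variable name `mort` are illustrative.

```python
import pandas as pd

# Toy frame; the stray spaces in mort are an assumed cause, not confirmed by the thread.
x = pd.DataFrame({
    "dx1": ["E10", "E10", "I12"],
    "px1": ["0FY", "0FY", "0TY"],
    "mort": ["1", " 0", "1 "],
})

# repr() exposes hidden whitespace or unexpected values in the column.
print(x["mort"].map(repr).unique())

# Strip whitespace and convert to numbers; anything unparseable becomes NaN.
mort = pd.to_numeric(x["mort"].astype(str).str.strip(), errors="coerce")

# Blank the code columns wherever mort is not 1 (NaN rows are blanked too).
x.loc[mort != 1, x.columns != "mort"] = ""
print(x)
```

If the printed values show something other than clean `'0'` and `'1'` strings, that would explain why a plain `x.mort != 1` comparison matched every row.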