1

I have a data frame, my data frame is like this: except the last column is not there. I mean I do not have formula column and here my purpose is to calculate that column.

but how it has been calculated?

the formula for the last column is: for each patientNumber, number of Yes/total number of questions has been answered by the patient.

for example for the patient number one:there is 1 Yes and 2 No, so it has been 1/3

for patient two, in year 2006, month 10, we can not see Yes the three questions are no, so it has been calculated 0

 PatientNumber           QT         Answer   Answerdate      year    month  dayofyear  count  formula
1        1          transferring     No      2017-03-03      2017       3         62      2.0   (1/3)
2        1          preparing food   No      2017-03-03      2017       3         62      2.0   (1/3)
3        1          medications      Yes     2017-03-03      2017       3         62      1.0   (1/3)
4        2          transferring     No      2006-10-05      2006       10        275    3.0    0   
5        2          preparing food   No      2006-10-05      2006       10        275    3.0    0
6        2          medications      No      2006-10-05      2006       10        275    3.0    0
7        2          transferring     Yes     2007-4-15       2007       4        105    2.0    2/3
8        2          preparing food   Yes     2007-4-15       2007       4        105    2.0   2/3
9        2          medications      No      2007-4-15       2007       4        105    1.0      2/3
10       2          transferring     Yes     2007-12-15      2007       12        345    1.0      1/3
11       2          preparing food   No      2007-12-15      2007       12       345    2.0    1/3
12       2          medications      No      2007-12-15      2007       12        345    2.0    1/3
13       2          transferring     Yes     2008-10-10      2008       10        280    1.0    (1/3)
14       2          preparing food   No      2008-10-10      2008       10        280    2.0    (1/3)
15       2          medications      No      2008-10-10      2008       10        280    2.0    (1/3)
16       3          medications      No      2008-10-10      2008       12        280    ……    ………..

Update 1

Also, what if the formula change a little bit:

if the patient visit the hospital once a year, the same formula as it is multiple by 2. for example, for year 2017 there is just one month related to that patient, so it means the patient reached just one time during the year. in this case the above formula multiple by 2 works.

(why because my window should be every 6 month, so if the patient has not come every 6 month I am assuming the same record is happening)

But if there is several records during one year for one patient, it should be multiplied 2/the number of record on that year. for example at year 2007, the patient reached the hospital 2 times once in month 4 and another in month 12 so in this case the same formula should be multiplied by 2/2

sariii
  • 2,020
  • 6
  • 29
  • 57
  • @Mohamed Thasin ah Thanks but No it is ok. one patient may has record in several times during the year. I am going to process according to different year and different months – sariii Jun 13 '18 at 05:16
  • If it's like that, then for patient number 2 and year 2007 should have same value (3/6). But in your output it's different – Mohamed Thasin ah Jun 13 '18 at 05:18
  • @Mohamed Thasin ah you should take into account both year and month. for 2007 year there is two month one 4 and another 12, because of that they have to be considerred seperately. thanks for following – sariii Jun 13 '18 at 05:38
  • Now I got it Ill try to solve your problem. – Mohamed Thasin ah Jun 13 '18 at 05:39
  • 1
    I appreciate it thank you :) – sariii Jun 13 '18 at 05:39

1 Answers1

1

try this,

def func(x):
    x['yes']= len(x[x['Answer']=='Yes'])
    x['all']= len(x)
    return x
df=df.groupby(['PatientNumber','Answerdate']).apply(func)
df['formula_applied']=df['yes']/df['all']
df['formula']=(df['yes']).astype(str)+'/'+(df['all']).astype(str)
print df

Output:

    PatientNumber              QT Answer  Answerdate  year  month  dayofyear  \
0               1    transferring     No  2017-03-03  2017      3         62   
1               1  preparing food     No  2017-03-03  2017      3         62   
2               1     medications    Yes  2017-03-03  2017      3         62   
3               2    transferring     No  2006-10-05  2006     10        275   
4               2  preparing food     No  2006-10-05  2006     10        275   
5               2     medications     No  2006-10-05  2006     10        275   
6               2    transferring    Yes   2007-4-15  2007      4        105   
7               2  preparing food    Yes   2007-4-15  2007      4        105   
8               2     medications     No   2007-4-15  2007      4        105   
9               2    transferring    Yes  2007-12-15  2007     12        345   
10              2  preparing food     No  2007-12-15  2007     12        345   
11              2     medications     No  2007-12-15  2007     12        345   
12              2    transferring    Yes  2008-10-10  2008     10        280   
13              2  preparing food     No  2008-10-10  2008     10        280   
14              2     medications     No  2008-10-10  2008     10        280   

    count  yes  all  formula_applied formula  
0     2.0    1    3         0.333333     1/3  
1     2.0    1    3         0.333333     1/3  
2     1.0    1    3         0.333333     1/3  
3     3.0    0    3         0.000000     0/3  
4     3.0    0    3         0.000000     0/3  
5     3.0    0    3         0.000000     0/3  
6     2.0    2    3         0.666667     2/3  
7     2.0    2    3         0.666667     2/3  
8     1.0    2    3         0.666667     2/3  
9     1.0    1    3         0.333333     1/3  
10    2.0    1    3         0.333333     1/3  
11    2.0    1    3         0.333333     1/3  
12    1.0    1    3         0.333333     1/3  
13    2.0    1    3         0.333333     1/3  
14    2.0    1    3         0.333333     1/3 

Explanation: Try to get help from user defined method. this func will calculate you number of yes and total record. then you could solve it as your wish. column formula is your desired result. If you want it to evaluate i added formula_applied.

Mohamed Thasin ah
  • 10,754
  • 11
  • 52
  • 111
  • Thank you so much, it working perfectly :). does Len do the same job as Count here? – sariii Jun 13 '18 at 14:00
  • do u mind if I add one more question related to this question? – sariii Jun 13 '18 at 14:06
  • It's okay you can add. I'm on travel if time permits I'll definitely solve your problem or else I'll post the answer tomorrow – Mohamed Thasin ah Jun 13 '18 at 14:14
  • I see and thanks for your time, I just updated and add a small modification. I really appreciate if you can have a look. – sariii Jun 13 '18 at 14:22
  • @sariaGoudarzi - find unique for year column return length of unique elements in year and save this to new series and multiply with your old formula. Try Tolbert by yourself or else I'll update my answer tomorrow – Mohamed Thasin ah Jun 13 '18 at 14:48
  • Actually I can not do like this: data['counts']=data.groupby(['Clinic Number','year'])["month"].nunique().reset_index(name='counts') it raises error, if I do not mention name of count, my data frame will be changed and not all column are there – sariii Jun 13 '18 at 16:52
  • 1
    Try this, DF['count']=DF.groupby('year')['PatientNumber'].nunique() – Mohamed Thasin ah Jun 13 '18 at 22:38
  • do you have any idea of this question ?https://stackoverflow.com/questions/50912819/how-to-combine-two-bar-chart-of-two-files-in-one-diagram-in-matplotlib-pandas?noredirect=1#comment88826203_50912819 I appreciate to have a look if you have time :) – sariii Jun 18 '18 at 17:02
  • 1
    @sariaGoudarzi - Thanks for letting me know your new question. I'll definitely take a look. I'll try to solve this problem tomorrow morning and update you status :-) – Mohamed Thasin ah Jun 18 '18 at 17:19
  • 1
    appreciate it :) – sariii Jun 18 '18 at 20:02