I have a dataset with repeated measures over time, in which I am looking for predictors of the maximum tn value. I am not interested in measures which occur after this. The maximum values occur on different days for different patients.

    ID  day  tn   hb  sofa
    1   1    7    85  NA
    1   2    NA   NA  NA
    1   3    35   80  13
    1   4    28   79  12
    2   1    500  NA  12
    2   2    280  80  9
    2   3    140  90  8
    2   4    20   90  7
    3   1    60   80  12
    3   2    75   75  10
    3   3    NA   75  NA
    3   4    55   84  7

I can find tn_max:

    tn_max <- df %>% group_by(ID) %>% summarise(tn_max = max(tn, na.rm = TRUE))

How can I truncate the dataset after the maximum tn for each patient? I found this code in a previous similar question, but I can't get it to work. It fails with: Error: unexpected ':' in "N_max = find(df(:"

    mod_df = df; 
    N_max = find(df(:,3) == max(df(:,3)));
    N_max(1);

    for N=1:size(df,3)
    if df(N,1) < N_max
    mod_df (N,:)=0;
    end
    end
    mod_data_1(all(mod_data_1==0,1),:) = []

Many thanks, Annemarie

Annemarie

2 Answers

First I would create a function that returns, for any vector, a logical vector of the same length whose elements are TRUE if the value occurs at or before the (first) maximum, and FALSE otherwise:

    f <- function(x) seq_along(x) <= which.max(x)

Then I would apply this function to each sub-vector of tn defined by ID:

    ind <- as.logical(ave(df$tn, df$ID, FUN = f))

Finally, all I have to do is take the corresponding subset of the original data frame:

    df[ind, ]
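
As a minimal sketch, the three steps above can be run end to end on the example data from the question. The data frame is retyped here from the posted table, and the name `truncated` is only an illustrative choice. Note that `which.max()` skips missing values and returns the position of the first maximum, so no `na.rm` argument is needed. A patient whose tn values are all missing is not handled by this sketch.

```r
# Example data retyped from the question (NA marks a missing measurement)
df <- data.frame(
  ID   = rep(1:3, each = 4),
  day  = rep(1:4, times = 3),
  tn   = c(7, NA, 35, 28, 500, 280, 140, 20, 60, 75, NA, 55),
  hb   = c(85, NA, 80, 79, NA, 80, 90, 90, 80, 75, 75, 84),
  sofa = c(NA, NA, 13, 12, 12, 9, 8, 7, 12, 10, NA, 7)
)

# TRUE for positions up to and including the first maximum;
# which.max() ignores NA values
f <- function(x) seq_along(x) <= which.max(x)

# ave() returns values of the same type as df$tn (numeric 0/1),
# which is why the result is converted back with as.logical()
ind <- as.logical(ave(df$tn, df$ID, FUN = f))

# Keep only the rows at or before each patient's maximum tn
truncated <- df[ind, ]
print(truncated)
```

On this data the result keeps days 1 to 3 for patient 1, day 1 for patient 2, and days 1 and 2 for patient 3.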
Vincent Guillemot
  • Thank you @Vincent Guillemot. I can see that this would work very elegantly. At the moment, tn structure is a number, and I get an error "Error in unique.default(x, nmax = nmax) : unique() applies only to vectors". I've tried to coerce it into a vector using as.vector, but it stays a number and I still get the error. Do you know what I could do to fix this? Many thanks, Annemarie – Annemarie May 06 '16 at 13:13
  • Sorry, I made a [classical mistake when using ave](http://stackoverflow.com/questions/16681770/r-error-in-unique-defaultx-unique-applies-only-to-vectors): I corrected it and it should work now. – Vincent Guillemot May 09 '16 at 09:00
  • Thank you @Vincent Guillemot. Can I ask one more painful question? I still can't get it to work, and I wonder if that's because I have some missing values for tn (both before and after the max value)? Is there a "na.rm" type manoeuvre that would help? Thanks again, Annemarie – Annemarie May 09 '16 at 11:37
  • Can you update your example with these missing values? I would like to run some tests on that. And yes, these missing values could indeed cause some errors. – Vincent Guillemot May 09 '16 at 12:26
  • I've updated the example with missing values, thank you. – Annemarie May 09 '16 at 19:14
  • Hi Vincent, I have run it on a smaller part of my real dataset and it works perfectly, thank you! – Annemarie May 10 '16 at 09:24

You could try:

    df %>% group_by(ID) %>% slice(seq_len(which(tn == max(tn, na.rm = TRUE))))
    Source: local data frame [6 x 5]
    Groups: ID [3]

         ID   day    tn    hb  sofa
      (int) (int) (int) (int) (int)
    1     1     1     7    85    10
    2     1     2    15    84    12
    3     1     3    35    80    13
    4     2     1   500    76    12
    5     3     1    60    80    12
    6     3     2    75    75    10
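
One possible variant, offered as a sketch rather than as part of the answer above: `which(tn == max(...))` returns several positions when the maximum is tied within a patient, while `seq_len()` expects a single number. Using `which.max()` instead always gives one index (the first maximum, with NA values ignored). The data frame below is retyped from the question's table, and `truncated` is a hypothetical name. The sketch assumes dplyr is installed and that every patient has at least one non-missing tn.

```r
library(dplyr)

# Example data retyped from the question (NA marks a missing measurement)
df <- data.frame(
  ID   = rep(1:3, each = 4),
  day  = rep(1:4, times = 3),
  tn   = c(7, NA, 35, 28, 500, 280, 140, 20, 60, 75, NA, 55),
  hb   = c(85, NA, 80, 79, NA, 80, 90, 90, 80, 75, 75, 84),
  sofa = c(NA, NA, 13, 12, 12, 9, 8, 7, 12, 10, NA, 7)
)

truncated <- df %>%
  group_by(ID) %>%
  # which.max() returns a single index per group: the first maximum of tn
  slice(seq_len(which.max(tn))) %>%
  ungroup()

print(truncated)
```

Rows with a missing tn that fall before the maximum (for example patient 1, day 2) are kept, since only rows after the maximum are dropped.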
DatamineR