-1

I have data frame looking like this:

> head(temp)
      VisitIDCode start stop Value_EVS hr heart rate NU EE0A Value_EVS temp celsius CAL 113C Value_EVS current weight kg CAL
23642  2008253059   695  696                            <NA>                            36.4                            <NA>
24339  2008253059   695  696                             132                            <NA>                            <NA>
72450  2008953178   527  528                            <NA>                            38.6                            <NA>
72957  2008953178   527  528                             123                            <NA>                            <NA>
73976  2008965669   527  528                            <NA>                            36.2                            <NA>
74504  2008965669   527  528                             116                            <NA>                            <NA>

First and second row are both for the same patient(same VisitIDCode), in the first row I have the value of heart rate and in the second I have the value of temperature from time 2 to 3. I want to combine these rows so that the result is one row that looks like:

      VisitIDCode start stop Value_EVS hr heart rate NU EE0A Value_EVS temp celsius CAL 113C Value_EVS current weight kg CAL
23642  2008253059   695  696                             132                            36.4                            <NA>

In other words, I want my data frame to be unique by combination of VisitIDCode, start and stop. This is a large dataframe with more columns that need to be combined. What is the best way of doing it and if at all possible, avoiding for loop? Edit: I don't want to remove the NAs. If there are 2 rows each of which have one value and 2 NAs, I want to combine them to one row so it has two values and one NA. Like the example above.

nasim
  • 725
  • 2
  • 8
  • 17
  • 1
    Possible duplicate of [Merge two rows in one dataframe, when the rows are disjoint and contain nulls](http://stackoverflow.com/questions/14268814/merge-two-rows-in-one-dataframe-when-the-rows-are-disjoint-and-contain-nulls) – Sotos May 19 '17 at 20:46
  • I don't want to remove NAs though. @Sotos – nasim May 19 '17 at 20:50

2 Answers2

1

nasim,

It's useful to create a reproducible example when posting questions. It makes it much easier to sort out how to help. I created a toy example here. Hopefully, that reproduces your issue:

> df <- data.frame(MRN = c(123,125,213,214), 
+                  VID = c(2008,2008,2011,2011), 
+                  start=c(695,695), 
+                  heart.rate = c(NA,112,NA,96),
+                  temp = c(39.6,NA,37.4,NA))
> df
  MRN  VID start heart.rate temp
1 123 2008   695         NA 39.6
2 125 2008   695        112   NA
3 213 2011   695         NA 37.4
4 214 2011   695         96   NA

Here is a solution using dplyr:

> library(dplyr)
> df <- df %>% 
+   group_by(VID) %>%
+   summarise(MRN = max(MRN,na.rm=T),
+             start=max(start,na.rm=T),
+             heart.rate=max(heart.rate,na.rm=T),
+             temp = max(temp,na.rm=T))
> df
# A tibble: 2 × 5
    VID   MRN start heart.rate  temp
  <dbl> <dbl> <dbl>      <dbl> <dbl>
1  2008   125   695        112  39.6
2  2011   214   695         96  37.4
Eric
  • 1,381
  • 9
  • 24
  • BTW, this will preserve NA when it's in all rows grouped by a certain VID – Eric May 19 '17 at 21:00
  • Thank you very much. What about the columns that have values that are not numbers? Max() doesn't work for those. – nasim May 19 '17 at 21:10
  • I don't see any non-numerical value in the data frame, but when I run the code you suggested, I get this error: Error in summarise_impl(.data, dots) : ‘max’ not meaningful for factors – nasim May 19 '17 at 21:12
  • max works for character vectors `max("A","B") = "B"`. I have had issues with dplyr errors due to plyr conflict. try dplyr::group_by and dplyr::summarise. if that fails try adding `stringsAsFactors = F` to the data.frame statment. – Eric May 19 '17 at 21:27
0

After I made sure all columns classes are numeric (not factors) by defining the classes of columns while reading the data in, this worked for me:

CompleteCoxObs<-aggregate(x=CompleteCoxObs[c("stop","Value_EVS current weight kg CAL","Value_EVS hr heart rate NU EE0A","Value_EVS temp celsius CAL 113C")], by=list(VisitIDCode=CompleteCoxObs$VisitIDCode,start=CompleteCoxObs$start), max, na.rm = FALSE);
nasim
  • 725
  • 2
  • 8
  • 17