1

Input

I have a dataframe as follows:

 structure(list(DistalLESfromnarescm = c("31.9", "31.9", "33.1", 
"33.3", "33.8", "34.0"), LESmidpointfromnarescm = c("31.2", "31.2", 
"32.0", "32.0", "33.1", "33.2"), ProximalLESfromnarescm = c("30.1", 
"30.1", "30.9", "30.9", "31.8", "31.9"), LESlengthcm = c("1.8", 
"1.8", "2.2", "2.5", "2.0", "2.1"), EsophageallengthLESUEScenterscm = c("12.1", 
"12.1", "14.0", "15.0", "15.1", NA), PIPfromnarescm = c("37.8", 
"37.8", "No", "No", "34.3", "35.8"), Hosp_Id = c("A", "A", "B", 
"B", "C", "D")), .Names = c("DistalLESfromnarescm", "LESmidpointfromnarescm", 
"ProximalLESfromnarescm", "LESlengthcm", "EsophageallengthLESUEScenterscm", 
"PIPfromnarescm", "Hosp_Id"), row.names = c(NA, -6L), class = "data.frame")

Aim

I would like to merge the value in any row with the preceding row if: a) The hospital number is the same and b) The value in that particular column between the grouped rows are not the same

The problem I have is how to lapply within dplyr because I don't know what to refer to in the left hand side of the lapply statement.

Attempt 1

    result2 <- Question %>% 
      group_by(HospNum_Id,DistalLESfromnarescm)%>%
      ifelse(HospNum_Id==lag(HospNum_Id),
lapply(WHAT DO I REFER TO HERE function(x) ifelse(x==lag(x), x,paste0(x,"::",lead(x)),"No")),"No")

Desired output

structure(list(DistalLESfromnarescm = c("31.9",  
   "33.1:33.3", "33.8", "34.0"), LESmidpointfromnarescm = c("31.2", 
    "32.0",  "33.1", "33.2"), ProximalLESfromnarescm = c( 
    "30.1", "30.9",  "31.8", "31.9"), LESlengthcm = c( 
     "1.8", "2.2:2.5", "2.0", "2.1"), EsophageallengthLESUEScenterscm = c( 
     "12.1", "14.0:15.0", "15.1", NA), PIPfromnarescm = c( 
     "37.8", "No",  "34.3", "35.8"), Hosp_Id = c( "A",  
     "B", "C", "D")), .Names = c("DistalLESfromnarescm", "LESmidpointfromnarescm", 
     "ProximalLESfromnarescm", "LESlengthcm", "EsophageallengthLESUEScenterscm", 
     "PIPfromnarescm", "Hosp_Id"), row.names = c(NA, -4L), class = "data.frame")
Sebastian Zeki
  • 6,690
  • 11
  • 60
  • 125
  • 1
    Thank you for providing the input. Now if you could provide example output, you'll have a fully described question! – De Novo Nov 03 '18 at 18:02
  • @De Novo as requested desired output – Sebastian Zeki Nov 03 '18 at 18:08
  • See also this [r-faq] on summarizing several (all) variables, also using `dplyr`: [Aggregate / summarize multiple variables per group (e.g. sum, mean)](https://stackoverflow.com/questions/9723208/aggregate-summarize-multiple-variables-per-group-e-g-sum-mean) – Henrik Nov 03 '18 at 18:36

1 Answers1

1

Here is a dplyr option

library(dplyr)
df1 %>% 
  group_by(Hosp_Id) %>% 
  summarise_all(.funs = function(x) paste(unique(c(dplyr::lag(x, default = NULL), x)), collapse = ":"))
# A tibble: 4 x 7
#  Hosp_Id DistalLESfromnarescm LESmidpointfromnarescm ProximalLESfromnarescm LESlengthcm EsophageallengthLESUEScenterscm PIPfromnarescm
#  <chr>   <chr>                <chr>                  <chr>                  <chr>       <chr>                           <chr>         
#1 A       31.9                 31.2                   30.1                   1.8         12.1                            37.8          
#2 B       33.1:33.3            32.0                   30.9                   2.2:2.5     14.0:15.0                       No            
#3 C       33.8                 33.1                   31.8                   2.0         15.1                            34.3          
#4 D       34.0                 33.2                   31.9                   2.1         NA                              35.8          

data

df1 <- structure(list(DistalLESfromnarescm = c("31.9", "31.9", "33.1", 
"33.3", "33.8", "34.0"), LESmidpointfromnarescm = c("31.2", "31.2", 
"32.0", "32.0", "33.1", "33.2"), ProximalLESfromnarescm = c("30.1", 
"30.1", "30.9", "30.9", "31.8", "31.9"), LESlengthcm = c("1.8", 
"1.8", "2.2", "2.5", "2.0", "2.1"), EsophageallengthLESUEScenterscm = c("12.1", 
"12.1", "14.0", "15.0", "15.1", NA), PIPfromnarescm = c("37.8", 
"37.8", "No", "No", "34.3", "35.8"), Hosp_Id = c("A", "A", "B", 
"B", "C", "D")), .Names = c("DistalLESfromnarescm", "LESmidpointfromnarescm", 
"ProximalLESfromnarescm", "LESlengthcm", "EsophageallengthLESUEScenterscm", 
"PIPfromnarescm", "Hosp_Id"), row.names = c(NA, -6L), class = "data.frame")
markus
  • 25,843
  • 5
  • 39
  • 58
  • I don't think `lag` is needed; I would just use `function(x) paste(unique(x), collapse = ":")` – Ista Nov 03 '18 at 18:25
  • ...nor `function(x)`? `d %>% group_by(Hosp_Id) %>% summarise_all(funs(paste(unique(.), collapse = ":")))` – Henrik Nov 03 '18 at 18:26
  • @Ista Included `lag` because OP wrote "I would like to merge the value in any row with the preceding row". If there are only two rows per group then your and @Henrik's approaches will be spot on. – markus Nov 03 '18 at 19:43
  • It depends on your understanding of the question, but not common-sense interpretation is that the OP meant "I would like to merge the value in any row with the preceding rows" (plural). If this is not true none of our solutions will work; if it is true my solution is somewhat more parsimonious. – Ista Nov 04 '18 at 00:36