R script to aggregate rows by group as concatenation of characters without R packages

Question

I want to add a new column (TAinfo-Day & TAinfo-Arm) in an existing table (SM) containing for each row the concatenation as a string of the N rows of a linked table (TA) that match define conditions: T-DAY =< S-DAY

SM & TA Keys: Study, Group

Constraint, I'm using Spotfire 7.11 which permit to use R script but NOT useful packages like library(dplyr)

I succeed to create new temporary table SMtemp that contain the desire data from TA but spread over several rows.

Next step should be to aggregate those.

TA TA

Study <K>   Group <K>   Arm T-Day
20190510    01          A1      1
20190510    01          A2      21
20190510    01          A3      35
20190510    02          B1      1
20190510    02          B2      22
20190510    02          B3      35

The Objectif is to create the columns "TAinfo-Day" & "TAinfo-Arm"

SM SM

Study <K>   SampleID    Group <K>   S-Day   TAinfo-Day  TAinfo-Arm
20190510    20190510-01.001     01  21      1,21        A1,A2
20190510    20190510-01.001     01  36      1,21,35     A1,A2,A3
20190510    20190510-01.002     01  21      1,21        A1,A2
20190510    20190510-01.002     01  36      1,21,35     A1,A2,A3
20190510    20190510-01.003     01  21      1,21        A1,A2
20190510    20190510-01.003     01  36      1,21,35     A1,A2,A3
20190510    20190510-02.001     02  21      1           B1
20190510    20190510-02.001     02  36      1,22,35     B1,B2,B3
20190510    20190510-02.002     02  21      1           B1
20190510    20190510-02.002     02  36      1,22,35     B1,B2,B3
20190510    20190510-02.003     02  21      1           B1
20190510    20190510-02.003     02  36      1,22,35     B1,B2,B3

SMtemp

Study <K>   SampleID      Group <K> S-DAY   T-DAY
20190510    20190510-01.001     01  21      1
20190510    20190510-01.001     01  21      21
20190510    20190510-01.001     01  36      1
20190510    20190510-01.001     01  36      21
20190510    20190510-01.001     01  36      35
20190510    20190510-01.002     01  21      1
20190510    20190510-01.002     01  21      21
20190510    20190510-01.002     01  36      1
20190510    20190510-01.002     01  36      21
20190510    20190510-01.002     01  36      35
20190510    20190510-01.003     01  21      1
20190510    20190510-01.003     01  21      21
20190510    20190510-01.003     01  36      1
20190510    20190510-01.003     01  36      21
20190510    20190510-01.003     01  36      35
20190510    20190510-02.001     02  21      1
20190510    20190510-02.001     02  36      1
20190510    20190510-02.001     02  36      22
20190510    20190510-02.001     02  36      35
20190510    20190510-02.002     02  21      1
20190510    20190510-02.002     02  36      1
20190510    20190510-02.002     02  36      22
20190510    20190510-02.002     02  36      35
20190510    20190510-02.003     02  21      1
20190510    20190510-02.003     02  36      1
20190510    20190510-02.003     02  36      22
20190510    20190510-02.003     02  36      35

Many thanks in advance for your help.

Possible duplicate of [Collapse / concatenate / aggregate a column to a single comma separated string within each group](https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w) — camille, May 10 '19 at 14:46
Would it be acceptable to send SMtemp as output to Spotfire and then add a pivot table of it, using UniqueConcatenate as the aggregation? This function appears to add the comma and space automatically. — blakeoft, May 10 '19 at 21:44
What do you mean by spotfire doesnt allow dplyr? You can install the package and use dplyr without any issues. Have you tried installing it? — Naveen, May 11 '19 at 02:34

Gaia Paolini · Answer 1 · 2019-05-15T07:15:15.413

0

Spotfire does allow dplyr, but there may be pockets of dplyr functionality that are not supported by TERR, especially in older versions (but you could always upgrade TERR to its latest version).

Anyway, in order to avoid using any additional package, it could be solved in this slightly verbose way :

#just in case
if('TAinfo' %in% colnames(SM)) SM$TAinfo=NULL

####summarise with concatenated string of all Tdays by Study and Group
#option1: use flexible column names (ideally col1,2,3 are input parms)
col1='Study'
col2='Group'
col3='Tday'
myformula=formula(paste(col3,'~',col1,'+',col2))
TA2=aggregate(myformula,TA,FUN=paste,collapse=',')
#option2: use literal column names
#TA2=aggregate(Tday ~ Study+Group,TA,FUN=paste,collapse=',')

#merge to main table creating Tday temporary column
SM=merge(SM,TA2,by=c('Study','Group'))

#function to cut string back to max day
cutagg = function(x,y) {
  yv=strsplit(y,',')[[1]]
  yv=yv[yv<=x]
  return (paste(yv,collapse=', '))
}

#create TAinfo column: cutting Tday up to Sday
TAinfo=mapply(cutagg,SM$Sday,SM$Tday)
#remove temporary column
SM$Tday=NULL
#export TAinfo column separately

Gaia

edited May 15 '19 at 07:15

answered May 13 '19 at 07:34

Gaia Paolini

1,044
1
5
4

Thank you for Gaia your answer, it should provide the expected result. Unfortunately, I m blocked at the aggregate function see update question. – Christophe Liber May 14 '19 at 10:29
the code is correct. You changed the aggregate statement. It works as is. – Gaia Paolini May 14 '19 at 12:14
I will add a couple of lines to make sure an existing column does not clash (probably not needed) – Gaia Paolini May 14 '19 at 12:14
Try re-adding FUN=. Your column names are also different from the ones I used, because I did not have your actual datasets. Please check which ones work and uppercase and lowercase. – Gaia Paolini May 14 '19 at 12:24
Indeed you are right your code is correct. Issue on my script come from an additional function that replace column name. When column names are set as variable, the aggregate function doesn't work as expected. I will further investigate and fully test your proposal. I will keep you informed if I succeed. – Christophe Liber May 14 '19 at 13:51
Your question did not fully reflect what you are trying to achieve. – Gaia Paolini May 15 '19 at 07:11
I have edited the answer to add an option where you name the columns from outside. But not knowing your set up I cannot answer further – Gaia Paolini May 15 '19 at 07:16
Thank you very much for your help Gaia. All your code work as expected and I succeed to generate “TAinfo-Day” column. I have an additional request (see updated question). I would like to add column “TAinfo-Arm” into SM table base on the same criteria T-DAY =< S-DAY, that give the arm injected at the relevant day defined in “TAinfo-Day”. – Christophe Liber May 15 '19 at 16:01
I think the other user's answer does well in answering both your questions. If you want to make the columns as variables you can use the formula as I suggested. – Gaia Paolini May 16 '19 at 08:04

score 0 · Answer 2 · answered May 15 '19 at 17:15

I replaced the dashes in var names with underscores...

Step 1. Merge the data

#merge data
SM_temp<-merge(SM,TA,by=c("Study","Group"))
SM_temp<-subset(SM_temp,T_Day<=S_Day)

Step 2. Aggregate

SM_new<-aggregate(cbind(Arm,T_Day)~Study+Group+SampleID+S_Day,data=SM_temp,FUN = paste0,collapse=",")

R script to aggregate rows by group as concatenation of characters without R packages

2 Answers2