Merge Time Series Data by user and a specific date

Question

I have a data frame like the below table. It is a time series by the user.

User	Date	Age	SentimentScore
a	9.19	20	1
a	11.20	20	2
a	12.10	20	3
b	9.30	19	1
b	10.1	19	4
c	12.1	21	5

I hope to generate a table like this one. Trail 1 means the Mean Sentiment score before a certain date(eg. Nov 7th). Trail 2 means the Mean Sentiment score after a certain date(eg. Nov 7th).

User Age trial    Mean Sentiment Score
a    20  1          1-->(mean SentimentScore before 11.7)
a    20  2          2.5 -->(mean SentimentScoree after 11.7)
b    19  1          2.5--->(mean SentimentScoree before 11.7)
c    21  1          NA --->(mean SentimentScoree before 11.7)

Can you please provide a result that is based on the example data? These look like they're based on different data and column names. — Jon Spring, Apr 28 '21 at 05:09
Please provide the example data with `dput` function. It's more precise and can avoid misunderstanding. — Peace Wang, Apr 28 '21 at 05:14
@XupinZhang I almost get the solution, but you don't provide the data with `dput` for us to copy. You should make it easier to reproduce. — Peace Wang, Apr 28 '21 at 05:30
Oh, just take it easy. Assuming your data frame calls `df`, try `dput(df)` and copy the output to your question. If you have time, you need to read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Peace Wang, Apr 28 '21 at 05:44
@XupinZhang Hi, I noticed that in your last question's comment, Ronak Shah has noticed you that "please share data in reproducible format". So here I remind you again. In addition, you should accept one helpful answer per question. Refer - https://stackoverflow.com/help/someone-answers — Peace Wang, Apr 29 '21 at 06:27

score 1 · Answer 1 · answered Apr 28 '21 at 06:06

library(data.table)

dt[, trial := fcase(Date <= as.Date("2021-11-07"), 1,
                    Date >  as.Date("2021-11-07"), 2)]

dt[,.( Mean.Sentiment.Score = mean(SentimentScore) ),
   by = .(User,Age,trial)]

Result:

   User Age trial Mean.Sentiment.Score
1:    a  20     1                  1.0
2:    a  20     2                  2.5
3:    b  19     1                  2.5
4:    c  21     2                  5.0

data(I type it by hand, your should provide it with dput in your question):

library(data.table)
dt <- data.table(
    User = c("a", "a", "a", "b", "b", "c"),
    Date = as.Date(c("2021-09-19", "2021-11-20", "2021-12-10", "2021-09-30",
                     "2021-10-01", "2021-12-01")),
    Age = c(20, 20, 20, 19, 19, 21),
    SentimentScore = c(1, 2, 3, 1, 4, 5)
)
dt
#>    User       Date Age SentimentScore
#> 1:    a 2021-09-19  20              1
#> 2:    a 2021-11-20  20              2
#> 3:    a 2021-12-10  20              3
#> 4:    b 2021-09-30  19              1
#> 5:    b 2021-10-01  19              4
#> 6:    c 2021-12-01  21              5

^{Created on 2021-04-28 by the reprex package (v2.0.0)}

score 0 · Accepted Answer · answered Apr 28 '21 at 05:46

Is this what you are trying to do?

library(lubridate)
library(dplyr)
df %>% mutate(Date = as.Date(Date)) %>%
  group_by(User, Trial = ifelse(day(Date) > 7 & month(Date) >11, 2, 1)) %>%
  summarise(Age = mean(Age),
            SentimentScore = mean(SentimentScore), .groups = 'drop')

# A tibble: 4 x 4
  User  Trial   Age SentimentScore
  <chr> <dbl> <dbl>          <dbl>
1 a         1    20            1.5
2 a         2    20            3  
3 b         1    19            2.5
4 c         1    21            5

data used

df <- read.table(text = "User   Date    Age SentimentScore
a   2021-09-19  20  1
a   2021-11-20  20  2
a   2021-12-10  20  3
b   2021-09-30  19  1
b   2021-10-01  19  4
c   2021-12-01  21  5", header = T)

Merge Time Series Data by user and a specific date

2 Answers2