0

I have a data frame like the below table. It is a time series by the user.

User Date Age SentimentScore
a 9.19 20 1
a 11.20 20 2
a 12.10 20 3
b 9.30 19 1
b 10.1 19 4
c 12.1 21 5

I hope to generate a table like this one. Trail 1 means the Mean Sentiment score before a certain date(eg. Nov 7th). Trail 2 means the Mean Sentiment score after a certain date(eg. Nov 7th).

User Age trial    Mean Sentiment Score
a    20  1          1-->(mean SentimentScore before 11.7)
a    20  2          2.5 -->(mean SentimentScoree after 11.7)
b    19  1          2.5--->(mean SentimentScoree before 11.7)
c    21  1          NA --->(mean SentimentScoree before 11.7)

  • 1
    Can you please provide a result that is based on the example data? These look like they're based on different data and column names. – Jon Spring Apr 28 '21 at 05:09
  • Please provide the example data with `dput` function. It's more precise and can avoid misunderstanding. – Peace Wang Apr 28 '21 at 05:14
  • @XupinZhang I almost get the solution, but you don't provide the data with `dput` for us to copy. You should make it easier to reproduce. – Peace Wang Apr 28 '21 at 05:30
  • @PeaceWang Sorry, I am a new user. – Xupin Zhang Apr 28 '21 at 05:35
  • Oh, just take it easy. Assuming your data frame calls `df`, try `dput(df)` and copy the output to your question. If you have time, you need to read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Peace Wang Apr 28 '21 at 05:44
  • @XupinZhang Hi, I noticed that in your last question's comment, Ronak Shah has noticed you that "please share data in reproducible format". So here I remind you again. In addition, you should accept one helpful answer per question. Refer - https://stackoverflow.com/help/someone-answers – Peace Wang Apr 29 '21 at 06:27

2 Answers2

1
library(data.table)

dt[, trial := fcase(Date <= as.Date("2021-11-07"), 1,
                    Date >  as.Date("2021-11-07"), 2)]

dt[,.( Mean.Sentiment.Score = mean(SentimentScore) ),
   by = .(User,Age,trial)]

Result:

   User Age trial Mean.Sentiment.Score
1:    a  20     1                  1.0
2:    a  20     2                  2.5
3:    b  19     1                  2.5
4:    c  21     2                  5.0

data(I type it by hand, your should provide it with dput in your question):

library(data.table)
dt <- data.table(
    User = c("a", "a", "a", "b", "b", "c"),
    Date = as.Date(c("2021-09-19", "2021-11-20", "2021-12-10", "2021-09-30",
                     "2021-10-01", "2021-12-01")),
    Age = c(20, 20, 20, 19, 19, 21),
    SentimentScore = c(1, 2, 3, 1, 4, 5)
)
dt
#>    User       Date Age SentimentScore
#> 1:    a 2021-09-19  20              1
#> 2:    a 2021-11-20  20              2
#> 3:    a 2021-12-10  20              3
#> 4:    b 2021-09-30  19              1
#> 5:    b 2021-10-01  19              4
#> 6:    c 2021-12-01  21              5

Created on 2021-04-28 by the reprex package (v2.0.0)

Peace Wang
  • 2,399
  • 1
  • 8
  • 15
0

Is this what you are trying to do?

library(lubridate)
library(dplyr)
df %>% mutate(Date = as.Date(Date)) %>%
  group_by(User, Trial = ifelse(day(Date) > 7 & month(Date) >11, 2, 1)) %>%
  summarise(Age = mean(Age),
            SentimentScore = mean(SentimentScore), .groups = 'drop')

# A tibble: 4 x 4
  User  Trial   Age SentimentScore
  <chr> <dbl> <dbl>          <dbl>
1 a         1    20            1.5
2 a         2    20            3  
3 b         1    19            2.5
4 c         1    21            5 

data used

df <- read.table(text = "User   Date    Age SentimentScore
a   2021-09-19  20  1
a   2021-11-20  20  2
a   2021-12-10  20  3
b   2021-09-30  19  1
b   2021-10-01  19  4
c   2021-12-01  21  5", header = T)
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45