1

Please excuse me if this might not sound very clear, but I will do my best with this challenging issue. I have multiple dataframes.

Each data frame has hashed_user_id, server_timestap, and event. An example of three data frames are as the following:

Data Frame 1

hashed_user_id  server_timestamp    event
user1          2017-04-27 15:25:12   AS
user2          2017-04-29 19:34:19   AS
user3          2017-05-01 21:28:17   AS
user4          2017-05-03 23:01:16   AS

Data Frame 2

hashed_user_id  server_timestamp    event
user1          2017-04-27 16:25:12   AV1
user2          2017-04-29 20:34:19   AV1
user5          2017-05-01 22:19:17   AV1
user6          2017-05-03 14:01:16   AV1

Data Frame 3

hashed_user_id  server_timestamp    event
user1          2017-04-27 17:25:12   AV2
user2          2017-04-29 15:34:19   AV2
user5          2017-05-01 21:28:17   AV2
user6          2017-05-03 23:01:16   AV2

The awaited table that I wish to have should merge all users in one table, and list all events sorted out by the server_timestamp. Therefore, the expected new dataframe will look like this :

Expected result:

hashed_user_id  sorted_event1   sorted_event2   sorted_event3
user1             AS                 AV1             AV2
user2             AV2                AS              AV1
user3             AS                 NA              NA
user4             AS                 NA              NA
user5             AV2                AV1
user6             AV1                AV2

Thank you very much!

Mohammad Zahrawy
  • 305
  • 1
  • 2
  • 9

2 Answers2

2
library(tibble)
library(tidyr)

# read your data 
dt1 <- tribble(
  ~hashed_user_id,~server_timestamp, ~event,
  "user1", "2017-04-27 15:25:12", "AS",
  "user2", "2017-04-29 19:34:19", "AS",
  "user3", "2017-05-01 21:28:17", "AS",
  "user4", "2017-05-03 23:01:16", "AS"
)

dt2 <- tribble(
  ~hashed_user_id,~server_timestamp, ~event,
  "user1", "2017-04-27 16:25:12", "AV1",
  "user2", "2017-04-29 20:34:19", "AV1",
  "user5", "2017-05-01 22:28:17", "AV1",
  "user6", "2017-05-03 14:01:16", "AV1"
)

dt3 <- tribble(
  ~hashed_user_id,~server_timestamp, ~event,
  "user1", "2017-04-27 17:25:12", "AV2",
  "user2", "2017-04-29 15:34:19", "AV2",
  "user5", "2017-05-01 21:28:17", "AV2",
  "user6", "2017-05-03 23:01:16", "AV2"
)

# solution
dt <- rbind(dt1, dt2, dt3) %>% 
  mutate(server_timestamp = as.POSIXct(server_timestamp, format = "%Y-%m-%d %H:%M:%S")) %>%
  group_by(hashed_user_id) %>%
  arrange(server_timestamp) %>%
  mutate(sorted_event_id = paste0("sorted_event", 1:n())) %>%
  select(-server_timestamp) %>%
  spread(sorted_event_id, event) %>%
  ungroup()
67342343
  • 816
  • 5
  • 11
  • Amazing Amazing! Thank you very much! I was working on the reduce which gave some good results as well, but this one gives an exact solution to my problem! – Mohammad Zahrawy Sep 07 '17 at 13:19
0

Not exactly a solution in the sense it doesn't give your expected output, but it's better to avoid having sorted data in different columns with NAs like this.

If you must still work in R with this later you'll have some dirty work to do again.

Consider having your sorted events in a vector, that you'll store in your data.frame/ tibble.

And start by putting these data.frames in a list! :)

res <- list(df1,df2,df3) %>%
  bind_rows %>%
  arrange(server_timestamp) %>%
  select(-server_timestamp) %>%
  nest(event,.key="sorted_events")
# A tibble: 6 x 2
#    hashed_user_id     sorted_events
# <chr>           <list>
#   1          user1 <tibble [3 x 1]>
#   2          user2 <tibble [3 x 1]>
#   3          user3 <tibble [1 x 1]>
#   4          user5 <tibble [2 x 1]>
#   5          user6 <tibble [2 x 1]>
#   6          user4 <tibble [1 x 1]>
res$sorted_events[[4]]
# # A tibble: 2 x 1
#    event
#    <chr>
#  1   AV2
#  2   AV1
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167