0

I have monitoring data collecting information for service usage. Originally it's "who", "what", "where", "when" (and a little more, but that's basically the information about to be statistically evaluated).

I already managed to read data from database, transform lengthy "who" and "what" using tidyr::extract() to a shorter, handy, form and "when" from timestamp to date by trunc().

Result is similar to what this example structure produces:

my_data <- structure(
  list(
    Service =
      c("AService", "AService", "AService", "AService",
        "AService", "BService", "BService", "BService"),
    Datatype =
      c("Foobar", "Foobar", "Foobar", "BarFoo",
        "BarFoo", "Crumpy", "Rusty", "Rusty"),
    Direction =
      c("IN", "IN", "IN", "IN", "IN", "IN", "IN", "IN"),
    Date =
      structure(
        c(18993, 18993, 18994, 18993, 18994, 18994, 18993, 18994),
        class = "Date"
      )
  ),
  class = "data.frame",
  row.names = c(NA, -8L)
)

Next step was to group and summarise (count):

counted <- my_data %>%
  group_by(Direction, Service, Datatype, Date) %>%
  arrange(.by_group = TRUE) %>%
  summarize(Service, Count = n(), .groups = "keep") %>%
  distinct()

This gives a data structure more or less equivalent to:

Direction | Service  | Datatype | Date       | Count
----------+----------+----------+------------+------
IN        | AService | BarFoo   | 2022-01-01 | 1
IN        | AService | BarFoo   | 2022-01-02 | 1
IN        | AService | FooBar   | 2022-01-01 | 2
IN        | AService | FooBar   | 2022-01-02 | 1
IN        | BService | Crumpy   | 2022-01-02 | 2
IN        | BService | Rusty    | 2022-01-01 | 1
IN        | BService | Rusty    | 2022-01-02 | 1

and I want:

Direction | Service  | Datatype | 2022-01-01 | 2022-01-02
----------+----------+----------+------------+-----------
IN        | AService | BarFoo   | 1          | 1
IN        | AService | FooBar   | 2          | 1
IN        | BService | Crumpy   | 0          | 2
IN        | BService | Rusty    | 1          | 1

What's the idiomatic way in R? What's the elegant way? I'm clueless, what phrases to search for - all searches for "transpose" in connection to "matrix" and "partially" end up giving me advises about "panda" and "python" but not "R".

This example is, of course, short. Real data is thousands of items. And contains "OUT" as well, but filtering this is not my problem.

  • 1
    This is a reshape from long to wide; if you google those terms you'll find a lot of posts here on Stack Overflow. I've closed this question for now as a duplicate. If you have issues with implementing one of the methods to reshape from the linked post above, do leave a comment and we can take another look at it. – Maurits Evers Feb 01 '22 at 23:30
  • PS. For your data, it should be something like `library(dplyr); library(tidyr); my_data %>% pivot_wider(names_from = "Date", values_from = "Count");`. This is a `tidyverse`-based solution. The linked post will give you alternatives using other packages or in base R. – Maurits Evers Feb 01 '22 at 23:33
  • 1
    Wow - blazing fast and totally helpful. Thank you a thousand times. Pivot did the trick. :) – pitpalme Feb 01 '22 at 23:47

0 Answers0