I have monitoring data collecting information for service usage. Originally it's "who", "what", "where", "when" (and a little more, but that's basically the information about to be statistically evaluated).
I already managed to read data from database, transform lengthy "who" and "what" using tidyr::extract()
to a shorter, handy, form and "when" from timestamp to date by trunc()
.
Result is similar to what this example structure produces:
my_data <- structure(
list(
Service =
c("AService", "AService", "AService", "AService",
"AService", "BService", "BService", "BService"),
Datatype =
c("Foobar", "Foobar", "Foobar", "BarFoo",
"BarFoo", "Crumpy", "Rusty", "Rusty"),
Direction =
c("IN", "IN", "IN", "IN", "IN", "IN", "IN", "IN"),
Date =
structure(
c(18993, 18993, 18994, 18993, 18994, 18994, 18993, 18994),
class = "Date"
)
),
class = "data.frame",
row.names = c(NA, -8L)
)
Next step was to group and summarise (count):
counted <- my_data %>%
group_by(Direction, Service, Datatype, Date) %>%
arrange(.by_group = TRUE) %>%
summarize(Service, Count = n(), .groups = "keep") %>%
distinct()
This gives a data structure more or less equivalent to:
Direction | Service | Datatype | Date | Count
----------+----------+----------+------------+------
IN | AService | BarFoo | 2022-01-01 | 1
IN | AService | BarFoo | 2022-01-02 | 1
IN | AService | FooBar | 2022-01-01 | 2
IN | AService | FooBar | 2022-01-02 | 1
IN | BService | Crumpy | 2022-01-02 | 2
IN | BService | Rusty | 2022-01-01 | 1
IN | BService | Rusty | 2022-01-02 | 1
and I want:
Direction | Service | Datatype | 2022-01-01 | 2022-01-02
----------+----------+----------+------------+-----------
IN | AService | BarFoo | 1 | 1
IN | AService | FooBar | 2 | 1
IN | BService | Crumpy | 0 | 2
IN | BService | Rusty | 1 | 1
What's the idiomatic way in R? What's the elegant way? I'm clueless, what phrases to search for - all searches for "transpose" in connection to "matrix" and "partially" end up giving me advises about "panda" and "python" but not "R".
This example is, of course, short. Real data is thousands of items. And contains "OUT" as well, but filtering this is not my problem.