I have a data frame df with columns date_entered and person_id. I first extracted the month from date_entered using

df$month <- as.Date(cut(df$date_entered, breaks = "month"))

then created a df of frequency by person_id using

 occurences<-df %>%
  count(month, person_id)

where month is the first day of the month, person_id is the person's ID, and n is the count for that person_id in that month:

| month      | person_id | n  |
| ---------- | ----------|----|
| 2021-01-01 | 12345652  | 2  |
| 2021-01-01 | 56412342  | 6  |
| 2021-01-01 | 45621311  | 11 |
| 2021-01-01 | 45213652  | 8  |
| 2021-01-01 | 69534000  | 1  |
| 2021-01-01 | 60221351  | 4  |
| 2021-02-01 | 12345652  | 8  |
| 2021-02-01 | 12342546  | 6  |
| 2021-02-01 | 52013000  | 3  |
| 2021-02-01 | 33251000  | 1  |
| 2021-02-01 | 55210000  | 6  |
| 2021-02-01 | 10012310  | 4  |
| 2021-03-01 | 00012342  | 2  |
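For anyone who wants to reproduce the steps above, here is a minimal sketch of how a table like this comes about. The raw rows in df are made up for illustration (they are not the original data); only the column names date_entered and person_id, and the cut() and count() calls, come from the question.

```r
library(dplyr)

# Hypothetical raw data: one row per entry. The dates are invented.
# person_id is kept as character so leading zeros (e.g. "00012342") survive.
df <- data.frame(
  date_entered = as.Date(c("2021-01-05", "2021-01-20", "2021-01-20",
                           "2021-02-03", "2021-02-14", "2021-03-09")),
  person_id = c("12345652", "12345652", "56412342",
                "12345652", "12345652", "00012342")
)

# Collapse every date to the first day of its month.
df$month <- as.Date(cut(df$date_entered, breaks = "month"))

# One row per (month, person_id) pair, with the number of entries in n.
occurences <- df %>%
  count(month, person_id)

print(occurences)
```

The result has the same three columns as the table above: month, person_id and n.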

I tried various approaches, including

count_n <- occurences$n

a_number <- occurences$person_id

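# name the columns explicitly so they can be referenced in aes()
occurences_df <- data.frame(month = occurences$month,
                            person_id = occurences$person_id,
                            count_n = count_n)

# keep the 20 rows with the highest counts and plot them in descending order
ggplot(occurences_df[tail(order(occurences_df$count_n), 20), ]) + 
  aes(x = reorder(person_id, -count_n), y = count_n) + 
  geom_bar(stat = "identity") + 
  labs(x = "top 20", y = "number of days in QA") + 
  theme(axis.text.x = element_blank()) 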

So far, with the ggplot code above (using my original dataset), I am able to create a bar graph in descending order, but without the grouping by month.

Each bar refers to a unique person_id, and its height is the number of times that person_id occurred. However, I would like to show the top 5 per month, based on the date_entered variable or the month variable created in the occurences table.

I would like to see something like this (image), except that instead of the week number, the x-axis shows the top 5 person_id per month.

Phil
doubleD
  • It would help a lot to have actual data we can work with. Can you make this question reproducible? Please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thanks! – r2evans Jul 03 '21 at 18:21
  • Thank you for reminding me about the data as an image. I updated the format and I hope that helps. – doubleD Jul 03 '21 at 18:53
  • Thanks for the data. Can you clarify your question based on that data? What would your expected output look like? I know you want to draw that output, but I don't understand what you are trying to do. Perhaps you can describe it. – Martin Gal Jul 03 '21 at 20:11
  • Thanks, edited question with hopefully a clearer request. – doubleD Jul 03 '21 at 20:23

1 Answer

Perhaps this is what you are looking for:

library(ggplot2)
library(dplyr)

df  %>% 
  group_by(month) %>% 
  slice_max(n, n=5) %>% 
  ggplot(aes(x=person_id, y=n)) + 
  geom_bar(stat = "identity") + 
  facet_wrap(~month,nrow=1, scales="free") + 
  theme(axis.text.x=element_text(angle = -45, hjust = 0))

which returns the top 5 person_id per month (one panel for each distinct value in your month column). Since the example data has only 13 rows, and just one of them for March, the plot looks a little bit strange...

plot_of_example_data

The top 5 are chosen by slice_max(n, n=5).
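One detail worth knowing about slice_max(): by default it keeps ties, so a group can return more than 5 rows when several person_id share the count at the cut-off. The sketch below uses a made-up toy data frame (not the question's data) to show the difference; with_ties is a documented slice_max() argument in dplyr 1.0.0 and later.

```r
library(dplyr)

# Hypothetical counts for a single month, with a tie (n = 2) at the cut-off.
toy <- data.frame(
  month = as.Date("2021-01-01"),
  person_id = c("a", "b", "c", "d", "e", "f", "g"),
  n = c(9, 7, 5, 3, 2, 2, 1)
)

# Default behaviour: both tied rows are kept, so 6 rows come back.
top_with_ties <- toy %>%
  group_by(month) %>%
  slice_max(n, n = 5) %>%
  ungroup()

# with_ties = FALSE returns exactly 5 rows per group.
top_exact <- toy %>%
  group_by(month) %>%
  slice_max(n, n = 5, with_ties = FALSE) %>%
  ungroup()

nrow(top_with_ties)  # 6
nrow(top_exact)      # 5
```

Whether to keep or drop ties depends on what the plot should show; dropping them guarantees at most 5 bars per panel.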

Data

df <- structure(list(month = structure(c(18628, 18628, 18628, 18628, 
18628, 18628, 18659, 18659, 18659, 18659, 18659, 18659, 18687
), class = "Date"), person_id = c("12345652", "56412342", "45621311", 
"45213652", "69534000", "60221351", "12345652", "12342546", "52013000", 
"33251000", "55210000", "10012310", "00012342"), n = c(2, 6, 
11, 8, 1, 4, 8, 6, 3, 1, 6, 4, 2)), problems = structure(list(
    row = 1:12, col = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), expected = c("3 columns", "3 columns", "3 columns", 
    "3 columns", "3 columns", "3 columns", "3 columns", "3 columns", 
    "3 columns", "3 columns", "3 columns", "3 columns"), actual = c("4 columns", 
    "4 columns", "4 columns", "4 columns", "4 columns", "4 columns", 
    "4 columns", "4 columns", "4 columns", "4 columns", "4 columns", 
    "4 columns"), file = c("literal data", "literal data", "literal data", 
    "literal data", "literal data", "literal data", "literal data", 
    "literal data", "literal data", "literal data", "literal data", 
    "literal data")), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -13L), spec = structure(list(
    cols = list(month = structure(list(format = ""), class = c("collector_date", 
    "collector")), person_id = structure(list(), class = c("collector_character", 
    "collector")), n = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))
Martin Gal
  • Thanks! I added something in the line: ggplot(aes(x=reorder(a_number, -n), y=n)) to show the graph in descending order and relabeled the x-axis – doubleD Jul 03 '21 at 22:20
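
A possible sketch of the descending order mentioned in this comment, applied per month. Because 12345652 appears in both January and February, reordering person_id directly gives it a single position across all panels. One workaround, which is an assumption here and not part of the answer, is to build a helper column (called bar below, a hypothetical name) that is unique per month and person_id, order that column by count, and strip the month prefix from the axis labels. The counts are copied from the table in the question.

```r
library(dplyr)
library(ggplot2)

# Counts copied from the table in the question.
occurences <- data.frame(
  month = as.Date(rep(c("2021-01-01", "2021-02-01", "2021-03-01"),
                      times = c(6, 6, 1))),
  person_id = c("12345652", "56412342", "45621311", "45213652", "69534000",
                "60221351", "12345652", "12342546", "52013000", "33251000",
                "55210000", "10012310", "00012342"),
  n = c(2, 6, 11, 8, 1, 4, 8, 6, 3, 1, 6, 4, 2)
)

top5 <- occurences %>%
  group_by(month) %>%
  slice_max(n, n = 5, with_ties = FALSE) %>%
  ungroup() %>%
  # "bar" is a hypothetical helper: one factor level per month/person pair,
  # with levels sorted from the highest count to the lowest.
  mutate(bar = reorder(paste(month, person_id, sep = "__"), -n))

p <- ggplot(top5, aes(x = bar, y = n)) +
  geom_col() +
  facet_wrap(~month, nrow = 1, scales = "free_x") +
  # Show only the person_id part of the helper label on the axis.
  scale_x_discrete(labels = function(x) sub("^.*__", "", x)) +
  labs(x = "top 5 person_id per month", y = "n") +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))

p
```

With scales = "free_x", each panel shows only its own levels, so the bars within a month appear in descending order of n.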