
I'd love to simplify this section of code, hopefully using a loop that iterates through the individual days of the week, finds the max value of the mean, and then adds a chosen number to that value to use as each annotation's y position.

I can't picture exactly how it would look, but ideally maybe something like:

For Day in Weekday;
  (max(mean(columnname) + 10) AS new_variable

Sorry, that's probably terrible, but I've only been doing this for 2 weeks. Any help is appreciated!
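For reference, here is roughly what that loop could look like in R. This is a sketch, not the approach taken in the answer below: `data_df` is a cut-down stand-in holding only the `weekday` and `minutesworn` values from the sample rows later in this question, `offset <- 10` is an assumed value for the "chosen number", and it uses each day's plain mean (the comments below explain why `max(mean(...))` adds nothing).

```r
# Cut-down stand-in for data_df (assumption): only the two columns used here,
# with the weekday/minutesworn values from the sample rows in the question.
data_df <- data.frame(
  weekday = c("Tuesday", "Wednesday", "Friday", "Saturday",
              "Sunday", "Tuesday", "Wednesday", "Thursday"),
  minutesworn = c(1094, 1033, 998, 1040, 761, 1120, 1063, 1076)
)

offset <- 10  # assumed "chosen number" added above each bar
days <- unique(data_df$weekday)

# For each day: average that day's minutesworn, then add the offset.
# The result is a named numeric vector, one label height per day.
label_y <- numeric(0)
for (day in days) {
  day_mean <- mean(data_df$minutesworn[data_df$weekday == day])
  label_y[day] <- day_mean + offset
}
label_y
```

As the comments point out, `group_by(weekday)` produces the same per-day numbers without an explicit loop.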

Here is what I've brute-forced to make it work:

## average minutes worn per day

# find max of mean range

data_df %>% 
  group_by(weekday) %>% 
  summarize(max(mean(minutesworn)))


# plot

data_df %>% 
  group_by(weekday) %>% 
  summarize(mean_wear = mean(minutesworn)) %>% 
  ggplot(mapping = aes(x = factor(weekday, levels =
                                    c('Sunday', 'Monday', 'Tuesday',
                                      'Wednesday', 'Thursday', 'Friday',
                                      'Saturday')), y = mean_wear, fill = weekday)) +
  geom_col() +
  labs(title = "Minutes Worn by Weekday",
       caption = "Data Collected in 2016") +
  xlab("Weekday") + ylab("Average Minutes Worn") +
  annotate("text", x = "Friday", y = 1052, label = "Friday") +
  annotate("text", x = "Saturday", y = 1022, label = "Saturday") +
  annotate("text", x = "Sunday", y = 977, label = "Sunday") +
  annotate("text", x = "Monday", y = 1040, label = "Monday") +
  annotate("text", x = "Tuesday", y = 1057, label = "Tuesday") +
  annotate("text", x = "Wednesday", y = 1010, label = "Wednesday") +
  annotate("text", x = "Thursday", y = 1008, label = "Thursday")
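One way to drop the seven hard-coded `annotate()` calls while keeping this structure is to compute the label heights first and pass them as vectors, since `annotate()` accepts vectors of equal length. A minimal sketch, assuming a cut-down `data_df` built from the sample rows below and an assumed `offset <- 10`; `label_pos` and `offset` are illustrative names, not part of the original code.

```r
library(dplyr)
library(ggplot2)

# Cut-down stand-in for data_df (assumption): weekday/minutesworn values
# copied from the sample rows in the question.
data_df <- tibble::tibble(
  weekday = c("Tuesday", "Wednesday", "Friday", "Saturday",
              "Sunday", "Tuesday", "Wednesday", "Thursday"),
  minutesworn = c(1094, 1033, 998, 1040, 761, 1120, 1063, 1076)
)

days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")
offset <- 10  # assumed "chosen number" added above each bar

# One row per weekday: the mean, plus the label height (mean + offset)
label_pos <- data_df %>%
  group_by(weekday) %>%
  summarize(mean_wear = mean(minutesworn)) %>%
  mutate(label_y = mean_wear + offset)

p <- ggplot(label_pos,
            aes(x = factor(weekday, levels = days), y = mean_wear, fill = weekday)) +
  geom_col() +
  # A single vectorised annotate() call replaces the seven hard-coded ones
  annotate("text", x = label_pos$weekday, y = label_pos$label_y,
           label = label_pos$weekday) +
  labs(title = "Minutes Worn by Weekday",
       caption = "Data Collected in 2016",
       x = "Weekday", y = "Average Minutes Worn")
p
```

Changing `offset` then moves every label at once instead of editing seven numbers by hand.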

Sample data:

data_df <- tibble::tribble(
  ~id,        ~activitydate,         ~totalsteps, ~totaldistance,   ~sedentaryminutes, ~calories, ~activeminutes, ~totalminutesasleep, ~totaltimeinbed, ~timeawakeinbed, ~month,  ~weekday,    ~minutesworn, ~alldaywear,
  1503960366, as.Date("2016-04-12"), 13162,       8.5,              728,               1985,      366,            327,                 346,             19,              "April", "Tuesday",   1094,         TRUE,
  1503960366, as.Date("2016-04-13"), 10735,       6.96999979019165, 776,               1797,      257,            384,                 407,             23,              "April", "Wednesday", 1033,         TRUE,
  1503960366, as.Date("2016-04-15"), 9762,        6.28000020980835, 726,               1745,      272,            412,                 442,             30,              "April", "Friday",    998,          TRUE,
  1503960366, as.Date("2016-04-16"), 12669,       8.15999984741211, 773,               1863,      267,            340,                 367,             27,              "April", "Saturday",  1040,         TRUE,
  1503960366, as.Date("2016-04-17"), 9705,        6.48000001907349, 539,               1728,      222,            700,                 712,             12,              "April", "Sunday",    761,          TRUE,
  1503960366, as.Date("2016-04-19"), 15506,       9.88000011444092, 775,               2035,      345,            304,                 320,             16,              "April", "Tuesday",   1120,         TRUE,
  1503960366, as.Date("2016-04-20"), 10544,       6.67999982833862, 818,               1786,      245,            360,                 377,             17,              "April", "Wednesday", 1063,         TRUE,
  1503960366, as.Date("2016-04-21"), 9819,        6.34000015258789, 838,               1775,      238,            325,                 364,             39,              "April", "Thursday",  1076,         TRUE,
)
moodymudskipper
  • In R, for loops are specified as `for(variable in vector)`, which usually looks something like `for(i in 1:10)`; that iterates 10 times, with `i` taking the values 1 to 10 in turn. If you provide some reproducible data (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), we can help further. If it's small enough, just use `dput(data_df)` and copy and paste the result into your question – dandrews May 01 '23 at 23:49
  • It doesn't seem like you'd need a loop at all: `group_by` would let you do this for each weekday without a loop. As dandrews says, it's hard to help without a little bit of sample data. `dput()` is the friendliest way to share sample data, maybe `dput(data_df[1:20, ])` for the first 20 rows. – Gregor Thomas May 02 '23 at 00:07
  • `geom_text(aes(label = weekday), vjust = -0.5)`? – Jon Spring May 02 '23 at 00:33
  • I'd add that your "max mean" doesn't seem related to the question, but also doesn't make sense. The mean is a single number. `max(mean(x))` is the same as `mean(x)`. – Gregor Thomas May 02 '23 at 04:17
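To make that last point concrete: `mean()` collapses a vector to one number, so wrapping it in `max()` changes nothing. If the intent was the height of the tallest bar, the max has to be taken over the per-day means. A small sketch, again assuming a stand-in `data_df` with the sample `weekday`/`minutesworn` values; `day_means` and `tallest` are illustrative names.

```r
library(dplyr)

# Stand-in for data_df (assumption): sample weekday/minutesworn values only
data_df <- tibble::tibble(
  weekday = c("Tuesday", "Wednesday", "Friday", "Saturday",
              "Sunday", "Tuesday", "Wednesday", "Thursday"),
  minutesworn = c(1094, 1033, 998, 1040, 761, 1120, 1063, 1076)
)

x <- data_df$minutesworn
# mean() returns a single value, so max() of it is a no-op
max(mean(x)) == mean(x)  # TRUE

# Per-day means via group_by(), no loop needed; then the largest of them
day_means <- data_df %>%
  group_by(weekday) %>%
  summarize(mean_wear = mean(minutesworn))
tallest <- max(day_means$mean_wear)  # 1107 (Tuesday) with these sample rows
```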

1 Answer

Here's how I'd clean up your code:

library(dplyr)
library(ggplot2)

days = c('Sunday',
         'Monday',
         'Tuesday',
         'Wednesday',
         'Thursday',
         'Friday',
         'Saturday')

plot_data = data_df %>%
  group_by(weekday) %>%
  summarize(mean_wear = mean(minutesworn)) %>%
  mutate(weekday = factor(weekday, levels = days)) 
  
ggplot(
  plot_data,
  mapping = aes(
    x = weekday,
    y = mean_wear, 
    fill = weekday
  )) +
  geom_col() +
  geom_text(aes(label = weekday), nudge_y = 10, vjust = 0) +
  labs(
    title = "Minutes Worn by Weekday",
    caption = "Data Collected in 2016",
    x = "Weekday",
    y = "Average Minutes Worn"
  ) 
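If you'd rather keep the asker's "mean plus a chosen number" idea explicit, the same label placement as `nudge_y = 10` can be had by mapping the label's `y` to `mean_wear + offset` inside `geom_text()`. A sketch under the same assumptions as above (stand-in `data_df` from the sample rows, assumed `offset <- 10`); `layer_data()` is used only to confirm where the labels end up.

```r
library(dplyr)
library(ggplot2)

# Stand-in for data_df (assumption): sample weekday/minutesworn values only
data_df <- tibble::tibble(
  weekday = c("Tuesday", "Wednesday", "Friday", "Saturday",
              "Sunday", "Tuesday", "Wednesday", "Thursday"),
  minutesworn = c(1094, 1033, 998, 1040, 761, 1120, 1063, 1076)
)

days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")
offset <- 10  # assumed "chosen number"

plot_data <- data_df %>%
  group_by(weekday) %>%
  summarize(mean_wear = mean(minutesworn)) %>%
  mutate(weekday = factor(weekday, levels = days))

p <- ggplot(plot_data, aes(x = weekday, y = mean_wear, fill = weekday)) +
  geom_col() +
  # Each label sits `offset` units above its bar, like nudge_y = offset
  geom_text(aes(y = mean_wear + offset, label = weekday), vjust = 0)

# y positions of the text layer (layer 2), for checking the placement
text_y <- layer_data(p, 2)$y
```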


Gregor Thomas
  • Awesome! I can definitely see where it's been simplified, and you're certainly right about dropping the max portion. No idea what I was thinking, but I'm learning! – Cyris Zeiders May 02 '23 at 13:18