
I have a database covering a whole year:

Month Day Time  X   Y
...
3      1    0   2   4
3      1    1   4   2
3      1    2   7   3
3      1    3   8   8
3      1    4   4   6
3      1    5   1   4
3      1    6   6   6
3      1    7   7   9
...
3      2    0   5   7
3      2    1   7   2
3      2    2   9   3
...
4      1    0   2   8
...

I want to find the maximum value of X for each day and create a plot for each day, starting from the beginning of the day (Time 0) and running up to the point where that maximum occurs. I tried to use a data frame, but I got a bit lost, and the database is quite big, so I'm not sure this is the best idea.

Any ideas how to do it?

Shag
  • `aggregate(X ~ Month + Day, data = df, max)` will give you the maximum for each day in each month. – LAP Jun 27 '18 at 08:07
  • Welcome to SO! Please read [ask] and make your example reproducible, see [mcve]! – jogo Jun 27 '18 at 09:09
  • https://stackoverflow.com/questions/14868661/how-to-get-the-maximum-value-by-group – jogo Jun 27 '18 at 09:11
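
Building on the `aggregate()` suggestion above, here is a minimal base R sketch that uses only the rows shown in the question. The name `truncated` and the `split()`/`lapply()` approach are illustrative assumptions, and "up to the maximum" is read here as "up to the first Time at which X reaches its daily maximum".

```r
# Rows copied from the excerpt in the question (Month 3, Days 1 and 2 only)
df <- data.frame(
  Month = rep(3, 11),
  Day   = c(rep(1, 8), rep(2, 3)),
  Time  = c(0:7, 0:2),
  X     = c(2, 4, 7, 8, 4, 1, 6, 7, 5, 7, 9),
  Y     = c(4, 2, 3, 8, 6, 4, 6, 9, 7, 2, 3)
)

# Maximum of X for each Month/Day combination
daily_max <- aggregate(X ~ Month + Day, data = df, FUN = max)

# For each day, keep the rows from Time 0 up to the first occurrence of that maximum
truncated <- do.call(rbind, lapply(
  split(df, list(df$Month, df$Day), drop = TRUE),
  function(d) {
    d <- d[order(d$Time), ]
    d[d$Time <= d$Time[which.max(d$X)], ]
  }
))
rownames(truncated) <- NULL
```

With these rows, day 1 is cut off after Time 3 and day 2 after Time 2.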

2 Answers


If I understood you correctly, this should work:

Sample dataset:

set.seed(123)
df <- data.frame(Month = sample(c(1:12), 30, replace = TRUE), 
                 Day = sample(c(1:31), 30, replace = TRUE), 
                 Time = sample(c(1:24), 30, replace = TRUE),
                 x = rnorm(30, mean = 10, sd = 5),
                 y = rnorm(30, mean = 10, sd = 5))

Using tidyverse (dplyr, tidyr and ggplot2):

library(tidyverse)
df %>% 
  # Group by month and day
  group_by(Month, Day) %>% 
  # Per-day maxima of x and y; values above the corresponding maximum are set to NA
  mutate(maxX = max(x, na.rm = TRUE), 
         maxY = max(y, na.rm = TRUE), 
         plotX = ifelse(x > maxX, NA, x), 
         plotY = ifelse(y > maxY, NA, y)) %>% 
  ungroup() %>%
  # Select only the variables needed for the plot and reshape them to long format
  select(Time, plotX, plotY) %>% 
  gather(plot, value, -Time) %>%
  # Plot
  ggplot(aes(Time, value, color = plot)) + 
  geom_point()


DJV
  • It looks like my database is too big for the `mutate` function. I tried changing NA to `NA_integer_` and other options, but then it gives neither a result nor an error. My idea was to get a plot for every day separately and one plot for all the filtered days together. `Error in mutate_impl(.data, dots) : Column 'plotX' must be length 288 (the group size) or one, not 105423` – Shag Jun 28 '18 at 05:14
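
The error quoted in the comment above reports a column of length 105423 inside a group of size 288. The code that produced it is not shown, so the following is only an assumption about one common cause: referring to the whole data frame (for example `df$x`) inside a grouped `mutate()` or `filter()`, instead of using the bare column name. The data frame `toy` and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: two groups of three rows each
toy <- data.frame(g = rep(c("a", "b"), each = 3),
                  x = c(1, 5, 2, 4, 3, 6))

# A bare column name inside a grouped mutate() refers to the current group only
ok <- toy %>%
  group_by(g) %>%
  mutate(is_max = x == max(x)) %>%
  ungroup()

# toy$x always has all 6 rows, which does not fit into a 3-row group,
# so dplyr stops with an error about the result length
msg <- tryCatch(
  {
    toy %>% group_by(g) %>% mutate(is_max = toy$x == max(toy$x))
    "no error"
  },
  error = function(e) "error raised"
)
```

If this is what happened, replacing `df$x` with `x` inside the grouped verbs should be enough; the size of the data would not be the issue.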

You can try a tidyverse approach. Duplicated Time values within each Month and Day are removed, keeping the first occurrence, without any ranking.

library(tidyverse)
set.seed(123)
df <- data.frame(Month = sample(c(1:2), 30, replace = TRUE), 
                 Day = sample(c(1:2), 30, replace = TRUE), 
                 Time = sample(c(1:10), 30, replace = TRUE),
                 x = rnorm(30, mean = 10, sd = 5),
                 y = rnorm(30, mean = 10, sd = 5))

df %>%
  group_by(Month, Day) %>%
  filter(!duplicated(Time)) %>%  # remove duplicated Time values, keeping the first
  filter(Time <= Time[which.max(x)]) %>%  # keep rows up to the time of the daily maximum of x
  ggplot(aes(Time, x)) + 
   geom_line() + 
   geom_point(data = . %>% filter(x == max(x))) +  # mark the maximum of each day
   facet_grid(Month ~ Day, labeller = label_both)


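Or put everything in one plot, using a different color for each day:

df %>%
  group_by(Month, Day) %>%
  filter(!duplicated(Time)) %>%  # remove duplicated Time values, keeping the first
  filter(Time <= Time[which.max(x)]) %>%  # keep rows up to the time of the daily maximum of x
  ggplot(aes(Time, x, color = interaction(Month, Day))) + 
   geom_line() + 
   geom_point(data = . %>% filter(x == max(x)))  # mark the maximum of each day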


Roman
  • If I'm reading it correctly, your plot draws X against Time instead of Y against X, but that is not the main problem. It looks like the `filter` function has some limits and my database is too big for it? Also, my idea was to generate a plot separately for each day and one overall plot for all days with the filtered data. `Error in filter_impl(.data, quo) : Result must have length 288, not 105423` – Shag Jun 28 '18 at 05:17
  • What is the output of `dim(df)`? Millions of rows would be big... The error you see says that your filter rules are wrong: the output, a vector of `TRUE`/`FALSE` values, does not have the same length as your data. – Roman Jun 28 '18 at 12:44
  • Each day has ~250 "Times"; every day starts from 0 and goes up to ~250, and contains only these numbers. The whole database is ~100,000 records. – Shag Jun 29 '18 at 03:32
  • @Shag This is not big. You can skip the line `filter(!duplicated(Time))`. – Roman Jun 29 '18 at 07:31
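
For the two requests raised in the comments (Y plotted against X, and one plot per day in addition to an overall plot), a minimal sketch could look like the following. It assumes columns named `Month`, `Day`, `Time`, `X` and `Y` as in the question; the simulated data and the names `trimmed`, `by_day`, `plots` and `overall` are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the real data: 2 months x 2 days x 10 time steps
set.seed(123)
df <- expand.grid(Time = 0:9, Day = 1:2, Month = 1:2)
df$X <- rnorm(nrow(df), mean = 10, sd = 5)
df$Y <- rnorm(nrow(df), mean = 10, sd = 5)

# Within each day, keep the rows from the first Time up to the maximum of X
trimmed <- df %>%
  group_by(Month, Day) %>%
  arrange(Time, .by_group = TRUE) %>%
  filter(row_number() <= which.max(X)) %>%
  ungroup()

# One ggplot object per day, with Y plotted against X in Time order
by_day <- split(trimmed, list(trimmed$Month, trimmed$Day), drop = TRUE)
plots <- lapply(by_day, function(d) {
  ggplot(d, aes(X, Y)) +
    geom_path() +
    geom_point() +
    ggtitle(paste("Month", d$Month[1], "Day", d$Day[1]))
})

# One overall plot with all days, colored by day
overall <- ggplot(trimmed, aes(X, Y, color = interaction(Month, Day))) +
  geom_path() +
  geom_point()
```

Each element of `plots` can then be shown with `print(plots[[1]])` or written to a file with `ggsave()`.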