I have calculated the mean of every 15 rows of my data (a .txt file) using the R code given below.

library(dplyr)
library(tidyr)

x <- df %>% separate(`Date/Time`, into = c("Date", "Time"), sep = "T")
mn <- x %>%
  group_by(group = as.integer(gl(n(), 15, n()))) %>%
  summarise_all(funs(mean))
write.csv(mn, 'C:/Users/Alexia/Desktop/Test/15row.csv')

The code runs, but in the output the Date and Time columns contain NA. The desired output is as follows (Date and Time should be combined into one column named Time_sp, with the time written as hh:mm:ss+00):

Time_sp   Col1   Col2   Col3....
2021-01-01 00:00:00+00  12  36  56
2021-01-01 00:15:00+00  34  54  43
2021-01-01 00:30:00+00  24  23  21
2021-01-01 00:45:00+00  12  36  56
2021-01-01 01:00:00+00  34  54  43
2021-01-01 01:15:00+00  24  23  21
2021-01-01 01:30:00+00  12  36  43
2021-01-01 01:45:00+00  12  36  34
2021-01-01 02:00:00+00  12  36  34
.
.
.

My input data (.txt) is at one-minute resolution and has Date and Time in the following format:

Date/Time   Col1   Col2   Col3....
2021-01-01T00:00:00  20  12  34...
2021-01-01T00:01:00  .....
2021-01-01T00:02:00  .....
2021-01-01T00:03:00  .....
2021-01-01T00:04:00  .....
2021-01-01T00:05:00  .....
2021-01-01T00:06:00  .....
2021-01-01T00:07:00  .....
2021-01-01T00:08:00  .....

The output of dput(df) is as follows:

structure(list(`Date/Time` = c("2021-03-01T00:01:00", "2021-03-01T00:02:00", 
"2021-03-01T00:03:00", "2021-03-01T00:04:00", "2021-03-01T00:05:00", 
"2021-03-01T00:06:00", "2021-03-01T00:07:00", "2021-03-01T00:08:00", 
"2021-03-01T00:09:00", "2021-03-01T00:10:00", "2021-03-01T00:11:00", 
"2021-03-01T00:12:00", "2021-03-01T00:13:00", "2021-03-01T00:14:00", 
"2021-03-01T00:15:00", "2021-03-01T00:16:00", "2021-03-01T00:17:00", 
"2021-03-01T00:18:00", "2021-03-01T00:19:00", "2021-03-01T00:20:00", 
"2021-03-01T00:21:00", "2021-03-01T00:22:00", "2021-03-01T00:23:00", 
"2021-03-01T00:24:00", "2021-03-01T00:25:00", "2021-03-01T00:26:00", 
"2021-03-01T00:27:00", "2021-03-01T00:28:00", "2021-03-01T00:29:00", 
"2021-03-01T00:30:00"), `XY [XY]` = c(0.990641, 0.990641, 
0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 
0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 
0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 
0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 0.990641, 0.990641), 
`C1 [CC]` = c(257L, 257L, 257L, 257L, 257L, 257L, 257L, 
257L, 257L, 257L, 257L, 257L, 257L, 257L, 257L, 257L, 257L, 
257L, 257L, 257L, 257L, 257L, 257L, 257L, 257L, 257L, 257L, 
257L, 257L, 257L), Cc = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), `C2 [C2]` = c(285L, 
285L, 285L, 285L, 285L, 285L, 285L, 285L, 285L, 285L, 285L, 
285L, 285L, 285L, 285L, 285L, 285L, 285L, 285L, 285L, 285L, 
285L, 285L, 285L, 285L, 285L, 285L, 285L, 285L, 285L), Dc = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
`C3 [C2]` = c(255L, 255L, 255L, 255L, 255L, 255L, 255L, 
255L, 255L, 255L, 255L, 255L, 255L, 255L, 255L, 255L, 255L, 
255L, 255L, 255L, 255L, 255L, 255L, 255L, 255L, 255L, 255L, 
255L, 255L, 255L), Ac = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), C4 = c(0.463735, 0.463735, 
0.463735, 0.463735, 0.463735, 0.463735, 0.463735, 0.463735, 
0.463735, 0.463735, 0.463735, 0.463735, 0.463735, 0.463735, 
0.463735, 0.463735, 0.463735, 0.463735, 0.463735, 0.463735, 
0.463735, 0.463735, 0.463735, 0.463735, 0.463735, 0.463735, 
0.463735, 0.463735, 0.463735, 0.463735), `C5 [h]` = c(1013L, 
1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 
1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 
1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 1013L, 
1013L, 1013L), `C6 [%]` = c(43L, 43L, 43L, 43L, 43L, 43L, 
43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 
43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L
), `C7 [E2]` = c(390L, 390L, 390L, 390L, 390L, 390L, 
390L, 390L, 390L, 390L, 390L, 390L, 390L, 390L, 390L, 390L, 
390L, 390L, 390L, 390L, 390L, 390L, 390L, 390L, 390L, 390L, 
390L, 390L, 390L, 390L), Jc = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), `D [S]` = c(62.3716, 
62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 
62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 
62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 
62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 62.3716, 
62.3716), `Sw [S2]` = c(1392.95, 1392.95, 1392.95, 1392.95, 
1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 
1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 
1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 1392.95, 
1392.95, 1392.95, 1392.95, 1392.95, 1392.95), `SW [Q2]` = 
c(389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 
389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 
389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 
389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 389.164, 
389.164), `OA [H2]` = c(646.61, 646.61, 646.61, 646.61, 
646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 
646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 
646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 646.61, 
646.61, 646.61), `T2 [C]` = c(3.7, 3.7, 3.7, 3.7, 3.7, 
3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 
3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 
3.7), Lc = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-30L))
  • Could you please edit your question to make it reproducible? [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for some tips on how to do this – jpsmith May 10 '23 at 13:27
  • I suspect that you have `NA`s in your data, so the `mean` function is returning `NA`. You should use the parameter `na.rm = TRUE` within the `mean` function, and it will ignore all `NA`s – mfg3z0 May 10 '23 at 13:54
  • Also, `summarise_all` has been superseded. I would replace the last line with `summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))` – mfg3z0 May 10 '23 at 13:59
  • @mfg3z0 I made sample data with no NA values and tried with summarize_all; it still gives me NA values in the Date and Time columns. (To clarify: I am getting the average values for the data columns successfully; my only concern is getting the desired result, described in the question, for the Date and Time columns.) – Alexia k Boston May 10 '23 at 14:14
  • @jpsmith I have added the output of dput(min) in the question – Alexia k Boston May 10 '23 at 14:20
  • @AlexiakBoston we need the `dput` of the input data, this appears to be the final data. – jpsmith May 10 '23 at 14:23
  • The issue may be that you haven't told `R` to interpret the `Date` and `Time` columns as dates and times. Try first doing `mutate(Date/Time = lubridate::ymd_hms(Date/Time))` – mfg3z0 May 10 '23 at 14:32
  • @mfg3z0 I have added dput of input data – Alexia k Boston May 10 '23 at 14:38
  • @mfg3z0 Thanks, but it is giving me an error: `Error: unexpected '=' in "mutate(Date/Time ="`. How do I resolve it? – Alexia k Boston May 10 '23 at 14:46
  • You need backticks around `Date/Time` because of your awkward naming convention. – Limey May 10 '23 at 14:49
  • Thanks @Limey. Yes, it should be ``mutate(`Date/Time` = lubridate::ymd_hms(`Date/Time`))``. If you want to change the data after you've separated the columns, try `mutate(Date = lubridate::ymd(Date), Time = lubridate::hms(Time))` (see the sketch after these comments) – mfg3z0 May 10 '23 at 16:48
  • @mfg3z0 I am getting an error with backticks also: `Error in lapply(list(...), .num_to_date) : object 'Date/Time' not found`. After separating, on using the suggested code, I am getting: `Error in as.character(x) : cannot coerce type 'closure' to vector of type 'character'` – Alexia k Boston May 10 '23 at 16:51
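
A short sketch of what the comments converge on, assuming `df` is the data frame from the `dput` above (the backticks are needed because the column name contains a `/`):

library(dplyr)
library(lubridate)

# ymd_hms() understands the ISO 8601 "T" separator, so the column can be
# parsed in place before (or instead of) separating it into Date and Time
df <- df %>% mutate(`Date/Time` = ymd_hms(`Date/Time`))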

1 Answer

The issue is with your data types: you need to tell R that these are dates and times, otherwise it treats them as character vectors, and taking the mean of a character vector produces NA.
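
For example, with two of the Time strings produced by `separate()`:

# mean() on a character vector returns NA, with a warning that the
# argument is not numeric or logical
mean(c("00:01:00", "00:02:00"))
#> [1] NA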

Try:

library(dplyr)
library(tidyr)
library(lubridate)

x <- df %>% separate(`Date/Time`, into = c("Date", "Time"), sep = "T")

min <- x %>% 
  as_tibble() %>%
  group_by(group = as.integer(gl(n(), 15, n()))) %>%
  mutate(
    # Convert Date column into the Date datatype
    Date = lubridate::ymd(Date), 
    # Convert Time column into the Period datatype (HMS). Then, 
    # change this to number of seconds
    Time = period_to_seconds(hms(Time))
  ) %>%
  summarise(across(everything(), mean)) %>% 
  # Convert Time column from number of seconds 
  # back into the Period datatype (HMS). Omit this line
  # if you'd prefer to have the average in seconds
  mutate(Time = seconds_to_period(Time))

min
#> # A tibble: 2 × 21
#>   group Date       Time     `XY [XY]` `C1 [CC]`    Cc `C2 [C2]`
#>   <int> <date>     <Period>     <dbl>     <dbl> <dbl>     <dbl>
#> 1     1 2021-03-01 8M 0S        0.991       257     0       285
#> 2     2 2021-03-01 23M 0S       0.991       257     0       285
#> # ℹ 14 more variables: Dc <dbl>, `C3 [C2]` <dbl>, Ac <dbl>,
#> #   C4 <dbl>, `C5 [h]` <dbl>, `C6 [%]` <dbl>, `C7 [E2]` <dbl>,
#> #   Jc <dbl>, `D [S]` <dbl>, `Sw [S2]` <dbl>, `SW [Q2]` <dbl>,
#> #   `OA [H2]` <dbl>, `T2 [C]` <dbl>, Lc <dbl>

write.csv(min, 'C:/Users/Alexia/Desktop/Test/15row.csv') 
– mfg3z0
  • Thanks, it worked. It is giving me the Dates, but the Time column is still not as expected. Is there any way to get the Time column in the desired format? E.g., the mean from the 0th to the 14th minute should show 00:01:00-00:14:00 (first row-last row of the 15 rows) in the Time column, and so on. – Alexia k Boston May 10 '23 at 17:12
  • @AlexiakBoston It should be fixed now! Hopefully this explains what was going on with the time thing and why it was not averaging correctly – mfg3z0 May 10 '23 at 20:26
  • Thanks for your effort, but it is giving me a warning after `summarise`: `Warning message: There was 1 warning in mutate(). ℹ In argument: Time = period_to_seconds(hms(Time)). Caused by warning in .parse_hms(): ! Some strings failed to parse, or all strings are NAs`, and after `mutate` it gives an error: `Error in seconds_to_period(Time) : object 'Time' not found`. Moreover, I need to use the `group_by` function before `summarise`: `group_by(group = as.integer(gl(n(), 15, n()))) %>%`. Could you please tell me what is wrong with the Time column and how to resolve the error? – Alexia k Boston May 10 '23 at 21:09
  • @AlexiakBoston The error you are describing is not reproducible with the data you have given - the above code works with the data you provided – mfg3z0 May 10 '23 at 21:59
  • It worked successfully without errors. My only question: though your Time column has a difference of 15 min, why does it start from 8M, 23M... in your output, whereas the input data starts from 00 minutes? It is the same in my output. Ideally, it should be 15M, 30M, 45M... and so on. – Alexia k Boston May 11 '23 at 11:57
  • The code takes the mean of all columns, including `Time`. It sounds like you want the `max` from the `Time` column and the mean for every other column. If so, change the `summarise` line to say: `summarise(Time = max(Time), across(everything(), mean))` – mfg3z0 May 12 '23 at 19:41
  • No, it is not giving the Time output as expected. However, I have edited the desired output; could you please help me in achieving that? – Alexia k Boston May 13 '23 at 19:20
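
Since the edited question asks for a single `Time_sp` column labelled with the start of each 15-minute interval, written as `hh:mm:ss+00`, one possible approach is to group on the timestamp itself rather than on row counts. This is only a sketch against the `dput` data above; the 15-minute flooring and the `Time_sp` name are taken from the desired output in the question, not confirmed as the exact rule wanted:

library(dplyr)
library(lubridate)

out <- df %>%
  # Parse the ISO timestamp and floor it to the start of its 15-minute block,
  # so the labels fall on 00, 15, 30 and 45 minutes past the hour
  mutate(Time_sp = floor_date(ymd_hms(`Date/Time`), "15 minutes")) %>%
  group_by(Time_sp) %>%
  # Average every numeric column within each 15-minute block
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  # Render the label as "YYYY-MM-DD hh:mm:ss+00" (ymd_hms() parses to UTC)
  mutate(Time_sp = paste0(format(Time_sp, "%Y-%m-%d %H:%M:%S"), "+00"))

write.csv(out, 'C:/Users/Alexia/Desktop/Test/15row.csv', row.names = FALSE)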