
I have a couple of approaches in mind, but I am not sure which is best or how to code it fully.

I have some ocean data from different locations. Each sampling is described as an event, which is how I tell the samplings apart, so I would like to group_by() event. However, my sampling is too fine, and I would like the average value (in this case turbidity) for every 0.5 m of depth. So perhaps round depth to the nearest 0.5 and then average the rest of the variables ("time", "press"), with "station" and "event" remaining as ID factors.

So I am thinking something like:

 df2 <- df %>% group_by(event)%>%
 mutate(vars(depth),funs(round(.,5))%>%
 mutate_if(is.numeric, mean)

^ But that is not correct.

Another option is to reduce the data to one row per second by averaging all the numeric values, including depth, within each second, but again I am not sure how best to do that.

Expected output:

[image of the expected output table, not available]

Here is some dummy data:

df <- structure(list(datetime = structure(c(1556215607, 1556215607, 
1556215607, 1556215607, 1556215607, 1556215607, 1556215607, 1556215608, 
1556215608, 1556215608, 1556215608, 1556215608, 1556215608, 1556215609, 
1556215609, 1556215609, 1556215609, 1556215609, 1556215609, 1556215610, 
1556215610, 1556215610, 1556215610, 1556215610, 1556215610, 1556215611, 
1556215611, 1556215611, 1556215611, 1556215611, 1556215611, 1556215612, 
1556215612, 1556215612, 1556215612, 1556215612, 1556215612, 1556215613, 
1556215613, 1556215613, 1556215613, 1556215613, 1556215613, 1556215614, 
1556215614, 1556215614, 1556215614, 1556215614, 1556215614, 1556215615, 
1556216764, 1556216765, 1556216765, 1556216765, 1556216765, 1556216765, 
1556216765, 1556216766, 1556216766, 1556216766, 1556216766, 1556216766, 
1556216766, 1556216767, 1556216767, 1556216767, 1556216767, 1556216767, 
1556216767, 1556216768, 1556216768, 1556216768, 1556216768, 1556216768, 
1556216768, 1556216769, 1556216769, 1556216769, 1556216769, 1556216769, 
1556216769, 1556216770, 1556216770, 1556216770, 1556216770, 1556216770, 
1556216770, 1556216771, 1556216771, 1556216771, 1556216771, 1556216771, 
1556216771, 1556216772, 1556216772, 1556216772, 1556216772, 1556216772, 
1556216772, 1556216772, 1556216773), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), depth = c(0.48, 2.34, 2.36, 2.35, 2.35, 2.35, 
2.37, 2.35, 2.34, 2.35, 2.34, 2.34, 2.35, 2.35, 2.35, 2.35, 2.35, 
2.35, 2.34, 2.34, 2.32, 2.32, 2.3, 2.3, 2.31, 2.32, 2.32, 2.32, 
2.35, 2.34, 2.34, 2.35, 2.33, 2.34, 2.33, 2.32, 2.31, 2.31, 2.31, 
2.33, 2.34, 2.35, 2.35, 2.36, 2.36, 2.36, 2.36, 2.36, 2.35, 2.35, 
1.76, 1.76, 1.76, 1.76, 1.77, 1.76, 1.76, 1.77, 1.76, 1.76, 1.77, 
1.79, 1.78, 1.78, 1.8, 1.78, 1.76, 1.77, 1.76, 1.78, 1.83, 1.97, 
2.11, 2.31, 2.48, 2.62, 2.77, 2.92, 3.06, 3.19, 3.35, 3.49, 3.66, 
3.8, 3.94, 4.09, 4.24, 4.38, 4.54, 4.68, 4.82, 4.95, 5.1, 5.23, 
5.38, 5.5, 5.65, 5.79, 5.95, 6.08, 6.27), press = c(0.48, 2.36, 
2.38, 2.37, 2.37, 2.37, 2.39, 2.37, 2.36, 2.37, 2.36, 2.36, 2.37, 
2.37, 2.37, 2.37, 2.37, 2.37, 2.36, 2.36, 2.34, 2.34, 2.32, 2.32, 
2.33, 2.34, 2.34, 2.34, 2.37, 2.36, 2.36, 2.37, 2.35, 2.36, 2.35, 
2.34, 2.33, 2.33, 2.33, 2.35, 2.36, 2.37, 2.37, 2.38, 2.38, 2.38, 
2.38, 2.38, 2.37, 2.37, 1.78, 1.78, 1.78, 1.78, 1.79, 1.78, 1.78, 
1.79, 1.78, 1.77, 1.79, 1.81, 1.8, 1.8, 1.82, 1.8, 1.78, 1.79, 
1.78, 1.8, 1.85, 1.99, 2.13, 2.33, 2.5, 2.64, 2.79, 2.94, 3.09, 
3.22, 3.38, 3.52, 3.69, 3.83, 3.97, 4.12, 4.28, 4.42, 4.58, 4.72, 
4.86, 4.99, 5.14, 5.27, 5.43, 5.55, 5.7, 5.84, 6, 6.13, 6.32), 
event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2), station = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("BI-1", "BI-2", 
"BI-3", "BI-4", "BI-5", "BI-6", "BI-8", "BI-9"), class = "factor")), class = "data.frame", 
row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 1600L, 1601L, 1602L, 
1603L, 1604L, 1605L, 1606L, 1607L, 1608L, 1609L, 1610L, 1611L, 
1612L, 1613L, 1614L, 1615L, 1616L, 1617L, 1618L, 1619L, 1620L, 
1621L, 1622L, 1623L, 1624L, 1625L, 1626L, 1627L, 1628L, 1629L, 
1630L, 1631L, 1632L, 1633L, 1634L, 1635L, 1636L, 1637L, 1638L, 
1639L, 1640L, 1641L, 1642L, 1643L, 1644L, 1645L, 1646L, 1647L, 
1648L, 1649L, 1650L))

Any help appreciated.

  • Can you show the expected output? – akrun Sep 17 '19 at 20:27
  • @akrun does that help? – Lmm Sep 17 '19 at 20:44
  • Is this a typo: `mutate(vars(depth),funs(round(.,5))`? I mean, should it be `mutate_at`? Also, `funs` is being deprecated, to be replaced by `list()`. If there is only a single column, use `mutate(depth = round(depth, 5))`. – akrun Sep 17 '19 at 20:47
  • (a) in your result you have one row per event, rounded depth, and station. So you need to `group_by` all 3 of those columns. Then you need to decide what you want to do with the remaining columns, `time` and `press` - you could average them, take the lowest, take the highest, median, etc. In the end, you'll get something like `df %>% mutate(depth = round(depth, 5)) %>% group_by(depth, event, station) %>% summarize(press = mean(press), time = first(time))`. Or maybe you want `time` in the grouping too - only you can say. – Gregor Thomas Sep 17 '19 at 20:51
  • @akrun In my actual data there will be more numeric variables to average, so I think mutate_at is more appropriate? – Lmm Sep 17 '19 at 20:53
  • @Gregor I will give that a go and see if I can work it out. It is usually my syntax that scuppers me, so seeing how others approach it helps. Thanks – Lmm Sep 17 '19 at 20:53
  • (b) the base `round` function won't easily let you round to the nearest 0.5. [This question has a few workarounds](https://stackoverflow.com/q/8664976/903061), the easiest being `round(depth / 0.5) * 0.5`. That said, you might want to use [cut](https://stackoverflow.com/a/5570360/903061) and make some nice labels. – Gregor Thomas Sep 17 '19 at 20:54
  • (c) If you have a lot of non-grouping numeric variables, then the last line might be `summarize_all(mean)` instead of `summarize(press = mean(press), time = first(time), ...)`. You only need an `_at` or an `_all` function when modifying multiple variables. When you round depth, you are only rounding depth, so a regular `mutate` makes sense. – Gregor Thomas Sep 17 '19 at 20:56
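
Pulling comments (a) to (c) together, a minimal sketch might look like the following. It assumes dplyr 1.0 or later (for `across()`) and uses a small made-up data frame, `toy`, with the same columns as the dummy data. The names `toy`, `depth_bin`, `binned`, and `per_second` are illustrative only. Keeping the first timestamp per bin, rather than averaging it, is an arbitrary choice here. The per-second variant assumes `datetime` is already truncated to whole seconds, as it appears to be in the dummy data.

```r
library(dplyr)

# Small made-up data frame with the same columns as the dummy data
toy <- data.frame(
  datetime = as.POSIXct("2019-04-25 18:06:47", tz = "UTC") + c(0, 0, 1, 1, 2, 2),
  depth    = c(0.48, 0.52, 1.10, 2.34, 2.36, 1.76),
  press    = c(0.48, 0.53, 1.12, 2.36, 2.38, 1.78),
  event    = c(1, 1, 1, 1, 1, 2),
  station  = factor(c("BI-1", "BI-1", "BI-1", "BI-1", "BI-1", "BI-2"))
)

# Option 1: average within 0.5 m depth bins
binned <- toy %>%
  # round(x / 0.5) * 0.5 snaps each depth to the nearest multiple of 0.5
  mutate(depth_bin = round(depth / 0.5) * 0.5) %>%
  group_by(station, event, depth_bin) %>%
  summarise(
    # average every remaining numeric column (here: depth and press)
    across(where(is.numeric), mean),
    # POSIXct is not numeric, so the timestamp is handled separately
    datetime = first(datetime),
    .groups = "drop"
  )

# Option 2: average all numeric columns, including depth, within each second
per_second <- toy %>%
  group_by(station, event, datetime) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop")

print(binned)
print(per_second)
```

With real data, any extra numeric columns such as turbidity would be picked up by `where(is.numeric)` without further changes. One thing to watch: R's `round()` sends exact halves to the even neighbour, so a depth of exactly 0.25 would land in the 0 bin rather than the 0.5 bin. If that matters, `cut()` with explicit breaks, as suggested in comment (b), gives more control over the bin edges.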
