R and dplyr, using group_by to run code per group not working

Question

First of all, I'm quite new to R, so I may be off the mark in my understanding of what is happening here, but I'm stuck on this piece of code and need it fixed quickly. Thank you in advance for your time and effort.

I'm trying to find a freezing point per route per year. Essentially, this happens when the CT value passes the threshold of 9. The catch is that, since I'm working with Arctic data, the CT value starts off above 9, so I have to find where it first crosses the threshold from below 9 to above 9. Maybe there are functions for this sort of local minimum, but I don't know what they are.

I tried writing one long pipe statement, but I had trouble referencing columns, so I tried calling group_by outside of the pipe statement. That didn't work either.

EDIT: Here is a sample. I would like to end up with 1 value (Day of Year) for East 1983 and East 1984. The correct returned values are 6 and 18 respectively.

Route Year  Day_Year    CT
East  1983  1           3
East  1983  2           2
East  1983  3           1
East  1983  4           0
East  1983  5           2
East  1983  6           9.5
East  1984  1           3   
East  1984  3           2
East  1984  9           1
East  1984  10          0
East  1984  14          2
East  1984  18          9.5
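As a hedged sketch, the sample table above can be rebuilt as a data frame so that candidate solutions can be run against it directly. The name `Sea_Ice` is taken from the question's code; the `SA` column used in the question's `filter()` is not part of the sample, so it is left out here.

```r
library(dplyr)

# Sample data from the question (only the four columns shown in the table).
# "East" is recycled to all 12 rows.
Sea_Ice <- data.frame(
  Route    = "East",
  Year     = rep(c(1983, 1984), each = 6),
  Day_Year = c(1, 2, 3, 4, 5, 6, 1, 3, 9, 10, 14, 18),
  CT       = c(3, 2, 1, 0, 2, 9.5, 3, 2, 1, 0, 2, 9.5)
)

# One group per Route/Year combination, as in the question's code.
data_g <- group_by(Sea_Ice, Route, Year)
```

With this data, `data_g` has two groups (East 1983 and East 1984), and the expected answers are `Day_Year` 6 and 18.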


library("dplyr")
data_g <- group_by(Sea_Ice, Route, Year)

#Above 9 Freeze-Up
Above_9_A <- 
  #group_by(Sea_Ice, Route, Year) %>%
  data_g %>%
  mutate(row.position = which.min(data_g$CT)) %>%
  filter(CT > 9, !SA %in% c("New Ice", "Nilas", "Grey Ice", "Open Water")) %>%
  slice(which.min(Day_Year)) %>%
  mutate(Conc_Threshold = "Above_9")

What I'm currently doing finds the minimum across ALL routes and ALL years, instead of one minimum per route per year.
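The symptom described above is consistent with the `data_g$CT` reference inside `mutate()`: `data_g$CT` pulls the entire `CT` column out of the global `data_g` object, so `which.min()` is computed once over every route and year and then recycled into each group, whereas a bare `CT` is evaluated per group. The toy data frame `toy` and its grouping column `g` below are hypothetical, chosen only to make the difference visible.

```r
library(dplyr)

# Hypothetical toy data (not from the question): two groups of two rows.
toy <- data.frame(g = c("a", "a", "b", "b"), CT = c(3, 5, 7, 1))
toy_g <- group_by(toy, g)

res <- toy_g %>%
  mutate(
    # toy_g$CT is the full 4-element column, so this ignores the grouping
    # and the single result is recycled into every group.
    whole_column = which.min(toy_g$CT),
    # A bare CT refers only to the current group's rows.
    per_group = which.min(CT)
  )

res
```

Here `whole_column` is 4 in every row (the position of the overall minimum), while `per_group` is 1 for group `a` and 2 for group `b`.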

I just have no idea where to go from here. Thank you for your help.

EDIT 2: I've removed the filters on the other columns for now, as they aren't part of my issue.

When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Mar 06 '19 at 02:51
Where do the columns CA, CB, and CC come from? And I'm guessing you want a column called "Day_Year" like in your code, not "Day of Year"? — camille, Mar 06 '19 at 03:19
Oops, I've corrected those for now. I've done piping to filters so that isn't the issue for now. I'm mostly concerned with my usage of group_by. — Emmelie Paquette, Mar 06 '19 at 03:24
Your sample doesn't include the case you describe, where a year starts above 9, dips below, and goes up again, which is what you want to find. — iod, Mar 06 '19 at 03:38
The code you've written at the bottom references columns that aren't present in your sample. Do you need every `day_year` and `year` when it passed `9`, or only the first instance where it passed `9`? If the data set isn't that large, just run `data_g %>% filter(CT>9)` (or assign it to a new object) and you can scan the results for the first instance each year. — dre, Mar 06 '19 at 03:52

iod · Accepted Answer · 2019-03-06T03:41:39.863


What you need to do is create a column that is TRUE when there has been a previous value below 9 AND the current value is at or above 9. This is how you can do it:

data_g %>%
  group_by(Route, Year) %>%
  # TRUE once any value below 9 has been seen AND the current value is at least 9
  mutate(freezepoint = (cumsum(CT < 9) > 0 & CT >= 9)) %>%
  filter(freezepoint)
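A minimal sketch of this first approach applied to the question's sample data, assuming that data is rebuilt as below. It adds an explicit `arrange(Day_Year, .by_group = TRUE)` so the row order does not have to be assumed; the intermediate column name `seen_below` is an illustrative addition, not part of the answer.

```r
library(dplyr)

# Sample data from the question (only the columns shown there).
Sea_Ice <- data.frame(
  Route    = "East",
  Year     = rep(c(1983, 1984), each = 6),
  Day_Year = c(1, 2, 3, 4, 5, 6, 1, 3, 9, 10, 14, 18),
  CT       = c(3, 2, 1, 0, 2, 9.5, 3, 2, 1, 0, 2, 9.5)
)

freeze <- Sea_Ice %>%
  group_by(Route, Year) %>%
  arrange(Day_Year, .by_group = TRUE) %>%       # sort by day within each group
  mutate(
    seen_below  = cumsum(CT < 9) > 0,           # TRUE from the first value below 9 onward
    freezepoint = seen_below & CT >= 9          # ...and the current value is at least 9
  ) %>%
  filter(freezepoint)

freeze$Day_Year   # 6 18
```

On the sample this returns 6 and 18, the values the question expects. Note that `filter(freezepoint)` keeps every later row where CT is still at least 9, so on longer series it can return several rows per group.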

Or, more directly:

# keep only the first row per group where the condition is TRUE
data_g %>% group_by(Route, Year) %>% slice(which.max(cumsum(CT < 9) > 0 & CT >= 9))

(Note: this assumes that the data frame is already sorted by day within each route and year.)
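A hedged sketch of the `slice(which.max(...))` variant with two additions that are not in the answer: an explicit sort by day, so the assumption in the note holds, and a guard for groups that never drop below 9 (`which.max()` on an all-FALSE vector returns 1, which would otherwise select the first row). The data frame `ice` and its years are hypothetical, built to match the case iod's comment describes: a year that starts above 9, dips below, and rises again.

```r
library(dplyr)

# Hypothetical data (NOT from the question).
ice <- data.frame(
  Route    = "East",
  Year     = rep(c(1990, 1991), each = 6),
  Day_Year = rep(1:6, times = 2),
  CT       = c(9.8, 9.6, 4, 2, 9.5, 9.7,      # 1990: dips below 9, crosses back on day 5
               9.9, 9.8, 9.7, 9.9, 9.6, 9.5)  # 1991: never drops below 9
)

first_freeze <- ice %>%
  group_by(Route, Year) %>%
  arrange(Day_Year, .by_group = TRUE) %>%
  mutate(crossed = cumsum(CT < 9) > 0 & CT >= 9) %>%
  # which.max() on an all-FALSE vector returns 1, so drop groups with no crossing
  filter(any(crossed)) %>%
  slice(which.max(crossed)) %>%   # first TRUE row in each remaining group
  ungroup()

first_freeze$Day_Year
```

Here only 1990 survives, with `Day_Year` 5; the leading days above 9 are skipped because no value below 9 has been seen yet, and 1991 is dropped instead of reporting day 1.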

edited Mar 06 '19 at 03:41

answered Mar 06 '19 at 03:32

iod


Thank you, this helped a lot! I am new to R, so that was a new function for me. I modified it slightly for my case, but it worked! Thanks again! – Emmelie Paquette Mar 06 '19 at 11:22

Glad I could help. Don't forget to accept and upvote. – iod Mar 06 '19 at 12:38
