I have a df that of 32 columns and just under a million rows. The columns are the POINTID (individual id), First (year that an event first happened), and then 30 columns of years w binary occurrence data. I would like the first occurrence in each row (currently stored as a 1, same as all other occurrences) to be changed to a 2, so that I can differentiate between the first event and repeat events. I've tried doing this with the tidyverse, but even then it is taking forever. I can't tell if my code is just wrong or if it's not computationally efficient enough. I tested it on a smaller dataset and it seemed to work, in the long format but not the wide, so I'm thinking it's an efficiency issue because the pivot_longer table generated is about about 35 million rows long.
Can anyone help me understand why this isn't working or how to do it in a way that computes faster?
classifications %>%
pivot_longer(-c(1,32),names_to="Years", values_to="Present")%>%
group_by(POINTID)%>%
mutate(Present=replace(Present, Years==first, 2))
A reduced version of my DF is below:
> dput(classifications)
structure(list(POINTID = 2:11, first = structure(c(33L, 33L,
33L, 33L, 1L, 33L, 33L, 1L, 1L, 36L), .Label = c("X1985", "X1986",
"X1987", "X1988", "X1989", "X1990", "X1991", "X1992", "X1993",
"X1994", "X1995", "X1996", "X1997", "X1998", "X1999", "X2000",
"X2001", "X2002", "X2003", "X2004", "X2005", "X2006", "X2007",
"X2008", "X2009", "X2010", "X2011", "X2012", "X2013", "X2014",
"X2015", "X2016", "X2017", "X2018", "X2019", "X2020"), class = "factor"),
X1990 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X1991 = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), X1992 = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0), X1993 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X1994 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X1995 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X1996 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X1997 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X1998 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X1999 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2000 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2001 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2002 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2003 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2004 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2005 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2006 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2007 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2008 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2009 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2010 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2011 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2012 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2013 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2014 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2015 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2016 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2017 = c(1, 1, 1, 1, 0, 1, 1, 0, 0, 0), X2018 = c(1,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2019 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2020 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)), row.names = c(NA,
10L), class = "data.frame")