In my dataset I have information of the ZIPCODE of 600K+ ID's. If ID's move to a different addressess, I want to determine at which zipcode they lived the longest and put a '1' for that specific year in that row (no need to combine rows as I want to know if they where they lived in what year). That way an ID only have a '1' for a certain year at one row (if there are multiple rows for that ID). The yellow highlight is what i don't want; in that case there is a '1' in two rows for the same year. In the preferred dataset there is only one '1' per year per ID possible.
For example: ID 4 lived in 2013 in 2 places (NY and LA), therefore there are 2 rows. At this point there is a 1 in each row for 2013 and I only want a 1 in the row the ID lived the longest between 1-1-2013 and 31-12-2018. ID 4 lived in 2013 longer in LA than in NY, and so only a 1 should be at the row for NY (so in this case the row of LA will be removed because only '0's remain).
I can also put this file in RStudio.
Thank you!
structure(v1)
ID CITY ZIPCODE DATE_START DATE_END DATE_END.1 X2013 X2014 X2015 X2016 X2017 X2018
1 1 NY 1234EF 1-12-2003 31-12-2018 1 1 1 1 1 1
2 2 NY 1234CD 1-12-2003 14-1-2019 14-1-2019 1 1 1 1 1 1
3 2 NY 1234AB 15-1-2019 31-12-2018 0 0 0 0 0 0
4 3 NY 1234AB 15-1-2019 31-12-2018 0 0 0 0 0 0
5 3 NY 1234CD 1-12-2003 14-1-2019 14-1-2019 1 1 1 1 1 1
6 4 LA 1111AB 4-5-2013 31-12-2018 1 1 1 1 1 1
7 4 NY 2222AB 1-12-2003 3-5-2013 3-5-2013 1 0 0 0 0 0
8 5 MIAMI 5555CD 6-2-2015 20-6-2016 20-6-2016 0 0 1 1 0 0
9 5 VEGAS 3333AB 1-1-2004 31-12-2018 1 1 1 1 1 1
10 5 ORLANDO 4444AB 26-2-2004 5-2-2015 5-2-2015 1 1 1 0 0 0
11 5 MIAMI 5555AB 21-6-2016 31-12-2018 31-12-2018 0 0 0 1 1 1
12 5 MIAMI 5555AB 1-1-2019 31-12-2018 0 0 0 0 0 0
13 6 AUSTIN 6666AB 28-2-2017 3-11-2017 3-11-2017 0 0 0 0 1 0
14 6 AUSTIN 6666AB 4-11-2017 31-12-2018 0 0 0 0 1 1
15 6 AUSTIN 7777AB 20-1-2017 27-2-2017 27-2-2017 0 0 0 0 1 0
16 6 AUSTIN 8888AB 1-12-2003 19-1-2017 19-1-2017 1 1 1 1 1 0
>
structure(list(ID = c(1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L), CITY = structure(c(4L, 4L, 4L, 4L, 4L,
2L, 4L, 3L, 6L, 5L, 3L, 3L, 1L, 1L, 1L, 1L), .Label = c("AUSTIN",
"LA", "MIAMI", "NY", "ORLANDO", "VEGAS"), class = "factor"),
ZIPCODE = structure(c(4L, 3L, 2L, 2L, 3L, 1L, 5L, 9L, 6L,
7L, 8L, 8L, 10L, 10L, 11L, 12L), .Label = c("1111AB", "1234AB",
"1234CD", "1234EF", "2222AB", "3333AB", "4444AB", "5555AB",
"5555CD", "6666AB", "7777AB", "8888AB"), class = "factor"),
DATE_START = structure(c(3L, 3L, 4L, 4L, 3L, 10L, 3L, 11L,
1L, 7L, 6L, 2L, 8L, 9L, 5L, 3L), .Label = c("1-1-2004", "1-1-2019",
"1-12-2003", "15-1-2019", "20-1-2017", "21-6-2016", "26-2-2004",
"28-2-2017", "4-11-2017", "4-5-2013", "6-2-2015"), class = "factor"),
DATE_END = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 7L, 4L, 1L,
9L, 8L, 1L, 6L, 1L, 5L, 3L), .Label = c("", "14-1-2019",
"19-1-2017", "20-6-2016", "27-2-2017", "3-11-2017", "3-5-2013",
"31-12-2018", "5-2-2015"), class = "factor"), DATE_END.1 = structure(c(7L,
1L, 7L, 7L, 1L, 7L, 6L, 3L, 7L, 8L, 7L, 7L, 5L, 7L, 4L, 2L
), .Label = c("14-1-2019", "19-1-2017", "20-6-2016", "27-2-2017",
"3-11-2017", "3-5-2013", "31-12-2018", "5-2-2015"), class = "factor"),
X2013 = c(1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L,
0L, 0L, 0L, 1L), X2014 = c(1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L), X2015 = c(1L, 1L, 0L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L), X2016 = c(1L,
1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L
), X2017 = c(1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
0L, 1L, 1L, 1L, 1L), X2018 = c(1L, 1L, 0L, 0L, 1L, 1L, 0L,
0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-16L))