I have a huge data frame in R some of my columns are
STATE FIPS
COUNTY FIPS
EVENT TYPE
EPISODE
I want to aggregate the data as follows
first split it up by state, and then by county within each state, then by event type within each county, and then finally remove all the duplicate "episodes" within each event type. Finally, I want to count the value of each and have that be the output.
I would like the final data frame to be a list of
Every county (with its respective state), the event type, and the count of episode numbers for each event.
For example.
Maryland Montgomery Tornado 7
Maryland Howard Tornado 19
New York Ontario Hurricane 7
and so on so that each county and each event has its own row.
Happy to post dummy data but I am not sure how to do that... thanks!
structure(list(EPISODE_ID = c(1099647L, 1104972L, 1104973L, 1104971L, 1105463L, 1099373L, 1104298L, 1105465L, 1102576L, 1102576L, 1102576L, 1102581L, 1101019L, 1099291L, 1099292L, 1105466L, 1105467L), EVENT_ID = c(5176080, 5166972, 5166973, 5166971, 5167810, 5163387, 5161925, 5168217, 5171130, 5171131, 5171132, 5171137, 5173230, 5164469, 5164470, 5168218, 5168219), STATE = c("ALASKA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALASKA", "ALASKA", "ALASKA", "ALABAMA", "ALABAMA"), STATE_FIPS = c(2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1), YEAR = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000), MONTH_NAME = c("August", "December", "December", "December", "December", "July", "November", "December", "September", "September", "September", "September", "September", "July", "July", "December", "December"), EVENT_TYPE = c("Flash Flood", "Tornado", "Tornado", "Tornado", "Tornado", "Flash Flood", "Tornado", "Tornado", "Flash Flood", "Flash Flood", "Flash Flood", "Flash Flood", "Flash Flood", "Flash Flood", "Flash Flood", "Tornado", "Tornado"), CZ_TYPE = c("C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C" ), CZ_FIPS = c(14, 115, 55, 125, 61, 73, 65, 69, 1, 101, 51, 111, 14, 7, 5, 45, 67)), row.names = c(NA, -17L), class = c("data.table", "data.frame"))
This display :
EPISODE_ID EVENT_ID STATE STATE_FIPS YEAR MONTH_NAME EVENT_TYPE CZ_TYPE
1 1099647 5176080 ALASKA 2 2000 August Flash Flood C
2 1104972 5166972 ALABAMA 1 2000 December Tornado C
3 1104973 5166973 ALABAMA 1 2000 December Tornado C
4 1104971 5166971 ALABAMA 1 2000 December Tornado C
5 1105463 5167810 ALABAMA 1 2000 December Tornado C
6 1099373 5163387 ALABAMA 1 2000 July Flash Flood C
7 1104298 5161925 ALABAMA 1 2000 November Tornado C
8 1105465 5168217 ALABAMA 1 2000 December Tornado C
9 1102576 5171130 ALABAMA 1 2000 September Flash Flood C
10 1102576 5171131 ALABAMA 1 2000 September Flash Flood C
11 1102576 5171132 ALABAMA 1 2000 September Flash Flood C
12 1102581 5171137 ALABAMA 1 2000 September Flash Flood C
13 1101019 5173230 ALASKA 2 2000 September Flash Flood C
14 1099291 5164469 ALASKA 2 2000 July Flash Flood C
15 1099292 5164470 ALASKA 2 2000 July Flash Flood C
16 1105466 5168218 ALABAMA 1 2000 December Tornado C
17 1105467 5168219 ALABAMA 1 2000 December Tornado C
CZ_FIPS
1 14
2 115
3 55
4 125
5 61
6 73
7 65
8 69
9 1
10 101
11 51
12 111
13 14
14 7
15 5
16 45
17 67