I have tried the solution for Arranging rows in custom order using dplyr but still can't figure it out for my dataframe.
I have a dataframe USA_tech
with 62,000 entries that looks like this:
region supplysector subsector technology year coefficient tech_change
AK agriculture agriculture agriculture 1975 .01 NA
AL agriculture agriculture agriculture 1975 .22 NA
AR agriculture agriculture agriculture 1975 .04 NA
AZ agriculture agriculture agriculture 1975 .09 NA
AK construction construction construction 1975 .14 NA
AL construction construction construction 1975 .30 NA
AR construction construction construction 1975 .07 NA
AZ construction construction construction 1975 .06 NA
The dataframe has year
1975 - 2100, generally in 5 year increments. The dataframe is currently arranged with year in ascending order, the 50 states in ascending order, and supplysector/subsector/technology all grouped together.
I'm wanting for the rows to be ordered with all states right next to each other (all AK entries at the top, WY at the bottom), the supplysector/subsector/technology all together, and the years to change by row (one 1975 entry, then one 1990 entry, etc. all the way to 2100) so it would look like this:
region supplysector subsector technology year coefficient tech_change
AK agriculture agriculture agriculture 1975 .01 NA
AK agriculture agriculture agriculture 1990 .12 NA
AK agriculture agriculture agriculture 2005 .05 NA
AK agriculture agriculture agriculture 2010 .34 NA
AK agriculture agriculture agriculture 2015 NA .3
AK agriculture agriculture agriculture 2020 NA .2
AK agriculture agriculture agriculture 2025 NA .1
The order is important because the coefficient from the year/row above will be used to calculate the next year's value for that state, supplysector, subsector, technology combo.
This is what I tried:
USA_tech_change_arranged <- USA_tech %>%
arrange( match( year, c( 1975, 1990, 2005, 2010, 2015, 2020, 2025, 2030, 2035, 2040, 2045,
2050, 2055, 2060, 2065, 2070, 2075, 2080, 2085, 2090, 2095, 2100 ) ), region, supplysector )
Which is somewhat successful, but the year order is not applied how I need it to be.
Thanks!
data:
> dput(USA_tech)
structure(list(region = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC",
"DE", "FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA",
"MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE",
"NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC",
"SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"),
supplysector = structure(c(1L, 1L, 5L, 5L, 5L, 5L, 1L, 1L,
5L, 5L, 5L, 5L, 1L, 1L, 5L, 5L, 5L, 5L, 1L, 1L, 5L, 5L, 5L,
5L), .Label = c("agriculture", "aluminum and nonferrous metals",
"cement energy processes", "chemicals", "construction", "food processing",
"iron and steel", "mining", "other manufacturing", "other nonmetallic minerals",
"pulp paper and wood"), class = "factor"), subsector = structure(c(1L,
1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L,
5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L), .Label = c("agriculture energy",
"boilers", "boilers_CHP", "construction energy", "construction feedstocks",
"electrochemical", "feedstocks", "machine drive", "mining energy",
"other uses", "process heat"), class = "factor"), technology = structure(c(1L,
1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L,
5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L), .Label = c("agriculture energy",
"boilers", "boilers_CHP", "construction energy", "construction feedstocks",
"electrochemical", "feedstocks", "machine drive", "mining energy",
"other uses", "process heat"), class = "factor"), year = c(1975L,
1975L, 1975L, 1975L, 1975L, 1975L, 1990L, 1990L, 1990L, 1990L,
1990L, 1990L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L), coefficient = c(0.00415283842675507,
0.0105087067678448, 0.00251527374007625, 0.00401004800633499,
0.00539236968879248, 0.00185602527562958, 0.00428571855936047,
0.00602247397804429, 0.00520793681510989, 0.00246830444537675,
0.00355039681492185, 0.00265090847659984, 0.005530092870379,
0.00728658128739465, 0.00796292303916165, 0.00288955401140914,
0.00282405286490722, 0.00494969254413892, 0.00515548884308403,
0.00562261318636465, 0.00629285089491791, 0.00235150770450291,
0.00172229336981847, 0.00357616051072436), market.name = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("AK", "AL", "AR",
"AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", "IA",
"ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME", "MI",
"MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM",
"NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN",
"TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"),
tech_change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-24L), class = "data.frame")