0

I have tried the solution for Arranging rows in custom order using dplyr but still can't figure it out for my dataframe.

I have a dataframe USA_tech with 62,000 entries that looks like this:

region     supplysector    subsector    technology    year    coefficient   tech_change
AK          agriculture   agriculture   agriculture   1975      .01             NA  
AL          agriculture   agriculture   agriculture   1975      .22             NA
AR          agriculture   agriculture   agriculture   1975      .04             NA
AZ          agriculture   agriculture   agriculture   1975      .09             NA
AK          construction  construction  construction  1975      .14             NA
AL          construction  construction  construction  1975      .30             NA
AR          construction  construction  construction  1975      .07             NA
AZ          construction  construction  construction  1975      .06             NA

The dataframe has year 1975 - 2100, generally in 5 year increments. The dataframe is currently arranged with year in ascending order, the 50 states in ascending order, and supplysector/subsector/technology all grouped together.

I'm wanting for the rows to be ordered with all states right next to each other (all AK entries at the top, WY at the bottom), the supplysector/subsector/technology all together, and the years to change by row (one 1975 entry, then one 1990 entry, etc. all the way to 2100) so it would look like this:

region     supplysector    subsector    technology    year    coefficient   tech_change
AK          agriculture   agriculture   agriculture   1975      .01             NA  
AK          agriculture   agriculture   agriculture   1990      .12             NA  
AK          agriculture   agriculture   agriculture   2005      .05             NA  
AK          agriculture   agriculture   agriculture   2010      .34             NA  
AK          agriculture   agriculture   agriculture   2015       NA             .3  
AK          agriculture   agriculture   agriculture   2020       NA             .2  
AK          agriculture   agriculture   agriculture   2025       NA             .1  

The order is important because the coefficient from the year/row above will be used to calculate the next year's value for that state, supplysector, subsector, technology combo.

This is what I tried:

USA_tech_change_arranged <- USA_tech %>%
  arrange( match( year, c( 1975, 1990, 2005, 2010, 2015, 2020, 2025, 2030, 2035, 2040, 2045,
                           2050, 2055, 2060, 2065, 2070, 2075, 2080, 2085, 2090, 2095, 2100 ) ), region, supplysector )

Which is somewhat successful, but the year order is not applied how I need it to be.

Thanks!

data:

> dput(USA_tech)
structure(list(region = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L), .Label = c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", 
"DE", "FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", 
"MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", 
"NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", 
"SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"), 
    supplysector = structure(c(1L, 1L, 5L, 5L, 5L, 5L, 1L, 1L, 
    5L, 5L, 5L, 5L, 1L, 1L, 5L, 5L, 5L, 5L, 1L, 1L, 5L, 5L, 5L, 
    5L), .Label = c("agriculture", "aluminum and nonferrous metals", 
    "cement energy processes", "chemicals", "construction", "food processing", 
    "iron and steel", "mining", "other manufacturing", "other nonmetallic minerals", 
    "pulp paper and wood"), class = "factor"), subsector = structure(c(1L, 
    1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 
    5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L), .Label = c("agriculture energy", 
    "boilers", "boilers_CHP", "construction energy", "construction feedstocks", 
    "electrochemical", "feedstocks", "machine drive", "mining energy", 
    "other uses", "process heat"), class = "factor"), technology = structure(c(1L, 
    1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 
    5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L), .Label = c("agriculture energy", 
    "boilers", "boilers_CHP", "construction energy", "construction feedstocks", 
    "electrochemical", "feedstocks", "machine drive", "mining energy", 
    "other uses", "process heat"), class = "factor"), year = c(1975L, 
    1975L, 1975L, 1975L, 1975L, 1975L, 1990L, 1990L, 1990L, 1990L, 
    1990L, 1990L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2010L, 
    2010L, 2010L, 2010L, 2010L, 2010L), coefficient = c(0.00415283842675507, 
    0.0105087067678448, 0.00251527374007625, 0.00401004800633499, 
    0.00539236968879248, 0.00185602527562958, 0.00428571855936047, 
    0.00602247397804429, 0.00520793681510989, 0.00246830444537675, 
    0.00355039681492185, 0.00265090847659984, 0.005530092870379, 
    0.00728658128739465, 0.00796292303916165, 0.00288955401140914, 
    0.00282405286490722, 0.00494969254413892, 0.00515548884308403, 
    0.00562261318636465, 0.00629285089491791, 0.00235150770450291, 
    0.00172229336981847, 0.00357616051072436), market.name = structure(c(1L, 
    2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("AK", "AL", "AR", 
    "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", "IA", 
    "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME", "MI", 
    "MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM", 
    "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", 
    "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"), 
    tech_change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
-24L), class = "data.frame")
Maridee Weber
  • 231
  • 1
  • 8
  • Please add a significative sample of your data in order to reproduce your issue. There are plenty of ways to do what you want but without data is not possible to help you! – Duck Sep 24 '20 at 17:19
  • 2
    I think you need to arrange by region first, then year. – TTS Sep 24 '20 at 17:19
  • Can you please provide some example data (with different values for the columns) with `dput` – starja Sep 24 '20 at 17:19
  • @Duck added data from dput(head) as dput produces way too much – Maridee Weber Sep 24 '20 at 17:27
  • @MarideeWeber here it's not a good idea to use `head`, because then there isn't enough variation in the data. You could sample from different "regions" of your dataframe – starja Sep 24 '20 at 17:32
  • 1
    @starja I updated the data to be a dataframe I filtered that includes different years and states for more variation – Maridee Weber Sep 24 '20 at 17:33

1 Answers1

1

Is this what you want?

library(dplyr)

USA_tech %>% 
  arrange(region, supplysector, subsector, technology, year)
#>    region supplysector               subsector              technology year
#> 1      AK  agriculture      agriculture energy      agriculture energy 1975
#> 2      AK  agriculture      agriculture energy      agriculture energy 1990
#> 3      AK  agriculture      agriculture energy      agriculture energy 2005
#> 4      AK  agriculture      agriculture energy      agriculture energy 2010
#> 5      AK construction     construction energy     construction energy 1975
#> 6      AK construction     construction energy     construction energy 1990
#> 7      AK construction     construction energy     construction energy 2005
#> 8      AK construction     construction energy     construction energy 2010
#> 9      AK construction construction feedstocks construction feedstocks 1975
#> 10     AK construction construction feedstocks construction feedstocks 1990
#> 11     AK construction construction feedstocks construction feedstocks 2005
#> 12     AK construction construction feedstocks construction feedstocks 2010
#> 13     AL  agriculture      agriculture energy      agriculture energy 1975
#> 14     AL  agriculture      agriculture energy      agriculture energy 1990
#> 15     AL  agriculture      agriculture energy      agriculture energy 2005
#> 16     AL  agriculture      agriculture energy      agriculture energy 2010
#> 17     AL construction     construction energy     construction energy 1975
#> 18     AL construction     construction energy     construction energy 1990
#> 19     AL construction     construction energy     construction energy 2005
#> 20     AL construction     construction energy     construction energy 2010
#> 21     AL construction construction feedstocks construction feedstocks 1975
#> 22     AL construction construction feedstocks construction feedstocks 1990
#> 23     AL construction construction feedstocks construction feedstocks 2005
#> 24     AL construction construction feedstocks construction feedstocks 2010
#>    coefficient market.name tech_change
#> 1  0.004152838          AK          NA
#> 2  0.004285719          AK          NA
#> 3  0.005530093          AK          NA
#> 4  0.005155489          AK          NA
#> 5  0.002515274          AK          NA
#> 6  0.005207937          AK          NA
#> 7  0.007962923          AK          NA
#> 8  0.006292851          AK          NA
#> 9  0.005392370          AK          NA
#> 10 0.003550397          AK          NA
#> 11 0.002824053          AK          NA
#> 12 0.001722293          AK          NA
#> 13 0.010508707          AL          NA
#> 14 0.006022474          AL          NA
#> 15 0.007286581          AL          NA
#> 16 0.005622613          AL          NA
#> 17 0.004010048          AL          NA
#> 18 0.002468304          AL          NA
#> 19 0.002889554          AL          NA
#> 20 0.002351508          AL          NA
#> 21 0.001856025          AL          NA
#> 22 0.002650908          AL          NA
#> 23 0.004949693          AL          NA
#> 24 0.003576161          AL          NA

Created on 2020-09-24 by the reprex package (v0.3.0)

When using arrange, the data gets ordered in the order you specify the columns.

starja
  • 9,887
  • 1
  • 13
  • 28