0

I am using the url link to download this dataset:

https://files.hawaii.gov/dbedt/census/census_2020/data/redistricting/PLtable1_2020-county.xlsx

So in R I am coding it as:

url_dbedt_dicennial <- "https://files.hawaii.gov/dbedt/census/census_2020/data/redistricting/PLtable1_2020-county.xlsx"

    # download the xls to a temporary file
    temp <- tempfile(fileext = ".xlsx")
    download.file(url = url_dbedt_dicennial, destfile = temp, mode = "wb")
    
    # data from dbedt dicennial (look at each step to understand)
    data_in_dbedt_dicennial <- temp %>%
      readxl::read_excel(
        range = cellranger::as.cell_limits("A6:H15"),) %>%
        t() %>%

The generated output is the following:

enter image description here What I am struggling right now after transpose is to how relabel the columns as "time", "HI", "HON", "HAW", "KAU", "MAU" and then to eliminate V1, V3, V8, and V9. I know I can eliminate columns manually one-by-one but there is a clever way of doing it? County should be relabeled as time.

Eventually I want to use the mutate function for the time variable, that is,

mutate(time)

and convert the data into time series with

tsbox::ts_long()

State of Hawaii should be labeled as "HI", Hawaii County as "HAW", City and County of Honolulu as "HON", Kauai County as "KAU", and Maui County 1/ as "MAU"

OGC
  • 244
  • 3
  • 13
  • Looks as though your code was cut off after `t() %>%`. You can change the variable (i.e. column) names of a data frame with `names(data_frame_name) <- c("variable_name_1", "variable_name_2" ...)`. – rdelrossi Mar 11 '22 at 22:39
  • Oh yes I haven't finished it yet so the tibble sign is there after t() – OGC Mar 11 '22 at 22:40
  • Can you post the code that got you to the image of the table you posted above? – rdelrossi Mar 11 '22 at 22:41
  • It's actually just by removing the ``%>%`` sign after ``t()`` – OGC Mar 11 '22 at 22:42

1 Answers1

1

So this turned out to be a little more complicated than I first thought, in part because of t(), which is really designed to work with matrices. Fortunately, I was able to find some guidance elsewhere on SO, where I found transpose_df(). Though this works, I imagine this could be cleaned up a bit.

data_in_dbedt_dicennial <- temp %>%
  readxl::read_excel(
    range = cellranger::as.cell_limits("A6:H15"),) %>% 
  na.omit()
  
transpose_df <- function(df) {
  t_df <- data.table::transpose(df)
  colnames(t_df) <- rownames(df)
  rownames(t_df) <- colnames(df)
  t_df <- t_df %>%
    tibble::rownames_to_column(.data = .) %>%
    tibble::as_tibble(.)
  return(t_df)
}

data_in_dbedt_dicennial <- transpose_df(data_in_dbedt_dicennial) %>% 
  .[-1,] %>% 
  rename(
    Year = rowname, HI = `1`, HAW = `2`, 
    HON = `3`, KAU = `4`, MAU = `5`
  ) %>% 
  mutate(across(everything(), as.integer))

Output:

# A tibble: 7 × 6
   Year      HI    HAW     HON   KAU    MAU
             
1  1960  632772  61332  500409 28176  42855
2  1970  769913  63468  630528 29761  46156
3  1980  964691  92053  762565 39082  70991
4  1990 1108229 120317  836231 51177 100504
5  2000 1211537 148677  876156 58463 128241
6  2010 1360301 185079  953207 67091 154924
7  2020 1455271 200629 1016508 73298 164836
rdelrossi
  • 1,114
  • 1
  • 7
  • 17
  • I was trying to use this before as in ``data_in_dbedt_dicennial <- temp %>%`` ``readxl::read_excel(`` ``range = cellranger::as.cell_limits("A6:H15"),) %>%`` ``t() %>%`` ``select(-V1, -V3, -V8, -V9)`` But I am getting an error – OGC Mar 11 '22 at 22:56
  • Plus I also do not want to remove the year column. Transpose seems to mess up the structure of the data so maybe I can try to remove the rows before doing the transpose – OGC Mar 11 '22 at 22:59
  • The code above actually wouldn't run in its current form – OGC Mar 11 '22 at 23:15
  • 1
    Give me a bit. I'm working on a new version. It's clearer to me what you want. – rdelrossi Mar 11 '22 at 23:16
  • I used na.omit() to omit the rows with NAs – OGC Mar 11 '22 at 23:31
  • See if this gets you there @OGC. – rdelrossi Mar 11 '22 at 23:52
  • Thanks a lot! I guess your tranpose_df function solves some of the problems of the built-in t() function? Instead of mutating over integers I converted the data to numeric before using the ts_long() function for the time series – OGC Mar 12 '22 at 00:39