0

The output of an sapply call returns class: [1] "matrix" "array" which I have never seen before. Transforming this to a dataframe gives a 'wide' format of what I want, though. For example:

df<-structure(list(`04232395` = c("22.8", "0", NA, NA), `04232200` = c("39.4", 
"2024", "47.55", "1977-09-30 to 2019-03-27 , 1976-11-30 to 1977-03-01 , 1975-11-30 to 1976-03-01 , NA"
), `04232185` = c(NA, "0", NA, NA), `04232165` = c(NA, "0", NA, 
NA), `01515600` = c("0.93", "0", NA, NA), `01515580` = c("4.77", 
"0", NA, NA), `01515600` = c("0.93", "0", NA, NA), `01515580` = c("4.77", 
"0", NA, NA), `01515720` = c("4.44", "0", NA, NA), `01515790` = c("1.32", 
"0", NA, NA), `01515600` = c("0.93", "0", NA, NA), `01515720` = c("4.44", 
"0", NA, NA), `01515790` = c("1.32", "0", NA, NA), `01515720` = c("4.44", 
"0", NA, NA), `01515790` = c("1.32", "0", NA, NA), `01515600` = c("0.93", 
"0", NA, NA), `01515580` = c("4.77", "0", NA, NA), `01515720` = c("4.44", 
"0", NA, NA), `04244903` = c("8.02", "0", NA, NA), `04233000` = c("35.2", 
"27605", "75.63", NA), `04233255` = c("86.7", "3903", "10.69", 
NA), `04234000` = c("126", "35786", "98.04", NA), `04233310` = c("42", 
"0", NA, NA), `04233300` = c("39", "10197", "27.93", NA), `04233700` = c("40.3", 
"822", "2.25", NA), `0423368620` = c("29.7", "713", "1.95", NA
), `04233676` = c("20.7", "0", NA, NA), `04233633` = c("40.2", 
"0", NA, NA), `04233635` = c(NA, "0", NA, NA), `04233255` = c("86.7", 
"3903", "10.69", NA), `04234000` = c("126", "35786", "98.04", 
NA), `04233310` = c("42", "0", NA, NA), `04233300` = c("39", 
"10197", "27.93", NA), `01513990` = c("17", "0", NA, NA), `01513910` = c("15.8", 
"0", NA, NA), `01514000` = c("185", "19523", "92.41", "1978-11-07 to 2017-09-30 , 1978-10-10 to 1978-10-14 , NA"
), `01513990` = c("17", "0", NA, NA), `01514000` = c("185", "19523", 
"92.41", "1978-11-07 to 2017-09-30 , 1978-10-10 to 1978-10-14 , NA"
), `01513840` = c("8.59", "791", "2.16", NA), `01513831` = c("4216", 
"4877", "13.36", NA), `01513990` = c("17", "0", NA, NA), `01513830` = c("20.7", 
"0", NA, NA), `01514880` = c("46.5", "0", NA, NA), `01514000` = c("185", 
"19523", "92.41", "1978-11-07 to 2017-09-30 , 1978-10-10 to 1978-10-14 , NA"
), `01513840` = c("8.59", "791", "2.16", NA), `01513831` = c("4216", 
"4877", "13.36", NA), `01513830` = c("20.7", "0", NA, NA), `01514000` = c("185", 
"19523", "92.41", "1978-11-07 to 2017-09-30 , 1978-10-10 to 1978-10-14 , NA"
), `01513990` = c("17", "0", NA, NA), `01513990` = c("17", "0", 
NA, NA), `01513910` = c("15.8", "0", NA, NA), `04233300` = c("39", 
"10197", "27.93", NA), `04233286` = c("27", "7390", "20.24", 
NA), `04233275` = c(NA, "0", NA, NA), `01513930` = c(NA, "0", 
NA, NA), `04233310` = c("42", "0", NA, NA), `04233300` = c("39", 
"10197", "27.93", NA), `04233286` = c("27", "7390", "20.24", 
NA), `04233275` = c(NA, "0", NA, NA), `04233310` = c("42", "0", 
NA, NA), `04233300` = c("39", "10197", "27.93", NA), `04233286` = c("27", 
"7390", "20.24", NA), `04233275` = c(NA, "0", NA, NA), `04233310` = c("42", 
"0", NA, NA), `04233300` = c("39", "10197", "27.93", NA), `04233286` = c("27", 
"7390", "20.24", NA), `04233275` = c(NA, "0", NA, NA), `04233255` = c("86.7", 
"3903", "10.69", NA), `04234000` = c("126", "35786", "98.04", 
NA), `04233310` = c("42", "0", NA, NA), `04233300` = c("39", 
"10197", "27.93", NA), `04233286` = c("27", "7390", "20.24", 
NA), `04233275` = c(NA, "0", NA, NA), `04233310` = c("42", "0", 
NA, NA), `04233300` = c("39", "10197", "27.93", NA), `04233286` = c("27", 
"7390", "20.24", NA), `04233275` = c(NA, "0", NA, NA), `04233275` = c(NA, 
"0", NA, NA), `0423368620` = c("29.7", "713", "1.95", NA), `04233676` = c("20.7", 
"0", NA, NA), `01513930` = c(NA, "0", NA, NA), `04233678` = c("2.73", 
"456", "1.25", NA), `04233286` = c("27", "7390", "20.24", NA), 
    `04233700` = c("40.3", "822", "2.25", NA), `04233275` = c(NA, 
    "0", NA, NA), `0423368620` = c("29.7", "713", "1.95", NA), 
    `04233676` = c("20.7", "0", NA, NA), `04233000` = c("35.2", 
    "27605", "75.63", NA), `04233255` = c("86.7", "3903", "10.69", 
    NA), `04234000` = c("126", "35786", "98.04", NA), `04233310` = c("42", 
    "0", NA, NA), `04233300` = c("39", "10197", "27.93", NA), 
    `04233286` = c("27", "7390", "20.24", NA), `04233275` = c(NA, 
    "0", NA, NA), `01513930` = c(NA, "0", NA, NA), `04233000` = c("35.2", 
    "27605", "75.63", NA), `04233255` = c("86.7", "3903", "10.69", 
    NA), `04234000` = c("126", "35786", "98.04", NA), `04233310` = c("42", 
    "0", NA, NA), `04233300` = c("39", "10197", "27.93", NA), 
    `04233700` = c("40.3", "822", "2.25", NA), `0423368620` = c("29.7", 
    "713", "1.95", NA), `04233676` = c("20.7", "0", NA, NA), 
    `04233678` = c("2.73", "456", "1.25", NA), `04233700` = c("40.3", 
    "822", "2.25", NA), `0423368620` = c("29.7", "713", "1.95", 
    NA), `04233676` = c("20.7", "0", NA, NA), `04233633` = c("40.2", 
    "0", NA, NA), `04233635` = c(NA, "0", NA, NA), `04233678` = c("2.73", 
    "456", "1.25", NA), `04233700` = c("40.3", "822", "2.25", 
    NA), `0423368620` = c("29.7", "713", "1.95", NA), `04233676` = c("20.7", 
    "0", NA, NA), `04233633` = c("40.2", "0", NA, NA), `04233635` = c(NA, 
    "0", NA, NA), `04244985` = c("55.5", "0", NA, NA), `04244990` = c("18.8", 
    "0", NA, NA), `04244903` = c("8.02", "0", NA, NA), `04244000` = c("66.3", 
    "9626", "72.4", "1968-09-29 to 2014-09-30 , NA"), `04243400` = c(NA, 
    "0", NA, NA), `04243390` = c("24.9", "0", NA, NA), `01338800` = c("43.6", 
    "0", NA, NA), `04243400` = c(NA, "0", NA, NA), `04243390` = c("24.9", 
    "0", NA, NA), `01338000` = c("144", "3042", "8.33", NA), 
    `01339060` = c("59.8", "3056", "8.37", NA), `01338800` = c("43.6", 
    "0", NA, NA), `01503970` = c(NA, "0", NA, NA), `01503980` = c("24.3", 
    "0", NA, NA), `04243390` = c("24.9", "0", NA, NA), `01503970` = c(NA, 
    "0", NA, NA), `01503980` = c("24.3", "0", NA, NA)), row.names = c(NA, 
-4L), class = "data.frame")

I would like to have the column names of this dataframe to be in a single row and the 4 rows for each column in the old dataframe to be 4 individual columns in the new. I tried:

df_new<-data.frame(site = colnames(df), DA = df[1,], Q = df[2,], POR = df[3,], Q_gaps = df[4,])

but this is not right.

I have also tried

  df_new<-df %>%
    pivot_longer(everything())%>%
    group_by(name)%>%
    mutate(DA = slice(value,1), Q = slice(value,2), POR = slice(value,3), Q_gaps = slice(value,4))

but this resulted in an error.

Searching for this question does not return what I am looking for, so this may be duplicated elsewhere, I just don't know how to ask it.

EDIT:

this worked and gives the desired output, but any more elegant solution?

 df_new<-data.frame(site = colnames(df), DA = as.numeric(df[1,]), Q = as.numeric(df[2,]), POR = as.numeric(df[3,]), Q_gaps = as.character(df[4,]))
ruggntub
  • 95
  • 5

1 Answers1

0

Edited for the new data (adds setNames/make.unique up front).

library(dplyr)
library(tidyr)
setNames(df, make.unique(names(df))) %>%
  mutate(nm = c("DA", "Q", "POR", "Q_gaps")) %>%
  pivot_longer(-nm, names_to = "site") %>%
  pivot_wider(site, names_from = "nm", values_from = "value") %>%
  mutate(across(-site, ~ type.convert(., as.is=TRUE)), site = sub("\\.[0-9]+$", "", site))
# # A tibble: 132 × 5
#    site        DA     Q   POR Q_gaps                                                                             
#    <chr>    <dbl> <int> <dbl> <chr>                                                                              
#  1 04232395 22.8      0  NA   NA                                                                                 
#  2 04232200 39.4   2024  47.6 1977-09-30 to 2019-03-27 , 1976-11-30 to 1977-03-01 , 1975-11-30 to 1976-03-01 , NA
#  3 04232185 NA        0  NA   NA                                                                                 
#  4 04232165 NA        0  NA   NA                                                                                 
#  5 01515600  0.93     0  NA   NA                                                                                 
#  6 01515580  4.77     0  NA   NA                                                                                 
#  7 01515600  0.93     0  NA   NA                                                                                 
#  8 01515580  4.77     0  NA   NA                                                                                 
#  9 01515720  4.44     0  NA   NA                                                                                 
# 10 01515790  1.32     0  NA   NA                                                                                 
# # … with 122 more rows
# # ℹ Use `print(n = ...)` to see more rows

The site=sub(..) is to remove the .1 (etc) that will occur due to duplicate column names; this should be safe as long as your real site strings never end in a decimal followed by one or more numbers.

Or we can use akrun's suggestion:

as.data.frame(t(setNames(df, make.unique(names(df))))) %>%
  setNames(c("DA","Q","POR","Q_gaps")) %>%
  tibble::rownames_to_column("site") %>%
  mutate(
    across(-site, ~ type.convert(., as.is = TRUE)),
    site = sub("\\.[0-9]+$", "", site)
  ) %>%
  head()
#       site    DA    Q   POR                                                                              Q_gaps
# 1 04232395 22.80    0    NA                                                                                <NA>
# 2 04232200 39.40 2024 47.55 1977-09-30 to 2019-03-27 , 1976-11-30 to 1977-03-01 , 1975-11-30 to 1976-03-01 , NA
# 3 04232185    NA    0    NA                                                                                <NA>
# 4 04232165    NA    0    NA                                                                                <NA>
# 5 01515600  0.93    0    NA                                                                                <NA>
# 6 01515580  4.77    0    NA                                                                                <NA>
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Thank you, however my 'non-elegant' solution is the only one that works here on my original length dataset ((not sure why this is making a difference). The first solution you post gives an error: ```Error in `mutate()`: ! Can't transform a data frame with duplicate names.``` and the second solution adds a capital 'X' to the beginning of every string in the site column – ruggntub Feb 07 '23 at 19:28
  • These two methods work, though clearly it modifies the second `01515600` to deduplicate to `01515600.1`. If your `site` strings are always the same, perhaps you can trail with `mutate(site = sub("\\.[0-9]$", "", site))` so that you're back to the simple site names you had originally. Does that work @ruggntub? – r2evans Feb 07 '23 at 19:34
  • Adding this trailing ```mutate``` to akruns solution does not remove the 'X', and adding it to yours still results in error. I will update my post to include the full dataset to make the situation reproducible. – ruggntub Feb 07 '23 at 19:38
  • I see no `X` in my rendering of akrun's suggestion. Regardless, I just edited the answer to include what I just suggested. – r2evans Feb 07 '23 at 19:41
  • I updated the answer with your larger data; I could initially reproduce the error, it is now resolved with the addition of `setNames(...)` at the start. – r2evans Feb 07 '23 at 19:46