
I have a dataset; the first few rows are given under Data below.

I would like to plot how the means of these columns change over time, as a line graph. I know I can get the mean of a single column with mean(df$column), but I don't know how to graph these means without a separate time variable, which I do not have. The column names contain years, ranging from 2017 to 2050, and I would like the x-axis to place each column mean at its year, spaced proportionally by time. For example, the axis should start at 2017, have several closely spaced points through 2020, and then space each following column according to its year until 2050. I know I can set the overall axis range with xlim(), but I don't know how to position the later points according to the years in the column names. Any help would be appreciated!

Data:

dataset <- structure(list(tons_2017 = c(64.533, 3049.580, 1.609), 
                          tons_2018 = c(65.613, 3100.588, 1.636), 
                          tons_2019 = c(68.331, 3229.061, 1.704), 
                          tons_2020 = c(68.816, 3251.973, 1.716), 
                          tons_2022 = c(73.408, 3493.93, 1.755),
                          tons_2023 = c(75.368, 3567.198, 1.743), 
                          tons_2025 = c(88.289, 4052.954, 1.756), 
                          tons_2030 = c(106.873, 4749.285, 1.896), 
                          tons_2035 = c(126.056, 5361.734, 1.954), 
                          tons_2040 = c(152.926, 6272.844, 2.149), 
                          tons_2045 = c(186.799, 7393.864, 2.428), 
                          tons_2050 = c(219.586, 8429.251, 2.650)), 
                     row.names = c(NA, 3L), 
                     class = "data.frame")
  • Welcome to Stack Overflow. We cannot read data into R from images. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(yourdata)`, if that is not too large. – neilfws Oct 20 '22 at 21:22
  • It also looks like this, done another way: structure(list(tons_2017 = c(64.533, 3049.580, 1.609), tons_2018 = c(65.613, 3100.588, 1.636), tons_2019 = c(68.331, 3229.061, 1.704), tons_2020 = c(68.816, 3251.973, 1.716), , tons_2022 = c(73.408, 3493.93, 1.755), tons_2023 = c(75.368, 3567.198, 1.743), tons_2025 = c(88.289, 4052.954, 1.756), tons_2030 = c(106.873, 4749.285, 1.896), tons_2035 = c(126.056, 5361.734, 1.954), tons_2040 = c(152.926, 6272.844, 2.149), tons_2045 = c(186.799, 7393.864, 2.428), tons_2050 = c(219.586, 8429.251, 2.650)), row.names = c(NA, 3L), class = "data.frame") – Stackfan57 Oct 20 '22 at 21:55
  • a stray comma in there, but good enough :) – neilfws Oct 20 '22 at 22:04
  • thanks for helping get that cleared up! – Stackfan57 Oct 20 '22 at 22:13



EDITED: based on comments

I think what you need to do is reshape the data from "wide" to "long" form, convert the column names into numeric values, then group by those values to calculate the means.

Something like this:

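library(tidyverse)

dataset %>% 
  select(starts_with("tons_")) %>%                       # keep only the yearly columns
  pivot_longer(everything()) %>%                          # wide to long: columns 'name' and 'value'
  mutate(name = as.numeric(gsub("tons_", "", name))) %>%  # "tons_2017" -> 2017
  group_by(name) %>% 
  summarise(meanVal = mean(value)) %>%                    # one mean per year
  ggplot(aes(name, meanVal)) +                            # numeric x keeps spacing proportional to time
  geom_line()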

After the summarise step, the data looks like this:

# A tibble: 12 × 2
   name meanVal
   <dbl>   <dbl>
 1  2017   1039.
 2  2018   1056.
 3  2019   1100.
 4  2020   1108.
 5  2022   1190.
 6  2023   1215.
 7  2025   1381.
 8  2030   1619.
 9  2035   1830.
10  2040   2143.
11  2045   2528.
12  2050   2884.
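Because each tons_ column already holds exactly one year, the per-year means can also be taken directly with base R's colMeans(), without reshaping. A minimal sketch, assuming the yearly columns in the real data all start with "tons_" (the grepl() filter drops any other columns); the data frame below is the question's Data block rebuilt with data.frame(), and it should reproduce the meanVal values above:

```r
# Question's data, rebuilt with data.frame() instead of structure()
dataset <- data.frame(
  tons_2017 = c(64.533, 3049.580, 1.609),
  tons_2018 = c(65.613, 3100.588, 1.636),
  tons_2019 = c(68.331, 3229.061, 1.704),
  tons_2020 = c(68.816, 3251.973, 1.716),
  tons_2022 = c(73.408, 3493.93, 1.755),
  tons_2023 = c(75.368, 3567.198, 1.743),
  tons_2025 = c(88.289, 4052.954, 1.756),
  tons_2030 = c(106.873, 4749.285, 1.896),
  tons_2035 = c(126.056, 5361.734, 1.954),
  tons_2040 = c(152.926, 6272.844, 2.149),
  tons_2045 = c(186.799, 7393.864, 2.428),
  tons_2050 = c(219.586, 8429.251, 2.650)
)

# Keep only the yearly columns (assumes they all share the "tons_" prefix)
tons <- dataset[grepl("^tons_", names(dataset))]

# One mean per column, i.e. one mean per year
means <- colMeans(tons)

# "tons_2017" -> 2017 etc., so the x positions are real years
years <- as.numeric(sub("^tons_", "", names(tons)))

round(means, 1)

# Numeric x values put 2017/2018 one unit apart and 2025/2030 five units apart
plot(years, means, type = "b", xlab = "Year", ylab = "Mean tons")
```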

And the chart looks like this:

[Chart: line plot of meanVal against year (name), 2017 to 2050]
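Because name is numeric, ggplot2 already spaces the points by year. To also mark where each column sits, points and one axis break per column can be added. A sketch under these assumptions: it loads dplyr, tidyr and ggplot2 directly rather than library(tidyverse), names the year column year via names_to, and the axis titles are illustrative:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Question's data, rebuilt with data.frame()
dataset <- data.frame(
  tons_2017 = c(64.533, 3049.580, 1.609),
  tons_2018 = c(65.613, 3100.588, 1.636),
  tons_2019 = c(68.331, 3229.061, 1.704),
  tons_2020 = c(68.816, 3251.973, 1.716),
  tons_2022 = c(73.408, 3493.93, 1.755),
  tons_2023 = c(75.368, 3567.198, 1.743),
  tons_2025 = c(88.289, 4052.954, 1.756),
  tons_2030 = c(106.873, 4749.285, 1.896),
  tons_2035 = c(126.056, 5361.734, 1.954),
  tons_2040 = c(152.926, 6272.844, 2.149),
  tons_2045 = c(186.799, 7393.864, 2.428),
  tons_2050 = c(219.586, 8429.251, 2.650)
)

# Same steps as the answer, with the year column named explicitly
means <- dataset %>%
  select(starts_with("tons_")) %>%
  pivot_longer(everything(), names_to = "year", values_to = "value") %>%
  mutate(year = as.numeric(gsub("tons_", "", year))) %>%
  group_by(year) %>%
  summarise(meanVal = mean(value))

p <- ggplot(means, aes(year, meanVal)) +
  geom_line() +
  geom_point() +                              # mark each column's year
  scale_x_continuous(breaks = means$year) +   # one tick per column
  labs(x = "Year", y = "Mean tons")

print(p)
```

If the labels for 2017 to 2020 crowd together, a coarser set such as breaks = seq(2015, 2050, by = 5) keeps the same proportional spacing with fewer ticks.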

neilfws
  • This is exactly what I'm looking for, but when I adapt it to my dataset and run it, I get the error "`everything()` must be used within a *selecting* function." Do you know why? Is pivot_longer not a selecting function, or am I supposed to replace everything() with something else? – Stackfan57 Oct 20 '22 at 22:43
  • I don't know why you would get that error, unless you have an older version of dplyr or tidyr? If your real dataset is substantially different in structure to the example in the question, then the example should be edited. – neilfws Oct 20 '22 at 22:49
  • It's probably the latter; the subset I included in the example is only columns 11-22 of my dataset, and the other columns are very different, including categorical variables. Some quick research showed me I can pass data and cols as arguments, but just putting 11:22 or c(11:22) isn't subsetting the columns appropriately. Hope this is clear; if not, I can try to explain more! – Stackfan57 Oct 20 '22 at 22:55
  • OK, I edited the code to select columns that start with "tons_" if that helps. – neilfws Oct 20 '22 at 23:01
  • I got it! Thanks so much for all the help; best first post to Stack Overflow I could've had. – Stackfan57 Oct 20 '22 at 23:11
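When the tons_ columns sit alongside other (for example categorical) columns, pivot_longer() can be given the data frame and told which columns to pivot, leaving the rest in place as identifier columns. A minimal sketch on a hypothetical frame: region is a made-up column, and the two tons_ columns reuse values from the question's data. names_prefix and names_transform are pivot_longer() arguments in recent tidyr versions that do the gsub()/as.numeric() step inline:

```r
library(tidyr)

# Hypothetical frame: 'region' is invented for illustration;
# the tons_ values are the first two columns of the question's data
demo <- data.frame(
  region = c("A", "B", "C"),
  tons_2017 = c(64.533, 3049.580, 1.609),
  tons_2018 = c(65.613, 3100.588, 1.636)
)

# Pivot only the tons_ columns; 'region' is kept on every row.
# names_prefix strips "tons_", names_transform makes the rest numeric.
long <- pivot_longer(
  demo,
  cols = starts_with("tons_"),
  names_to = "year",
  names_prefix = "tons_",
  names_transform = list(year = as.numeric),
  values_to = "tons"
)

# Selecting by position also works (2:3 here, analogous to 11:22 in the full data)
long_by_pos <- pivot_longer(demo, cols = 2:3, names_to = "year", values_to = "tons")

long
```

From long, group_by(year) and summarise(meanVal = mean(tons)) give the per-year means as in the answer.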