I feel like this should be really simple, but I have been struggling with it. I used dplyr to get the summary statistics for the following tibble:

Scope <- data %>% group_by(Scope) %>% summarize(Emissions_2018 = sum(Emissions_2018),
         Emissions_2019 = sum(Emissions_2019), 
         "% Change" = (sum(Emissions_2019) - sum(Emissions_2018)) / sum(Emissions_2018) * 100)
Scope
# A tibble: 3 x 4
  Scope   `Emissions 2018` `Emissions 2019` `% Change`
  <chr>              <dbl>            <dbl>      <dbl>
1 Scope 1          421972.          418797.     -0.752
2 Scope 2          304711.          281526.     -7.61 
3 Scope 3        52184490.        51729720.     -0.871

Now I'm trying to use ggplot to graph Emissions 2018, Emissions 2019, and % Change as you would see in a pre- and post-treatment experiment. Year is my x-axis, GHG emissions is my y-axis, and % change should label the line connecting each scope's two observations (2018 to 2019). I've tried many different things, but I am stuck.

Edit (12/5/20):

So I followed through on what you suggested, Andrew. This looks good:

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)

Scope <- read.table(text = "Scope   Emissions2018    Emissions2019       Change
                           Scope_1          421972.          418797.     -0.752
                           Scope_2          304711.          281526.     -7.61 
                           Scope_3        52184490.        51729720.     -0.871", 
                    header = TRUE)

dat <- Scope %>% 
   pivot_longer(Emissions2018:Emissions2019, names_to = "Year") %>% 
   mutate(across(Year, function(x) gsub("Emissions", "", x)))

# round to 3 significant figures and add percentage sign
label_df <- unique(dat[,c("Scope","Change")])
label_df$Change <- paste0(signif(label_df$Change, 3), "%")

p <- ggplot(dat, aes(x = Year, y = value))
p <- p + facet_wrap(. ~ Scope, ncol=3)
p <- p + geom_point()
p <- p + labs(y = "Emissions", x = "Year")
p <- p + geom_text(x = 1.5, y = 1000, aes(label = Change), data = label_df)
print(p)

However, when you actually produce the graphs, you get the following (image: Scope Emissions).

Because the three panels share one y-axis, the % change is very hard to see. I want to apply scaling similar to the code I see here.

My computer does not support the package facetscales, and I also believe that there should be a much simpler way to do this.

Any suggestions?

Alex
  • I don't understand how you want to plot the %. Can you rephrase? Also, should all Scopes be in the same graph? – Edo Dec 04 '20 at 17:24
  • So, I'm trying to graph on the y-axis "GHG Emissions" while showing time as a variable (x-axis). Technically, it's grouped by scope (i.e. scope 1, scope 2, scope 3). I want to show the % change as indicating how the emissions have changed over time. While I don't need to indicate that on the graph, it's a helpful element. In all, this should be similar to a pre- and post-treatment graph, where you show how things change over time and/or with some sort of intervention. – Alex Dec 04 '20 at 17:52

1 Answer

I think you need to pivot_longer to create a "Year" variable for your X-axis.

The graph I include here is nothing fancy, but it shows how you can facet on "Scope" so that the three panels share the same emissions axis. I then use geom_text to annotate each facet with its own "Change" percentage.

Scope <- read.table(text = "Scope   Emissions2018    Emissions2019       Change
                           Scope1          421972.          418797.     -0.752
                           Scope2          304711.          281526.     -7.61 
                           Scope3        52184490.        51729720.     -0.871", 
                    header = TRUE)

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)

dat <- Scope %>% 
  pivot_longer(Emissions2018:Emissions2019, names_to = "Year") %>% 
  mutate(across(Year, function(x) gsub("Emissions", "", x)))

# round to 3 significant figures and add percentage sign
label_df <- unique(dat[,c("Scope","Change")])
label_df$Change <- paste0(signif(label_df$Change, 3), "%")

ggplot(dat, aes(x = Year, y = value)) +
  facet_wrap(. ~ Scope, ncol=3) +
  geom_point() +
  labs(y = "Emissions", x = "Year") +
  geom_text(x = 1.5, y=1000, 
            aes(label = Change), data = label_df)

Created on 2020-12-04 by the reprex package (v0.3.0)
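
As a variation on the plot above, here is a minimal sketch of giving each panel its own y-axis range, assuming that independent axes per scope are acceptable for your purpose. It uses the `scales = "free_y"` argument of `facet_wrap()`, so no extra package is needed. The object name `p_free`, the connecting `geom_line()`, and the `y = Inf` label placement are my own choices for illustration and are not part of the original code.

```r
library(ggplot2)
library(tidyr)

# Same summary values as in the example above
Scope <- read.table(text = "Scope Emissions2018 Emissions2019 Change
Scope1 421972 418797 -0.752
Scope2 304711 281526 -7.61
Scope3 52184490 51729720 -0.871", header = TRUE)

# Long format: one row per scope and year; names_prefix strips "Emissions"
dat <- pivot_longer(Scope, Emissions2018:Emissions2019,
                    names_to = "Year", names_prefix = "Emissions")

# One label per scope, e.g. "-7.61%"
label_df <- data.frame(Scope = Scope$Scope,
                       Change = paste0(signif(Scope$Change, 3), "%"))

p_free <- ggplot(dat, aes(x = Year, y = value, group = Scope)) +
  # scales = "free_y" lets every panel pick its own y range
  facet_wrap(~ Scope, ncol = 3, scales = "free_y") +
  geom_line() +
  geom_point() +
  # y = Inf pins the label to the top of each panel, whatever its range
  geom_text(aes(label = Change), data = label_df,
            x = 1.5, y = Inf, vjust = 1.5, inherit.aes = FALSE) +
  labs(y = "Emissions", x = "Year")

print(p_free)
```

Adding `+ scale_y_log10()` to `p_free` should apply the log transform while each panel keeps its own range. One caveat: with free axes every panel is stretched to fill its height, so a -0.75% drop can look as steep as a -7.61% drop. The text labels are then what carries the comparison.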

Andrew Brown
  • Note that the emissions differences are a bit more obvious with a logarithmic Y axis. To do that, add `scale_y_log10()` to the `ggplot()` call – Andrew Brown Dec 04 '20 at 18:06
  • This is a really great answer! Thank you! Of course, and unfortunately, this wouldn't work with larger datasets, but ggplot has its limitations! (Which opens the door for updates to it.) Also, this seems so unnecessarily complicated for such straightforward data. I could do this in Excel, but I really prefer ggplot. I think the log scale is a good idea. However, there are major differences between scope 3 and scopes 1 & 2. Is there any way for me to apply scale_y_log10() to each facet separately so that I can really highlight the % change? – Alex Dec 04 '20 at 23:19
  • I think the way to highlight the % change would probably be to convert your emissions values into percentages -- by normalizing relative to a particular year, or the mean of all years, or something. I am not sure what you mean about not scaling to bigger datasets -- you mean the labeling? If you had more years you could make a line plot or similar. The advantage of doing it in R, whether you use ggplot, base R graphics, lattice or whatever is that it is reproducible and can be easily set up for batch/variable runs. – Andrew Brown Dec 05 '20 at 00:26
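
The normalization idea in the last comment can be sketched as follows, assuming 2018 is taken as the baseline year so that every scope starts at 100. The names `indexed`, `Index`, and `p_index` are hypothetical and chosen only for this example. Because all scopes are then on the same relative scale, they fit in a single panel without faceting.

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Same summary values as in the answer
Scope <- read.table(text = "Scope Emissions2018 Emissions2019 Change
Scope1 421972 418797 -0.752
Scope2 304711 281526 -7.61
Scope3 52184490 51729720 -0.871", header = TRUE)

# Express each year as a percentage of that scope's 2018 value (2018 = 100)
indexed <- Scope %>%
  mutate(`2018` = 100,
         `2019` = Emissions2019 / Emissions2018 * 100) %>%
  select(Scope, `2018`, `2019`) %>%
  pivot_longer(c(`2018`, `2019`), names_to = "Year", values_to = "Index")

# One line per scope; the slope now reflects the relative change directly
p_index <- ggplot(indexed, aes(x = Year, y = Index,
                               colour = Scope, group = Scope)) +
  geom_line() +
  geom_point() +
  labs(y = "Emissions (2018 = 100)", x = "Year")

print(p_index)
```

With this indexing, the 2019 value for each scope is 100 plus its % change, so the steeper drop for Scope 2 stands out against the other two scopes.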