I want to draw a range plot (or scatter plot) for a temperature time series in R. For each region, I need to calculate the mean temperature and the total precipitation over the first 10 years and over the last 10 years. Then I want to make a range plot of a reference year's gdp_percapita (say, gdp_percapita in 1995) against those first-10-year and last-10-year temperature means and precipitation totals.

reproducible data:

Here is reproducible data, simulated to have the same structure as the actual temperature time series:

dat= data.frame(index = rep(c('dex111', 'dex112', 'dex113','dex114','dex115'), each = 30), year =1980:2009,
            region= rep(c('Berlin','Stuttgart','Böblingen','Wartburgkreis','Eisenach'), each=30),
            gdp_percapita=rep(sample.int(40, 30), 5),gva_agr_perworker=rep(sample.int(45, 30), 5),
            temperature=rep(sample.int(50, 30), 5), precipitation=rep(sample.int(60, 30), 5))

update: Here is what I did so far:

library(tidyverse)
func <- dat %>% 
  group_by(temperature, precipitation) %>% 
  summarize_all(funs(mean, sum))

It seems this is not the right way to get the mean temperature and total precipitation for the first ten years and the last ten years. Any corrections?

func %>% 
  gather(year, region, temperature, precipitation, gdp_percapita) %>% 
  separate(col, into = c("Measurement", "stat")) %>% 
  arrange(region) %>% 
  mutate_at(vars(col, Measurement), fct_inorder) %>% 
  spread(col, val)

But the code above does not produce data suitable for plotting, and I don't know what went wrong. Any idea?

I know ggplot2 is well suited to rendering the expected range plot for this data, but my attempt to reshape the data for plotting is not correct. How can I make this plot with ggplot2?

update:

Note that I chose gdp_percapita of 2000 for all regions on the x-axis, with the periodic mean temperature difference and the precipitation sum difference along the y-axis for all regions.

desired plot:

Here are the desired range plots for temperature and precipitation:

expected range plot for temperature

expected range plot for precipitation

How do I accomplish my desired output with minimal code and efficiently? Could someone point me in the right direction?

Calum You
Hamilton
  • Is that range the total range with two different means, or two different ranges? – Calum You Jun 15 '18 at 17:53
  • @CalumYou I think two different means, but I also want to see what happen if use two different ranges. Thanks – Hamilton Jun 15 '18 at 17:55
  • What code have you written so far? – camille Jun 15 '18 at 17:56
  • Please [see here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R post that is easy to answer. That includes posting data, which you've done, but also posting code that you've tried so far and/or research you've already done. If your question is actually how to reshape the data in order to plot it, you can almost certainly find SO posts to help with that – camille Jun 15 '18 at 18:31
  • @camille ufff, why downvote? don't get your motivation, I gave reproducible data and follow the `SO` rule, I just don't understand how can I reshape my data and produce range plot. Is that what `SO` treat people? – Hamilton Jun 15 '18 at 18:38
  • @camille problem is, I don't just know how to do pivot operation and do simple statistics on that, and this is new for me and can't produce my own code, I am honest here and seek idea, that' all. – Hamilton Jun 15 '18 at 18:51
  • It seems like you have several questions rolled into one, the first of which is how to reshape your data so you can have 2000 GDP along one axis and precipitation/temperature along the other. There are a lot of SO posts to help you go from wide data to long data: [here's one](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) with 5 different solid answers – camille Jun 15 '18 at 18:58
  • But the data you posted doesn't seem like it's representative of the data you'll be working with; for example, you want to plot GDP in 2000 along the x-axis, but all observations have the same value for GDP in 2000 – camille Jun 15 '18 at 18:59
  • @camille I am gonna post what I did so far so I am not guilty for asking this question. For your second comment, I just used fake data but keep same data skeleton, no need to be real. Any positive answer? – Hamilton Jun 15 '18 at 19:01
  • @CalumYou I have one more doubt about choosing fixed year' GDP, I mean I going to choose GDP of 1996, but in your solution, it is mean of GDP. Why? Plus, if we use two different means, what would be the solution? Any further elaboration? Thank you! – Hamilton Jun 15 '18 at 23:27
  • i picked mean because i didn't see that you were choosing 1996 GDP, you can easily change that. You just need a single GDP value to plot the point with. How do you expect multiple GDPs to look? More points with more colours? separate panels? Please limit yourself to a single, concise question where possible. – Calum You Jun 15 '18 at 23:31
  • @CalumYou Thank you so much,I will keep your remark in my mind. – Hamilton Jun 15 '18 at 23:51

1 Answer

Here's a solution that I think does what you want. In general, you should try to keep your questions narrower, because just saying "I don't know what went wrong" makes the question difficult for others to use.

There are a few steps here. First, I use summarise to get the data into one row per region, computing the values that the aesthetics of geom_point and geom_linerange will need. Then, to plot the two decades as separate groups, we gather them so that decade becomes a grouping variable.
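To make the gather step concrete, here is a minimal sketch on a hypothetical two-region summary table. The data frame `wide` and its numbers are made up for illustration and are not part of the question's data; only the column names follow the code in this answer.

```r
library(tidyr)

# Hypothetical one-row-per-region summary (values are made up).
wide <- data.frame(
  region = c("Berlin", "Stuttgart"),
  decade_80s = c(24.5, 27.0),
  decade_00s = c(26.1, 22.3)
)

# gather() stacks the two decade columns into a key column (`decade`)
# and a value column (`mean`), giving one row per region and decade.
long <- gather(wide, decade, mean, decade_80s, decade_00s)
long
```

After this step `decade` is an ordinary column, so it can be mapped to colour in ggplot2.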

N.B. I edited the sample data so that the groups no longer all share exactly the same values, which gives a little variety.

geom_text_repel is a nice function from the ggrepel package that makes labels easier to add. We want to filter to just one of the groups so that the labels don't appear twice.

library(tidyverse)

set.seed(2346)
dat <- data.frame(
  index = rep(c("dex111", "dex112", "dex113", "dex114", "dex115"), each = 30),
  year = 1980:2009,
  region = rep(c("Berlin", "Stuttgart", "Böblingen", "Wartburgkreis", "Eisenach"), each = 30),
  ln_gdp_percapita = sample.int(40, 150, replace = TRUE),
  ln_gva_agr_perworker = sample.int(45, 150, replace = TRUE),
  temperature = sample.int(50, 150, replace = TRUE),
  precipitation = sample.int(60, 150, replace = TRUE)
)

stats <- dat %>%
  group_by(region) %>%
  summarise(
    ln_gdp = mean(ln_gdp_percapita),
    range_max = max(temperature),
    range_min = min(temperature),
    decade_80s = mean(temperature[which(year %in% 1980:1989)]),
    decade_00s = mean(temperature[which(year %in% 2000:2009)])
  ) %>%
  gather(decade, mean, decade_80s, decade_00s)

ggplot(stats, aes(x = ln_gdp)) +
  geom_point(aes(y = mean, colour = decade)) +
  geom_linerange(aes(ymin = range_min, ymax = range_max)) +
  ggrepel::geom_text_repel(
    data = . %>% filter(decade == "decade_00s"),
    mapping = aes(y = mean, label = region)
    )

Created on 2018-06-15 by the reprex package (v0.2.0).
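As mentioned in the comments, the mean GDP can be swapped for the GDP of a single reference year, and the same pattern should carry over to precipitation totals. The following is a sketch under some assumptions: `toy` is a small made-up stand-in for `dat`, `ref_year` is an assumed reference year (1995 here), and the value column name `total` is my own choice. It also assumes exactly one row per region for the reference year.

```r
library(dplyr)
library(tidyr)

# Small deterministic stand-in for `dat` (values are made up for illustration).
toy <- data.frame(
  region = rep(c("Berlin", "Stuttgart"), each = 30),
  year = rep(1980:2009, times = 2),
  ln_gdp_percapita = c(1:30, 31:60),
  precipitation = c(rep(10, 30), rep(c(20, 30, 40), each = 10))
)

ref_year <- 1995  # assumed reference year; change as needed

precip_stats <- toy %>%
  group_by(region) %>%
  summarise(
    # GDP in the reference year instead of the mean over all years
    ln_gdp = ln_gdp_percapita[year == ref_year],
    # total precipitation in the first and last ten years
    decade_80s = sum(precipitation[year %in% 1980:1989]),
    decade_00s = sum(precipitation[year %in% 2000:2009])
  ) %>%
  gather(decade, total, decade_80s, decade_00s)
```

`precip_stats` can then be plotted like `stats` above with `y = total`. If the line range is still wanted, `range_min` and `range_max` would need to be added to the `summarise()` call in the same way as for temperature.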

Calum You