
I am trying to reproduce in R these three simple histograms, originally created in Excel, to get something slightly more appealing to the eye. I have no doubt this is simple, but I am out of practice with R.

[image: data]

[image: histogram]

I have found various tutorials for producing basic histograms, but nothing that produces three columns (one per year) for each of the distance bins, with a separate graph for each of the data groups (A, B, C).
I believe the first thing I need to do is restructure my data, and this is the step I am unsure about.

Thanks in advance.

stefan
  • Please take a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), to modify your question, with a smaller sample taken from your data (check `?dput()`). Posting images of your data or no data makes it difficult to impossible for us to help you! – massisenergy Apr 05 '20 at 21:42
  • Import your data into R and use `dput()`. Paste the results of that into your question. A picture of your data is not helpful for showing you how to do something in R. Grouped bar charts (not histograms) are simple to do. – dcarlson Apr 05 '20 at 21:43

2 Answers


Yes, you will have to restructure your data. You can do it in R, as shown by @stefan, or, if that is challenging, in Excel itself. Tidy data is easy to plot and analyze (see section 12.1 for tidy data and sections 3.7 and 3.8 for visualization). Here, tidy data would consist of four columns: Distance, Value, Value_year, and Value_group.
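As a rough illustration of that four-column layout, here is a minimal sketch using tidyr's pivot_longer. The wide input is made up: the column names (A_2015 and so on) and the counts are assumptions, since the real spreadsheet layout is only shown as an image.

```r
library(tidyr)

# Hypothetical wide layout: one row per distance bin,
# one column per group/year combination (names and values are invented)
wide <- data.frame(
  Distance = c("0-0.5", "0.5-1"),
  A_2015 = c(3, 5), A_2016 = c(4, 2),
  B_2015 = c(1, 6), B_2016 = c(0, 7)
)

# Split each column name at "_" into group and year,
# and stack the counts into a single Value column
tidy <- pivot_longer(
  wide,
  cols = -Distance,
  names_to = c("Value_group", "Value_year"),
  names_sep = "_",
  values_to = "Value"
)
tidy
```

Each row of the result is one bin/group/year combination, which is the shape ggplot2 expects when mapping year to fill and group to facets.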


As an example, I stored some data as a tab-delimited file (testdata.txt) and read it in with the tidyverse's read_delim function. The example code follows:

library(tidyverse)
foo <- read_delim("testdata.txt", delim = "\t")
foo %>%
  # Fix the order of the years within each bin and in the legend
  mutate(Val_year = factor(Val_year, levels = c("2015", "2016", "2017"))) %>%
  ggplot() +
  geom_bar(aes(x = Dist, y = Val, fill = Val_year), stat = "identity", position = "dodge") +
  facet_grid(. ~ Val_grp)
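The contents of testdata.txt are not shown, so the code above cannot be run as is. One way to try it out is to generate a stand-in file with the four columns the code expects (Dist, Val, Val_year, Val_grp). The bins, groups, and counts below are invented for illustration, and the file is written to a temporary directory rather than the working directory.

```r
library(readr)

set.seed(1)

# Hypothetical contents: every combination of bin, year and group,
# with made-up counts between 0 and 8
sample_data <- expand.grid(
  Dist = c("0-0.5", "0.5-1", "1-1.5"),
  Val_year = c(2015, 2016, 2017),
  Val_grp = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
sample_data$Val <- sample(0:8, nrow(sample_data), replace = TRUE)

# Write a tab-delimited file, then read it back the same way the answer does
path <- file.path(tempdir(), "testdata.txt")
write_tsv(sample_data, path)
foo <- read_delim(path, delim = "\t")
```

With foo created this way, the plotting pipeline above should produce one facet per group with three dodged bars per bin.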


smandape

Using some random example data, the following code is a tidyverse solution that gives you a bar (column) chart mimicking your Excel chart for one dataset. Since your data is already binned, a bar chart rather than a histogram is the way to go. As you already guessed, the tricky part is getting your data into R (to this end, have a look at the readxl package) and rearranging it for plotting (this is done via pivot_longer from the tidyr package and mutate from dplyr, both of which are part of the tidyverse). For the plotting part I use ggplot2, which is, you might have guessed it (;, also part of the tidyverse.

# Example data set
set.seed(42)

df <- data.frame(
  distance = paste0(seq(0, 3.5, by = 0.5), "-", seq(0.5, 4, by = 0.5)),
  `2015` = round(runif(8) * 8, 0),
  `2016` = round(runif(8) * 8, 0),
  `2017` = round(runif(8) * 8, 0)
)
df
#>   distance X2015 X2016 X2017
#> 1    0-0.5     7     5     8
#> 2    0.5-1     7     6     1
#> 3    1-1.5     2     4     4
#> 4    1.5-2     7     6     4
#> 5    2-2.5     5     7     7
#> 6    2.5-3     4     2     1
#> 7    3-3.5     6     4     8
#> 8    3.5-4     1     8     8

library(tidyverse)

df %>% 
  # Convert the dataset to long format
  pivot_longer(-distance, names_to = "Year", values_to = "Value") %>% 
  # Clean up the year labels: strip the leading X that data.frame() adds to the names
  mutate(Year = gsub("^X", "", Year)) %>% 
  ggplot(aes(distance, Value, fill = Year)) + 
  # Column chart. Add some width between columns
  geom_col(position = position_dodge2(2)) +
  scale_y_continuous(expand = expansion(mult = c(0, .05))) +
  scale_fill_manual(values = c("blue", "orange", "grey")) +
  # Get rid of axis and legend labels
  labs(y = "", x = "", fill = "") +
  theme_bw() +
  theme(legend.position = "bottom")

Created on 2020-04-05 by the reprex package (v0.3.0)
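The chart above covers a single dataset. A possible way to get one panel per data group (A, B, C) is to stack the groups into one data frame and facet on the group column. This is only a sketch: make_group() is a hypothetical helper that generates random data shaped like df, and real data would be read in instead.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)

# Hypothetical helper: one random wide data set shaped like `df`
make_group <- function() {
  data.frame(
    distance = paste0(seq(0, 3.5, by = 0.5), "-", seq(0.5, 4, by = 0.5)),
    `2015` = round(runif(8) * 8, 0),
    `2016` = round(runif(8) * 8, 0),
    `2017` = round(runif(8) * 8, 0),
    check.names = FALSE # keep "2015" etc. as names, so there is no leading X to strip
  )
}

# Stack the three groups; .id stores the list names in a Group column
all_groups <- bind_rows(
  list(A = make_group(), B = make_group(), C = make_group()),
  .id = "Group"
)

# Long format: one row per group, distance bin and year
long <- pivot_longer(
  all_groups,
  cols = -c(Group, distance),
  names_to = "Year",
  values_to = "Value"
)

# One panel per group, three dodged columns per distance bin
p <- ggplot(long, aes(distance, Value, fill = Year)) +
  geom_col(position = "dodge") +
  facet_wrap(~Group, ncol = 1) +
  theme_bw() +
  theme(legend.position = "bottom")
p
```

The scale and label tweaks from the code above can be added to p in the same way.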

stefan