I am trying to create a ridgeline graph to visually compare the distributions of monthly data. For each month, I have 20 temperature values, each generated from a 100 m area moving outwards from a point. A ridgeline graph should let me show both the change in temperature between months and the change moving outwards from the point.
(In due course I'd also like to use it to compare statistical properties of the data, such as skew and kurtosis.)
I believe the data I'm plotting is of the correct type (appreciating that I'm not using some of the data in the tibble at this point):
> str(lst_c1_l1_T1)
tibble [240 × 7] (S3: tbl_df/tbl/data.frame)
$ buffer : Factor w/ 20 levels "100","200","300",..: 2 3 4 5 6 7 8 9 10 11 ...
$ date : Date[1:240], format: "2017-01-16" "2017-01-16" "2017-01-16" "2017-01-16" ...
$ variable : Factor w/ 40 levels "lst_c1_l1_kurtosis",..: 2 2 2 2 2 2 2 2 2 2 ...
$ value : num [1:240] 43.8 43.8 43.8 43.7 43.4 ...
$ Year : chr [1:240] "2017" "2017" "2017" "2017" ...
$ Month : Factor w/ 12 levels "01","02","03",..: 1 1 1 1 1 1 1 1 1 1 ...
$ TimePeriod: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
As far as I can tell my plot uses the right syntax and refers to the correct data in the right places:
plot <- ggplot() +
geom_density_ridges(lst_c1_l1_T1, mapping=aes(x = buffer, y = Month, group = Month, fill = Month, height = value))
plot
(the intention is to map other series in addition to this one)
However, I get the error:
> plot
Picking joint bandwidth of 2.84
Error in `geom_density_ridges()`:
! Problem while setting up geom.
ℹ Error occurred in the 1st layer.
Caused by error in `compute_geom_1()`:
! `geom_density_ridges()` requires the following missing aesthetics: height
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
The following aesthetics were dropped during statistical transformation: height
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
I don't understand why height, which I've provided as a numeric variable, is dropped, or why the grouping doesn't work.
None of the solutions posted to a similar issue help in this case.
Equally happy to hear of alternative approaches to compare distributions. I have also been getting nowhere with violin and dotplot approaches (possibly for related reasons?).
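For comparison, a minimal sketch of the density-style form of the call, on the assumption that `geom_density_ridges()` estimates a density from a numeric `x` and computes `height` itself, so `value` goes on `x` and no `height` is mapped. The data frame `toy` and the object name `p_density` are made up for illustration and only mimic the structure of `lst_c1_l1_T1`; this is not the real data.

```r
library(ggplot2)
library(ggridges)

# Hypothetical toy data mimicking the structure above:
# 12 months x 20 buffers, one temperature value per combination.
set.seed(1)
toy <- expand.grid(
  buffer = factor(seq(100, 2000, by = 100)),
  Month  = factor(sprintf("%02d", 1:12))
)
toy$value <- 40 + 5 * sin(as.integer(toy$Month) / 12 * 2 * pi) +
  rnorm(nrow(toy), sd = 0.3)

# One density of `value` per month. The stat derives the ridge height
# from the estimated density, so `height` is not mapped here.
p_density <- ggplot(toy, aes(x = value, y = Month, fill = Month)) +
  geom_density_ridges(alpha = 0.6)
p_density
```

Note that this shows the distribution of the 20 values within each month, so the position of each buffer is no longer visible on the plot.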
EDIT
Per Axeman's comment, the output of dput(head(lst_c1_l1_T1)):
structure(list(buffer = structure(2:7, levels = c("100", "200",
"300", "400", "500", "600", "700", "800", "900", "1000", "1100",
"1200", "1300", "1400", "1500", "1600", "1700", "1800", "1900",
"2000"), class = "factor"), date = structure(c(17182, 17182,
17182, 17182, 17182, 17182), class = "Date"), variable = structure(c(2L,
2L, 2L, 2L, 2L, 2L), levels = c("lst_c1_l1_kurtosis", "lst_c1_l1_lst",
"lst_c1_l1_max", "lst_c1_l1_median", "lst_c1_l1_min", "lst_c1_l1_single_kurtosis",
"lst_c1_l1_single_lst", "lst_c1_l1_single_max", "lst_c1_l1_single_median",
"lst_c1_l1_single_min", "lst_c1_l1_single_skew", "lst_c1_l1_single_stdDev",
"lst_c1_l1_single_variance", "lst_c1_l1_skew", "lst_c1_l1_stdDev",
"lst_c1_l1_variance", "lst_c2_l1_kurtosis", "lst_c2_l1_lst",
"lst_c2_l1_max", "lst_c2_l1_median", "lst_c2_l1_min", "lst_c2_l1_single_kurtosis",
"lst_c2_l1_single_lst", "lst_c2_l1_single_max", "lst_c2_l1_single_median",
"lst_c2_l1_single_min", "lst_c2_l1_single_skew", "lst_c2_l1_single_stdDev",
"lst_c2_l1_single_variance", "lst_c2_l1_skew", "lst_c2_l1_stdDev",
"lst_c2_l1_variance", "lst_c2_l2_kurtosis", "lst_c2_l2_lst",
"lst_c2_l2_max", "lst_c2_l2_median", "lst_c2_l2_min", "lst_c2_l2_skew",
"lst_c2_l2_stdDev", "lst_c2_l2_variance"), class = "factor"),
value = c(43.8048763014736, 43.7770632839523, 43.7671539457081,
43.6734275952591, 43.4396932500121, 43.4661731747384), Year = c("2017",
"2017", "2017", "2017", "2017", "2017"), Month = structure(c(1L,
1L, 1L, 1L, 1L, 1L), levels = c("01", "02", "03", "04", "05",
"06", "07", "08", "09", "10", "11", "12"), class = "factor"),
TimePeriod = structure(c(1L, 1L, 1L, 1L, 1L, 1L), levels = c("1",
"2"), class = "factor")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
EDIT 2
plot <- ggplot() +
geom_ridgeline(lst_c1_l1_T1, mapping=aes(x=Month, y=buffer, height=value, group=buffer, scale=0.02))
plot
This creates the output below, which isn't quite what I'm looking for. Perhaps this isn't the best way to compare subtle differences between approaches to generating the data (I can't see it being any use at all if I overlay another 4 series on top of it)?
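A sketch of the `geom_ridgeline()` route with distance on the x axis instead. It rests on two assumptions: that `buffer` can be converted to a numeric distance, and that plotting each month's values relative to that month's minimum is acceptable, since raw heights of around 43 would swamp the spacing between ridges. The data frame `toy` and the columns `distance` and `rel` are hypothetical and only mimic `lst_c1_l1_T1`. Here `scale` is set outside `aes()` rather than mapped.

```r
library(ggplot2)
library(ggridges)

# Hypothetical toy data: temperature falls slightly with distance
# and differs by month.
set.seed(1)
toy <- expand.grid(
  buffer = factor(seq(100, 2000, by = 100)),
  Month  = factor(sprintf("%02d", 1:12))
)
toy$value <- 43 + as.integer(toy$Month) / 4 -
  as.integer(toy$buffer) / 40 + rnorm(nrow(toy), sd = 0.05)

# Convert the factor labels ("100", "200", ...) to numeric distances.
toy$distance <- as.numeric(as.character(toy$buffer))

# Height relative to each month's minimum, so every ridge starts near zero.
toy$rel <- toy$value - ave(toy$value, toy$Month, FUN = min)

# One ridge per month, drawn directly from the supplied heights.
p_ridge <- ggplot(toy, aes(x = distance, y = Month, height = rel, group = Month)) +
  geom_ridgeline(scale = 1.5, alpha = 0.6)
p_ridge
```

With several series, faceting by `variable` might be easier to read than overlaying them, though that is untested here.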