I am trying to set the size of geom_point according to a factor. I know it is not advised, but my data is extremely unbalanced (the minimum value is 6 while the maximum is larger than 10,000).

I am trying to make the size of the points reflect the total sample sizes of studies. I divided total sample sizes into 6 levels: less than 100; 100 to 500; 500 to 1,000; 1,000 to 5,000; 5,000 to 10,000; and more than 10,000.

Here is my attempt:

rct_findings <- findings %>% 
  mutate(
   
    Sample_Size_Range = case_when(
      0 < Outcome_Sample_Size & Outcome_Sample_Size <= 100 ~ "0 < n <= 100",
      100 < Outcome_Sample_Size & Outcome_Sample_Size <= 500 ~ "100 < n <= 500",
      500 < Outcome_Sample_Size & Outcome_Sample_Size <= 1000 ~ "500 < n <= 1,000",
      1000 < Outcome_Sample_Size & Outcome_Sample_Size <= 5000 ~ "1,000 < n <= 5,000",
      5000 < Outcome_Sample_Size & Outcome_Sample_Size <= 10000 ~ "5,000 < n <= 10,000",
      10000 < Outcome_Sample_Size ~ "10,000 < n"),
    
    Sample_Size_Range = fct_relevel(
      Sample_Size_Range,
      c("0 < n <= 100", "100 < n <= 500", "500 < n <= 1,000",
        "1,000 < n <= 5,000", "5,000 < n <= 10,000", "10,000 < n")))

ggplot(rct_findings, aes(x = Effect_Size_Study, y = F_test_var_stat, size = as_factor(Sample_Size_Range))) +
  geom_point()

The error message I got is:

Error in grid.Call.graphics(C_setviewport, vp, TRUE) :
  non-finite location and/or size for viewport
In addition: Warning messages:
1: Using size for a discrete variable is not advised.
2: Removed 16 rows containing missing values (geom_point).

Does anyone have a suggestion for how to fix this?

Judy Zhang
  • I suspect you're using factors because you want the smallest points to be comparable to the largest. If the default sizing is too extreme for you, you can transform the size variable (e.g. take its square root) before mapping it to size. – Daniel V Dec 13 '21 at 23:24
  • Welcome to Stack Overflow. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(findings)`, if that is not too large. – neilfws Dec 13 '21 at 23:25
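
The transformation suggested in the first comment could look roughly like the sketch below. It is only an illustration: the data frame is made-up dummy data (not the asker's `findings`), and the legend breaks and the legend title "Sample size" are assumptions chosen for readability. The size variable stays numeric and is square-rooted inside `aes()`, so no factor is involved.

```r
library(ggplot2)

# Dummy data (assumption): stands in for the asker's data set
set.seed(1)
rct_findings <- data.frame(
  Effect_Size_Study   = rnorm(100),
  F_test_var_stat     = runif(100),
  Outcome_Sample_Size = runif(100, min = 6, max = 10000)
)

# Legend breaks on the original scale (assumption: picked for illustration)
size_breaks <- c(100, 1000, 5000)

p <- ggplot(rct_findings, aes(x = Effect_Size_Study, y = F_test_var_stat)) +
  # Square-root transform compresses the range of the size variable
  geom_point(aes(size = sqrt(Outcome_Sample_Size))) +
  scale_size(
    name   = "Sample size",
    breaks = sqrt(size_breaks),        # breaks live on the transformed scale
    labels = as.character(size_breaks) # but are labelled with the raw values
  )
p
```

Because the breaks are given on the transformed scale and labelled with the untransformed values, the legend still reads in units of sample size.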

1 Answer

This seems like a good use case for a binned size scale, which lets you avoid converting the variable to a factor altogether.

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.1.1

# Dummy data
rct_findings <- data.frame(
  Effect_Size_Study = rnorm(100),
  F_test_var_stat = runif(100),
  Outcome_Sample_Size = runif(100, min = 6, max = 10000)
)

ggplot(rct_findings, aes(x = Effect_Size_Study, y = F_test_var_stat)) +
  geom_point(aes(size = Outcome_Sample_Size)) +
  scale_size_binned_area(
    limits = c(0, 10000),
    breaks = c(0, 100, 500, 1000, 5000, 10000)
  )

Created on 2021-12-14 by the reprex package (v2.0.1)
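
If it helps to see how many observations fall into each of the six ranges from the question, and whether any sample sizes are missing (the warning in the question mentions 16 removed rows), base R's `cut()` can do the binning on the numeric variable directly. This is just a sketch: the `sizes` vector is hypothetical, and the labels are copied from the question's `case_when()` call.

```r
# Hypothetical sample sizes, including two missing values and one above 10,000
sizes <- c(6, 80, 100, 250, 750, 1000, 3200, 8000, 12500, NA, NA)

size_range <- cut(
  sizes,
  breaks = c(0, 100, 500, 1000, 5000, 10000, Inf),
  labels = c("0 < n <= 100", "100 < n <= 500", "500 < n <= 1,000",
             "1,000 < n <= 5,000", "5,000 < n <= 10,000", "10,000 < n"),
  right  = TRUE  # intervals are (lower, upper], matching the ranges in the question
)

# Count observations per range; useNA = "ifany" adds a row for missing values
size_counts <- table(size_range, useNA = "ifany")
size_counts
```

Note that `cut()` returns `NA` for missing inputs, so rows without a sample size show up as their own count rather than being silently assigned to a range.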

teunbrand