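[Image: example plot of "HR for CHD" (Y-axis) against "Difference in Lp(a) nmol/L" (X-axis), with circles sized by sample size and a weighted least squares trendline]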

  1. The line in the picture is a weighted least squares trendline, and I plan to draw it with ggplot.

  2. The ordinate of each circle's centre is its Y-axis value ("HR for CHD……").

We can see ten different circles in the picture because there are ten different "HR" values, and every set of data points shares the same Y value (ten sets of data points in total).

[Sorry, I made a mistake earlier! Maybe the abscissa of each circle's centre is the average of the X values of a set of data points?]

  3. The size of each circle depends on the sample size.

df <- data.frame(y=c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8),
                 x=c(48, 78, 72, 70, 66, 92, 93, 75, 75, 80, 95, 97, 90, 96, 99, 99))

Just use these data for testing. I already know how to draw a weighted least squares trendline in ggplot, but how can I add the 8 circles (there are 8 different "y" values in the test data)?

tumidou
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. – MrFlick Aug 23 '21 at 17:03
  • @MrFlick OK, I edited it. – tumidou Aug 23 '21 at 17:15
  • Where is the variable that includes the sample size? – mikebader Aug 23 '21 at 17:39
  • @mikebader Uh……I don't know what to do next. Do I need to create a new data frame containing the counts? Something like `countresult <- data.frame(y=c(1, 2, 3, 4, 5, 6, 7, 8), count=c(2, 3, 1, 4, 2, 2, 1, 1))` – tumidou Aug 23 '21 at 17:45

1 Answer

You can add a second layer to the original plot.

Suppose you have a third variable holding the sample size (the code below computes it as `n`). You would use geom_point and map the size aesthetic to that variable.

I'm not exactly sure how you will enter the data predicted by WLS, but to add the circles you would add:

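library(ggplot2)
library(dplyr)  # provides %>%, group_by(), add_count(), summarise(), across()

df <- data.frame(y=c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8),
                 x=c(48, 78, 72, 70, 66, 92, 93, 75, 75, 80, 95, 97, 90, 96, 99, 99)
)

# For each distinct y: count its rows (n), then average x and n so that
# each y collapses to a single row (circle centre = mean x, size = n)
df <- df %>%
    group_by(y) %>%
    add_count() %>%
    summarise(across(everything(), mean))

# Circles sized by n, plus a linear trendline weighted by n
ggplot(df, aes(x = x, y = y)) +
    geom_point(aes(size = n), shape = 21) +
    geom_smooth(method = "lm", mapping = aes(weight = n),
                color = "red")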
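
As a quick sanity check on the aggregation step, here is a minimal sketch that builds the same per-`y` summary under separate names, so the original data are not overwritten. The names `raw` and `circles` are only illustrative, and `summarise()` with `n()` is used here as an equivalent shortcut for the `add_count()` approach above:

```r
library(dplyr)

# Same test data as in the question, stored under an illustrative name
raw <- data.frame(
  y = c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8),
  x = c(48, 78, 72, 70, 66, 92, 93, 75, 75, 80, 95, 97, 90, 96, 99, 99)
)

# One row per distinct y: mean x is the circle centre, n is the sample size
circles <- raw %>%
  group_by(y) %>%
  summarise(x = mean(x), n = n(), .groups = "drop")

circles
```

Printing `circles` should show eight rows, one per distinct `y`, which is the data frame the circles are drawn from.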
mikebader
  • But the structure of the data frame for the WLS plot is different from that of the sample size data (the data to be predicted by WLS are longitudinal). Maybe we can't add a second plot to the original code when the two plots don't share the same data frame? I plan to follow this case to draw the WLS plot: https://stackoverflow.com/questions/42507098/adding-a-weighted-least-squares-trendline-in-ggplot2 – tumidou Aug 23 '21 at 18:05
  • It would help a lot if you can provide an example of your data by, for example, using the `dput()` command. I also do not understand how the WLS plot will be longitudinal given what you have provided in your question. – mikebader Aug 23 '21 at 18:13
  • Would this case work? It is like the longitudinal data in this case: https://www.statology.org/weighted-least-squares-in-r/ I used its data example but switched the values of x and y. The case doesn't use ggplot, but I think I can deal with that. – tumidou Aug 23 '21 at 18:23
  • The circle's size only depends on the sample size of same Y-axis values. It doesn't matter whether the X-axis values are the same or not. We only need to count how many Y values are identical. – tumidou Aug 23 '21 at 18:30
  • @tumidou, so you are saying that you want the circle at y=2 to be 1.5 times larger than the circle at y=1 and the circle at y=4 to be 2 times larger than the circle at y=1? – mikebader Aug 23 '21 at 20:13
  • Yes! The size of the circle depends only on the sample size of Y values. – tumidou Aug 24 '21 at 05:44
  • Your data do not match the example plot. That plot contains three variables: an independent variable (Difference in Lp(a) nmol/L), a dependent variable (HR for CHD), and a sample size variable (the size of the circles). Given what you said, however, I have created a new variable, `n`, that contains the count of each `y`, but it produces multiple points for each y-value, unlike the plot. – mikebader Aug 24 '21 at 14:20
  • You're right, so I guess the X-axis value of the circle centre is the average of all x values that map to the same y. For example, the three points (72,2), (70,2), (66,2) generate the circle centre (average(72,70,66), 2) = (69.33, 2). – tumidou Aug 24 '21 at 15:41
  • I added the command `summarise(across(everything(), mean))` to accomplish that. – mikebader Aug 24 '21 at 15:53
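
On the concern raised in the comments about the two plots not sharing a data frame: each ggplot2 layer accepts its own `data` argument, so the circles and the trendline can come from different data frames. A minimal sketch, assuming the WLS data live in their own data frame. The names `raw` and `circles` and the weight column `w` are placeholders, and the equal weights are made up purely for illustration, not taken from the question:

```r
library(ggplot2)
library(dplyr)

# Placeholder for the data the trendline is fitted to (here: the test data)
raw <- data.frame(
  y = c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8),
  x = c(48, 78, 72, 70, 66, 92, 93, 75, 75, 80, 95, 97, 90, 96, 99, 99)
)
raw$w <- 1  # hypothetical weights; replace with the real WLS weights

# Separate data frame for the circles: one row per distinct y
circles <- raw %>%
  group_by(y) %>%
  summarise(x = mean(x), n = n(), .groups = "drop")

p <- ggplot() +
  # Circles drawn from the summarised data, sized by the count n
  geom_point(data = circles, aes(x = x, y = y, size = n), shape = 21) +
  # Weighted linear trendline drawn from the other data frame
  geom_smooth(data = raw, aes(x = x, y = y, weight = w),
              method = "lm", formula = y ~ x, se = FALSE, colour = "red")

p
```

Because neither layer inherits data from `ggplot()`, the two data frames only need to agree on what the x and y axes mean, not on their row structure.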