
Here's the ggplot I have:

  listTimelinePlot <- ggplot(listDf, aes(x=N, y=Measurement_Value,color="List")) +
    xlab("n") +
    ylab("Time to append n items") +
    scale_x_log10() +
    scale_y_log10() +
    geom_line() +
    geom_point()

N is an array of integers that may contain duplicate values. As a result, the plot shows multiple points that share the same x-value.

How do I make it so that only one point is displayed per x-value, namely a point with its y-value equal to the average of the points' y-values? I'm assuming that the 'joints' created by geom_line() meet at the mean y-value.

James Ko
  • The geom line connects all the points - it doesn't know what the mean is. It's probably connecting them in whatever order they are in your data frame. – Gregor Thomas Feb 01 '18 at 01:37
  • ggplot is really good at plotting the data you give it. If you want it to plot means, give it means. Use the R-FAQ on [calculating means by group](https://stackoverflow.com/q/11562656/903061) to get a nice summary data frame and plot that. – Gregor Thomas Feb 01 '18 at 01:38
  • Or, as in https://stackoverflow.com/questions/48550156/, one can also summarise and pipe directly into ggplot :) – tjebo Feb 01 '18 at 01:43

1 Answer

Just compress your dataframe to means in the first place. ddply from the plyr package should do the job.

library(plyr)
newListDF <- ddply(listDf, "N", numcolwise(mean))

The first argument to the function is the data, the second is the column you want to group by, and the last argument is the function you want to apply to each group (numcolwise wraps mean so that it is applied to every numeric column of each group's rows, rather than to the group as a whole).

This will give you a data frame with one row per distinct N, containing the mean of each numeric column (including Measurement_Value) for that N. Check the column names of the result, then pass it to ggplot in place of the original data frame.
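For a version that does not depend on plyr, here is a minimal sketch using base R's aggregate. The sample values in listDf are made up for illustration and meanDf is a hypothetical name; only the column names N and Measurement_Value come from the question.

```r
library(ggplot2)

# Hypothetical sample data standing in for listDf: each N appears twice,
# with a different timing measurement each time.
listDf <- data.frame(
  N = c(10, 10, 100, 100, 1000, 1000),
  Measurement_Value = c(1, 3, 10, 30, 100, 300)
)

# Collapse to one row per distinct N, holding the mean Measurement_Value.
meanDf <- aggregate(Measurement_Value ~ N, data = listDf, FUN = mean)

# Same plot as in the question, but fed the summarised data frame.
meanPlot <- ggplot(meanDf, aes(x = N, y = Measurement_Value, color = "List")) +
  xlab("n") +
  ylab("Mean time to append n items") +
  scale_x_log10() +
  scale_y_log10() +
  geom_line() +
  geom_point()

print(meanPlot)
```

You could also leave the data alone and let ggplot summarise it with stat_summary(fun = mean, geom = "point") (the argument is fun.y in older ggplot2 versions). Be aware that scale transformations are applied before stats, so with scale_y_log10() that would average the log-transformed values rather than the raw ones. Summarising first avoids that surprise.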

LachlanO