1

I have a couple of questions about plotting with ggplot2. I have already used the commands below to colour data points in R.

library(ggplot2)
df <- read.csv(file="c:\\query2.csv")
ggplot(df, aes(x = Time, y = users, colour = users > 40)) + geom_point()

My questions are: how do I draw a continuous line connecting the data points, and how do I draw circles around the data points where users > 40?

Cœur
RKM
    Can we have a reproducible example, please? `geom_encircle` from the GitHub version of the `ggalt` package might help: https://github.com/hrbrmstr/ggalt/blob/master/man/geom_encircle.Rd – Ben Bolker Jul 15 '16 at 00:43

1 Answer

5

To connect the points, use geom_line (if that doesn't give you what you need, please explain what you're trying to accomplish).

I haven't used geom_encircle, but another option is to use a filled marker with the fill deleted to create the circles. Here's an example, using the built-in mtcars data frame for illustration:

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
  geom_point(data=mtcars[mtcars$mpg>30,],
             pch=21, fill=NA, size=4, colour="red", stroke=1) +
  theme_bw()

`pch=21` is one of the filled markers (see `?pch` for the other available point markers). Setting `fill=NA` removes the fill, and `stroke` sets the thickness of the circle border.


UPDATE: To add a line to this chart, using the example above:

ggplot(mtcars, aes(wt, mpg)) + 
  geom_line() +
  geom_point() +
  geom_point(data=mtcars[mtcars$mpg>30,],
             pch=21, fill=NA, size=4, colour="red", stroke=1) +
  theme_bw()

However, if (as in my original code for this graph) you put the aes statement inside the geom, rather than in the initial call to ggplot, then you need to include an aes statement inside geom_line as well.
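As a minimal sketch of that per-layer variant, again using mtcars: no aesthetics are passed to `ggplot()`, so each geom, including `geom_line`, gets its own `aes()`. The object name `p` is only for illustration.

```r
library(ggplot2)

# No aes() in ggplot(), so every layer must declare its own mapping.
p <- ggplot(mtcars) +
  geom_line(aes(wt, mpg)) +
  geom_point(aes(wt, mpg)) +
  # Circle only the rows with mpg > 30 using an unfilled marker.
  geom_point(data = mtcars[mtcars$mpg > 30, ], aes(wt, mpg),
             pch = 21, fill = NA, size = 4, colour = "red", stroke = 1) +
  theme_bw()
p
```

Leaving the `aes()` out of `geom_line()` here would fail, because that layer would have no x and y mapping to inherit.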

eipi10
  • Many thanks for the quick reply. I tried your command as follows: `ggplot(df) + geom_point(aes(df$Time,df$Users)) + geom_point(data=df[df$Users>20],aes(df$Time, df$Users), pch=21, fill=NA, size=4, colour="red", stroke=1)`. But I ended up with this error: "Error in `[.data.frame`(df, df$Users > 20) : undefined columns selected". The CSV file does contain a Users column, and I can generate the graph without the condition `data=df[df$Users>20]`. – RKM Jul 15 '16 at 04:32
  • You missed a comma. It should be `data=df[df$Users>20, ]`. – eipi10 Jul 15 '16 at 04:49
  • In the `df[ , ]` notation for data frames, expressions before the comma refer to row selection; expressions after the comma refer to column selection (all columns are included if there's nothing after the comma). But if you put in an expression without a comma (as in your code) R assumes this refers to column selection (because a data frame is a special kind of `list` and that's one way to select the elements of a `list`). `df$Users>20` produces a logical vector that's longer than the number of columns in your data. This produces an error because it refers to columns that don't exist. – eipi10 Jul 15 '16 at 04:55
  • Also, `geom_point(aes(df$Time,df$Users))` should be `geom_point(aes(Time, Users))` and the same for other geoms. Don't repeat the data frame name inside `aes`. The data frame goes either in the data argument to the geom, e.g., `geom_point(data=df, aes(Time, Users))` or in the initial call to ggplot, e.g., `ggplot(df)` or `ggplot(df, aes(Time, Users))`. – eipi10 Jul 15 '16 at 05:02
  • I amended the command as you suggested, and it is now `ggplot(df) + geom_point(aes(Time,Users)) + geom_point(data=df[Users>10,],aes(Time,Users), pch=21, fill=NA, size=4, colour="red", stroke=1)`. Now I get this error: "Error in `[.data.frame`(df, Users > 10, ) : object 'Users' not found". As I said earlier, the graph is generated with circled points when I leave out the condition. Your support is highly appreciated. – RKM Jul 15 '16 at 05:42
  • The data frame you feed to ggplot has to follow the usual indexing rules, so you need to change `geom_point(data=df[Users>10,]` to `geom_point(data=df[df$Users>10,]`. Now this subset of your data frame is in the ggplot environment and you can refer to columns without using the name of the data frame. See the ggplot code in my answer for a template of what your code should look like. – eipi10 Jul 15 '16 at 06:43
  • I appreciate your time answering my questions; I was able to plot the graph. One last request: how can I draw a line connecting these points? Adding `Geom_Line()` at the end of the command does not seem to work. I would also appreciate it if you could point me to some good references for improving my R coding skills. – RKM Jul 15 '16 at 11:20
  • One of the reasons we ask for a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) (as @BenBolker did in a comment to your question) is that it saves time by allowing us to see *exactly* what your problem is and provide code *tailored to your specific problem and context*. That avoids all of the back and forth questioning that's often required when an answer has to use different data or code because there's no reproducible example to work with. – eipi10 Jul 15 '16 at 15:57
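
The points raised in the comments can be pulled together in one small sketch. The asker's CSV was never posted, so `df` below is a made-up stand-in: only the column names `Time` and `Users` come from the thread, and the values and the names `busy`, `msg`, and `p` are assumptions for illustration. It shows row selection with and without the comma, bare column names inside `aes()`, and a connecting line added with `geom_line()`. R is case-sensitive, so `Geom_Line()` is not a ggplot2 function, which may be why that attempt did not work.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's CSV; the values are made up.
df <- data.frame(Time = 1:6, Users = c(5, 25, 12, 40, 8, 30))

# With the comma, the logical vector selects rows.
busy <- df[df$Users > 20, ]

# Without the comma, the same vector is treated as a column selector.
# It has 6 elements but df has only 2 columns, so R raises an error,
# which is captured here as a string instead of stopping the script.
msg <- tryCatch(df[df$Users > 20], error = function(e) conditionMessage(e))

# The data frame goes in ggplot(); aes() uses bare column names.
p <- ggplot(df, aes(Time, Users)) +
  geom_line() +    # lower case: R is case-sensitive
  geom_point() +
  geom_point(data = busy, pch = 21, fill = NA, size = 4,
             colour = "red", stroke = 1)
p
```

Because the aesthetics are declared once in `ggplot()`, the circled layer inherits them and only needs its own `data` argument.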