I am attempting to produce a scatter plot using the ggplot2 library. My data frame (called scatterPlotData) is in this form:

115 2.3
120 1.6
.
.
.
132 4.3

(The ... signifies many other similar values.) Essentially, it is a two-column data frame. I also have labels to go along with each of those points. Firstly, I'm having trouble with the scatterplot itself. I'm using the following code:

p <- ggplot(scatterPlotData, aes("Distance (bp)", "Intensity"))
p + geom_point()

However, using the above code, I get the following plot:

(image of the resulting plot)

Obviously, it's not a scatter plot. It would be very helpful if someone could point out what I'm doing wrong.

Secondly, the labels. I will have many data points, so there is a risk of labels overlapping. How should I go about putting a label on each point using ggplot? I have also read that the directlabels package can produce an overlap-free labelled scatterplot using different colors; however, I'm not sure how to go about that with ggplot, as I haven't found any documentation on using directlabels with ggplot.

Any help with either (or both) question(s) is greatly appreciated - thanks.

intl

2 Answers

Lose the inverted commas (quotation marks); at the moment you're plotting the literal text values rather than the columns. Having looked again, you will also have problems with the brackets in your variable name (Distance (bp)). Change that to something without the brackets, then make the ggplot call without the inverted commas:

# Assuming Distance (bp) is the first column
names(scatterPlotData)[1] <- "Distance"
# Note the closing parenthesis after aes(...) before adding the layer
p <- ggplot(scatterPlotData, aes(Distance, Intensity)) + geom_point()

As for non-overlapping labels, this is a vexed issue with lots of discussion on SO - I think you'll not get great responses from such a vague question here.

alexwhan
  • The ... just signifies that there are a lot more similar values. And is there a conventional way of doing labels? – intl Jul 24 '13 at 02:34
  • I described that poorly, I meant the inverted commas, not the dots in your data – alexwhan Jul 24 '13 at 02:51

First, it would be much more helpful if you provided a reproducible example that precisely described your data.

You should not be passing variable names to aes in quotes. I'm not sure where you got that from; I can't think of a single example of anyone doing that (unless they were using aes_string, which is specifically for that case).

However, it appears that you have an awkward variable name, i.e. Distance (bp). This is non-standard and not recommended. Names should not have spaces in them. The best thing to do would be to rename that column to something sensible and then do something like:

p <- ggplot(scatterPlotData, aes(x = Distance_bp, y = Intensity))
p + geom_point()

If you do not rename the column, something like this might work:

p <- ggplot(scatterPlotData, aes(x = `Distance (bp)`, y = Intensity))
p + geom_point()

Note that those are backticks, not single quotes.
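To make the difference concrete, here is a minimal sketch that contrasts the quoted and backticked forms. The three rows are stand-in values taken from the question, and the data frame construction with check.names = FALSE is an assumption about how the non-syntactic column name came to exist; your real data may have been read in differently.

```r
library(ggplot2)

# Stand-in data (assumption): check.names = FALSE keeps the
# non-syntactic column name "Distance (bp)" unchanged.
scatterPlotData <- data.frame(
  `Distance (bp)` = c(115, 120, 132),
  Intensity = c(2.3, 1.6, 4.3),
  check.names = FALSE
)

# Quoted strings are treated as constants, so every point gets the
# same x and the same y value.
p_quoted <- ggplot(scatterPlotData, aes("Distance (bp)", "Intensity")) +
  geom_point()

# Backticks refer to the column itself, so the real values are mapped.
p_backtick <- ggplot(scatterPlotData, aes(x = `Distance (bp)`, y = Intensity)) +
  geom_point()

# layer_data() returns the data a layer actually draws, which makes
# the difference visible without looking at the plot.
quoted_xy <- layer_data(p_quoted)[, c("x", "y")]
backtick_xy <- layer_data(p_backtick)[, c("x", "y")]
print(quoted_xy)
print(backtick_xy)
```

Comparing the two printed tables should show a single repeated position for the quoted version and the original column values for the backticked one.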

As for the overlapping data, I would recommend reading here and here.

joran
  • Could I not just do p <- ggplot(scatterPlotData, aes(x = scatterPlotData[,1],y = scatterPlotData[,2]))? – intl Jul 24 '13 at 02:57
  • @intl That _might_ work, but you shouldn't do it. `aes` is designed to do some sophisticated evaluation of its arguments. It is best not to mess with it. Always pass names of variables themselves, or simple functions of them (e.g. `log(variable)`,`factor(variable)`). Anything else is asking for trouble. – joran Jul 24 '13 at 03:00
  • OK, so I'll make variables for xScatter = scatterPlotData[,1] and yScatter = scatterPlotData[,2] and use these for p <- ggplot(scatterPlotData, aes(x = xScatter,y = yScatter)). But, what then is the point of passing in scatterPlotData if it contains the data of the variables anyway? Thanks a lot for the info. – intl Jul 24 '13 at 03:06
  • @intl Maybe I wasn't clear. `scatterPlotData` should be a data frame. Each column should have a name. Use **those** names, and make sure they are syntactically valid (i.e. no spaces). – joran Jul 24 '13 at 03:08
  • Thank you, got it working. My issue, as you pointed out, was that I wasn't using the proper column names. I added labels by adding the labels to the overall dataframe and using its column name. – intl Jul 24 '13 at 14:54
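For anyone landing here later, the approach described in the last comment (storing the labels as a column of the data frame and mapping that column) might look roughly like the sketch below. The column names Distance_bp and Label, and the label values, are hypothetical; geom_text with check_overlap = TRUE is only one simple way to reduce clutter, and it does so by not drawing labels that would overlap earlier ones rather than by repositioning them.

```r
library(ggplot2)

# Hypothetical data: the Label column and its values are made up
# for illustration.
scatterPlotData <- data.frame(
  Distance_bp = c(115, 120, 132),
  Intensity = c(2.3, 1.6, 4.3),
  Label = c("a", "b", "c")
)

p <- ggplot(scatterPlotData, aes(x = Distance_bp, y = Intensity)) +
  geom_point() +
  # Map the label column by name; vjust nudges the text above each
  # point, and check_overlap skips labels that would overlap.
  geom_text(aes(label = Label), vjust = -0.8, check_overlap = TRUE)

# Data for the second layer (the text layer)
label_layer <- layer_data(p, 2)
print(p)
```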