I am using ggplot to make a plot, and I'm having trouble specifying the values shown on the x-axis. I would like every sample value, i.e. 1 to 50, to be labelled on the graph.

Here is a short portion of the data:

          chrom chr_start  chr_stop num_positions normal_depth tumor_depth log2_ratio gc_content sample
   324202     1 156249804 156249858            55         12.3         4.7     -1.399       34.5     10
   324203     1 156250463 156250473            11         10.0         4.6     -1.109       27.3     10
   324204     1 156250664 156250705            42         12.0         7.4     -0.704       19.0     10
   324205     1 156250816 156250847            32         11.7         4.6     -1.343       40.6     10
   324206     1 156251108 156251132            25         10.6         3.6     -1.569       60.0     10
   324207     1 156251411 156251464            54         12.3         6.8     -0.863       46.3     10

Here is the ggplot function:

newHist = ggplot(resultsPileup1COMBINED[resultsPileup1COMBINED$sample <= 25,],
                 aes(x=sample)) +
  geom_histogram(fill="blue") +
  geom_histogram(data=resultsPileup1COMBINED[resultsPileup1COMBINED$sample > 25,], 
                 aes(x=sample), fill="gray50")
kgui
  • In your example, all values for sample are 10, is this correct? – jeremycg Jun 23 '15 at 19:45
  • I showed a small portion of the data frame. There are values 1 through 50 – kgui Jun 23 '15 at 19:46
  • http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example It would help if your dataset were in a way that would be easy to load into an R session. – kasterma Jun 23 '15 at 19:46
  • @kasterma Sorry, but I imported the data frame from a text file. I'm not sure how I can help, but please let me know if I can. – kgui Jun 23 '15 at 19:49

1 Answer

I would plot like this:

library(ggplot2)

ggplot(resultsPileup1COMBINED[resultsPileup1COMBINED$sample <= 25, ],
       aes(x = sample)) +
  geom_histogram(fill = "blue", binwidth = 1) +
  geom_histogram(data = resultsPileup1COMBINED[resultsPileup1COMBINED$sample > 25, ],
                 aes(x = sample), fill = "gray50", binwidth = 1) +
  scale_x_continuous(limits = c(0, 50), breaks = 0:50)

The two main additions are binwidth = 1, which ensures every sample gets its own column, and scale_x_continuous, which limits the x scale to the range 0 to 50. Its breaks = 0:50 argument places a labelled tick at every integer on the axis.
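One way to check that binwidth = 1 really gives each sample its own bar is to compare the bins ggplot2 computes with a plain per-sample count. This is a minimal sketch, assuming a toy data frame with four rows of sample 10 and two rows of sample 40; the names df, bins, nonempty and per_sample are illustrative and not part of the original code.

```r
library(ggplot2)

# Toy data: only the sample column, with four rows of 10 and two rows of 40.
# df is an illustrative name, not the original resultsPileup1COMBINED.
df <- data.frame(sample = c(10L, 10L, 10L, 10L, 40L, 40L))

p <- ggplot(df, aes(x = sample)) +
  geom_histogram(binwidth = 1, fill = "blue")

# layer_data() returns the data ggplot2 computed for a layer, here the bins.
bins <- layer_data(p, 1)

# Keep only the bins that actually contain rows.
nonempty <- bins[bins$count > 0, c("xmin", "xmax", "count")]

# Per-sample row counts from base R, for comparison with nonempty$count.
per_sample <- table(df$sample)
```

With a width of 1 and integer sample values, each non-empty bin should cover exactly one sample, so nonempty$count can be compared directly against per_sample.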

Here's the data with a couple of 40s to test the second plot call:

dput(resultsPileup1COMBINED)

structure(list(chrom = c(1L, 1L, 1L, 1L, 1L, 1L), chr_start = c(156249804L, 
156250463L, 156250664L, 156250816L, 156251108L, 156251411L), 
    chr_stop = c(156249858L, 156250473L, 156250705L, 156250847L, 
    156251132L, 156251464L), num_positions = c(55L, 11L, 42L, 
    32L, 25L, 54L), normal_depth = c(12.3, 10, 12, 11.7, 10.6, 
    12.3), tumor_depth = c(4.7, 4.6, 7.4, 4.6, 3.6, 6.8), log2_ratio = c(-1.399, 
    -1.109, -0.704, -1.343, -1.569, -0.863), gc_content = c(34.5, 
    27.3, 19, 40.6, 60, 46.3), sample = c(10L, 10L, 10L, 10L, 
    40L, 40L)), .Names = c("chrom", "chr_start", "chr_stop", 
"num_positions", "normal_depth", "tumor_depth", "log2_ratio", 
"gc_content", "sample"), class = "data.frame", row.names = c("324202", 
"324203", "324204", "324205", "324206", "324207"))
jeremycg
  • I tried the code above and it helped: it put more values on the x-axis, but not all of them. It labelled intervals of 10, i.e. 0, 10, 20, 30, 40, 50. Where in the code can I change that to 0, 1, 2, 3, ..., 49, 50? – kgui Jun 23 '15 at 20:02
  • oh, so you want labels at every value from 1 to 50 on the x axis? – jeremycg Jun 23 '15 at 20:04
  • Yeah, exactly, 1 to 50. But I would also like to know how to change it to other intervals, for future reference. – kgui Jun 23 '15 at 20:05
  • ok I've updated the answer - use breaks = 0:50 in the scale_x_continuous call – jeremycg Jun 23 '15 at 20:08
  • Would it be possible to have it also in intervals of 1,3,5,7, ... 48, 50? – kgui Jun 23 '15 at 20:35
  • sure, as long as you can specify it in a sequence. You could try `breaks = seq(from=1, to=50, by=2)` or any other vector – jeremycg Jun 23 '15 at 20:37
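As a rough illustration of the point made in the comments, any numeric vector can be passed to breaks. The sketch below assumes a stand-in data frame with sample values 1 to 50; df and the every_* names are hypothetical and chosen only for this example.

```r
library(ggplot2)

# Hypothetical stand-in for resultsPileup1COMBINED: sample IDs 1 to 50,
# two rows each.
df <- data.frame(sample = rep(1:50, times = 2))

# Candidate break vectors; breaks accepts any numeric vector.
every_value  <- 0:50                             # a tick at every integer
every_second <- seq(from = 1, to = 50, by = 2)   # odd values only, stops at 49
every_fifth  <- seq(from = 0, to = 50, by = 5)   # 0, 5, 10, ..., 50

# Swap in whichever vector is wanted for the axis labels.
p <- ggplot(df, aes(x = sample)) +
  geom_histogram(binwidth = 1, fill = "blue") +
  scale_x_continuous(breaks = every_second)
```

Note that seq(from = 1, to = 50, by = 2) ends at 49, because stepping by 2 from 1 never lands on 50. If a label at 50 is also wanted, it can be appended explicitly, for example c(seq(1, 49, by = 2), 50).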