How to plot comparison of old data with new

Question

I have a data frame called mydf. It holds the number of individuals in which each gene is mutated (counts). I am comparing this with published data (old_counts) and want to plot the two together (side-by-side bars would be appropriate). For any gene without a value in the old data, I want to mark it as 'new' in the plot (for example, for the TTYR gene, I want 'new' to appear below the counts bar).

mydf

gene       counts       old_counts
GPT          13          12
TTYR         1           
GTT          2           5 
JUN          3           2

I'm pretty confused by your question and the example doesn't help. Can you please elaborate further and provide some more code of the data and what you have tried? — Vedda, Feb 13 '16 at 06:43
You can use ggplot's geom_bar with dodge. http://docs.ggplot2.org/current/geom_bar.html — Roman Luštrik, Feb 13 '16 at 06:52
@Amstell Well, there is one column `counts`, which is my data, and another column `old_counts`, which is the old data. I want to put the bar for the old data next to the bar for the new data for every gene. If there is no value in `old_counts` for a gene, I want to label the bar for that gene as new in `counts`. — MAPK, Feb 13 '16 at 07:22

Jaap · Accepted Answer · 2016-02-13T14:10:52.410

An alternative with ggplot2:

# load needed libraries
library(reshape2)
library(ggplot2)

# set the order of the 'gene' variable if you don't want it to be plotted
# in alphabetical order, else you can skip this step
df1$gene <- factor(df1$gene, levels = c("GPT", "TTYR", "GTT", "JUN"))

# reshape the data
df2 <- melt(df1, "gene")

# create a variable with the labels
df2$lbl <- c(NA,"new","missing")[((is.na(df2$value) & df2$variable=="old_counts") + 1L) + 
                                   (is.na(df2$value) & df2$variable=="counts")*2]
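
The index arithmetic above is compact but can be hard to read. As a rough sketch (not part of the original answer), the same labels could be built with nested `ifelse()` calls. To keep the sketch free of package dependencies, the long-format data frame is assembled by hand in base R; this is assumed to mirror the layout that `melt(df1, "gene")` returns for the data shown under "Used data".

```r
# data from the "Used data" section of this answer
df1 <- data.frame(
  gene       = c("GPT", "TTYR", "GTT", "JUN"),
  counts     = c(13L, 1L, 2L, NA),
  old_counts = c(12L, NA, 5L, 2L),
  stringsAsFactors = FALSE
)

# long format assembled by hand (assumed to mirror melt(df1, "gene"))
df2 <- data.frame(
  gene     = rep(df1$gene, times = 2),
  variable = rep(c("counts", "old_counts"), each = nrow(df1)),
  value    = c(df1$counts, df1$old_counts),
  stringsAsFactors = FALSE
)

# explicit version of the label rule:
#   NA in old_counts -> "new", NA in counts -> "missing", otherwise no label
df2$lbl <- ifelse(is.na(df2$value) & df2$variable == "old_counts", "new",
           ifelse(is.na(df2$value) & df2$variable == "counts", "missing",
                  NA_character_))

# index-arithmetic version from the answer, kept for comparison
lbl_idx <- c(NA, "new", "missing")[((is.na(df2$value) & df2$variable == "old_counts") + 1L) +
                                     (is.na(df2$value) & df2$variable == "counts") * 2]

print(df2)
```

Both versions should yield the same label vector; the explicit form is simply easier to extend if more label types are needed.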


# create the plot
ggplot(df2, aes(x = gene, y = value, fill = variable)) +
  geom_bar(stat="identity", position = position_dodge(width = 0.9), width = 0.7) +
  geom_text(aes(y = -1, label = lbl), size = 5, position = position_dodge(width = 0.7)) +
  theme_minimal(base_size = 14)

which gives:

Another possibility is to place the text labels on the spots of the missing bars:

ggplot(df2, aes(x = gene, y = value, fill = variable)) +
  geom_bar(stat="identity", position = position_dodge(width = 0.9), width = 0.7) +
  geom_text(aes(y = 0.2, label = lbl), hjust = 0, angle = 90, size = 4, position = position_dodge(width = 0.7)) +
  theme_minimal(base_size = 14)

which gives:

For the case when you want to use percentages in your plot and vertical x-axis labels:

# create a percentage value by group
df2$perc <- ave(df2$value, df2$variable, FUN = function(x) x/sum(x, na.rm = TRUE))

# set the break you want to use for the y-axis
brks <- c(0,0.2,0.4,0.6,0.8,1.0)

# load the 'scales' library (needed for the 'percent' function)
library(scales)

# create the plot
ggplot(df2, aes(x = gene, y = perc, fill = variable)) +
  geom_bar(stat="identity", position = position_dodge(width = 0.9), width = 0.7) +
  geom_text(aes(y = 0.02, label = lbl), hjust = 0, angle = 90, size = 4, position = position_dodge(width = 0.7)) +
  scale_y_continuous(breaks = brks, labels = percent(brks), limits = c(0,1)) +
  theme_minimal(base_size = 14) +
  theme(axis.text.x = element_text(angle = 90))

which gives:

Used data:

df1 <- structure(list(gene = c("GPT", "TTYR", "GTT", "JUN"), counts = c(13L, 1L, 2L, NA), old_counts = c(12L, NA, 5L, 2L)), .Names = c("gene", "counts", "old_counts"), class = "data.frame", row.names = c(NA, -4L))
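
A note on the percentage variant: `ave()` divides each value by the total of its own group (`counts` or `old_counts`), so the shares add up to 100% within each series rather than across both. The following base R check is a sketch under the assumption that the long layout matches what `melt()` produces; it is assembled by hand here to avoid a package dependency.

```r
# long-format data, built by hand from the "Used data" values
df2 <- data.frame(
  gene     = rep(c("GPT", "TTYR", "GTT", "JUN"), times = 2),
  variable = rep(c("counts", "old_counts"), each = 4),
  value    = c(13, 1, 2, NA, 12, NA, 5, 2)
)

# share of each gene within its own series, as in the answer
df2$perc <- ave(df2$value, df2$variable,
                FUN = function(x) x / sum(x, na.rm = TRUE))

# shares sum to 1 within each series; NA values stay NA
group_totals <- tapply(df2$perc, df2$variable, sum, na.rm = TRUE)
print(group_totals)
```

If shares relative to some other denominator are wanted (for example, the number of individuals sampled), the `FUN` argument would need to divide by that number instead.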

thanks. How do we set the y axes to 100 ? That is if all the values are in percentage and y axes to be labelled as 100 percent limit. — MAPK, Feb 13 '16 at 11:18
@MAPK add `theme(axis.text.x = element_text(angle = 90))` to your `ggplot`-code; I've updated the answer as well — Jaap, Feb 13 '16 at 14:08

score 2 · Answer 2 · edited Jun 20 '20 at 09:12

We could try

m1 <- `colnames<-`(t(df1[-1]), df1$gene)
b1 <- barplot(m1, beside=TRUE, legend=TRUE, col = c('blue', 'green'))
axis(1, at = b1+0.2, labels = 
 c('', 'new')[c(is.na(m1))+1L], pos= -0.8, lwd.ticks=0, lty=0)

If there are also missing values in the "counts" column and we want to add "missing" below the "counts" bar in the plot:

df1$counts[3] <- NA
m1 <- `colnames<-`(t(df1[-1]), df1$gene)
b1 <- barplot(m1, beside=TRUE, legend=TRUE, col = c('blue', 'green'))
i1 <- (is.na(m1))+1L
lbl <- c('', 'missing', 'new')[pmax((i1!=1)*row(i1) + 1L, i1)]
axis(1, at = b1+0.2, labels = lbl, pos= -0.8, lwd.ticks=0, lty=0)
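
The `pmax()`/`row()` expression picks the label from the row in which the NA occurs: row 1 of `m1` holds `counts` (label "missing") and row 2 holds `old_counts` (label "new"). A more explicit sketch of the same rule, leaving out the plotting calls, could look like this; `lbl2` is a name introduced here for illustration only.

```r
df1 <- data.frame(
  gene       = c("GPT", "TTYR", "GTT", "JUN"),
  counts     = c(13L, 1L, NA, 3L),   # GTT set to NA, as in the answer
  old_counts = c(12L, NA, 5L, 2L),
  stringsAsFactors = FALSE
)
m1 <- `colnames<-`(t(df1[-1]), df1$gene)

# version from the answer
i1  <- is.na(m1) + 1L
lbl <- c("", "missing", "new")[pmax((i1 != 1) * row(i1) + 1L, i1)]

# explicit version: choose the label by row wherever the value is NA
lbl2 <- ifelse(is.na(m1), c("missing", "new")[row(m1)], "")

print(lbl2)
```

Because `ifelse()` keeps the dimensions of `m1`, printing `lbl2` shows which gene and which series each label belongs to; `as.vector(lbl2)` gives the flat vector that `axis()` expects.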

data

df1 <- structure(list(gene = c("GPT", "TTYR", "GTT", 
 "JUN"), counts = c(13L, 
1L, 2L, 3L), old_counts = c(12L, NA, 5L, 2L)), 
.Names = c("gene", 
"counts", "old_counts"), class = "data.frame", 
row.names = c(NA, -4L))

akrun · answered Feb 13 '16 at 09:10

Thank you. How do I add `missing` (just like `new`), below `old_counts` bar, if there is no value in `counts`? – MAPK Feb 13 '16 at 10:10
Yes this is correct. I am curious to know how we can add `missing` if there is no value for counts but present in old_counts. – MAPK Feb 13 '16 at 10:18
For this data, for example, if I want to add `missing` for `JUN` bar: `df1 <- structure(list(gene = c("GPT", "TTYR", "GTT", "JUN"), counts = c(13L, 1L, 2L, NA), old_counts = c(12L, NA, 5L, 2L)), .Names = c("gene", "counts", "old_counts"), class = "data.frame", row.names = c(NA, -4L))` – MAPK Feb 13 '16 at 10:21
@MAPK Please check if the update helps. (just noticed that you provided a new dput in the comment.) Anyway, I added the missing one for the "GTT". It also works with the one you showed in the example with "JUN". – akrun Feb 13 '16 at 10:26
