0

I have the following data set gwas_data

Running head -n 23 gwas_data gives me the following table.

gwas_data <- 
  
  data.frame(
  stringsAsFactors = FALSE,
                 udi = c("A","B","C","D","E",
                         "F","G","H","I","J","K","A","B","C","D","E",
                         "F","G","H","I","J","K"),
                 snp = c("rs71628639_A",
                         "rs71628639_A","rs71628639_A","rs71628639_A","rs71628639_A",
                         "rs71628639_A","rs71628639_A","rs71628639_A",
                         "rs71628639_A","rs71628639_A","rs71628639_A","rs12726330_A",
                         "rs12726330_A","rs12726330_A","rs12726330_A",
                         "rs12726330_A","rs12726330_A","rs12726330_A","rs12726330_A",
                         "rs12726330_A","rs12726330_A","rs12726330_A"),
                 chr = c(1L,1L,1L,1L,1L,1L,1L,
                         1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
                         1L),
                  bp = c(154988255L,154988255L,
                         154988255L,154988255L,154988255L,154988255L,154988255L,
                         154988255L,154988255L,154988255L,154988255L,
                         155108167L,155108167L,155108167L,155108167L,155108167L,
                         155108167L,155108167L,155108167L,155108167L,
                         155108167L,155108167L),
                   p = c(0.580621191,0.356577427,
                         0.494774059,0.984005886,0.492034614,0.581479389,
                         0.24820214,0.202720896,0.295462221,0.845848783,
                         0.954714162,0.343101621,0.740942238,0.929127071,0.717965027,
                         0.335111376,0.857154424,0.480087195,0.980307843,
                         0.521114038,0.583150471,0.925783695),
                beta = c(0.000852277,0.003943912,
                         0.001091986,-3.18e-05,0.000564413,0.000120028,
                         0.026156467,0.000303135,0.069146449,-2.96e-07,-2.11e-05,
                         0.001274261,-0.001232397,0.000123948,-0.000498507,
                         -0.000689988,-3.41e-50,-0.013934416,5.12e-06,
                         -0.03696031,-7.28e-07,-3.01e-05),
              bp_cum = c(1.154988255,1.154988255,
                         1.154988255,1.154988255,1.154988255,1.154988255,
                         1.154988255,1.154988255,1.154988255,1.154988255,
                         1.154988255,1.155108167,1.155108167,1.155108167,
                         1.155108167,1.155108167,1.155108167,1.155108167,1.155108167,
                         1.155108167,1.155108167,1.155108167)
  )

I would like to make a manhattan plot, the X-axis should have chromosomal numbers from 1:22, I want each entry to be on the x-axis according to the BP position. The id should act as colour and the y-axis would be -log10(p).

I have rewritten the r command as follows, but my graph doesn't look correct.

library(plyr)
library(dplyr)
library(purrr)
library(tidyverse)
library(ggtext)
library(stringr)
gwas_data <- read.table("gwas_data", header=T)
sig <- 5e-8
manhplot <- ggplot(gwas_data, aes(x = bp_cum, y = -log10(p), color = udi)) +
   geom_hline(yintercept = -log10(sig), color = "grey40", linetype = "dashed") + 
   geom_point(aes(color=as.factor(udi)), alpha=0.8, size=2) +
   scale_x_continuous(label = axis_set$chr, breaks = axis_set$center) +
   scale_y_continuous(expand = c(0,0), limits = c(0, ylim)) +
   #scale_color_manual(values = rep(c("#276FBF", "#183059"), (length(axis_set$chr)))) +
   scale_size_continuous(range = c(0.5,3)) +
   theme_minimal()
print(manhplot)

I would also like to add the name of the ID and SNP if they are above the significant threshold.

My axis_set looks as follows with test data which goes from chromosome 1:4

chr center
1   179641307
2   354697451
3   553030055
4   558565909

My final graph looks as follows: GWAS

Susan Switzer
  • 1,531
  • 8
  • 34
zerberus
  • 73
  • 7
  • Why have you put `aes(x = bp_cum` if you want chromosome number on the x axis? – Allan Cameron Jun 06 '22 at 14:33
  • the bp_cum is chromosome number specific e.g. if the first entry is in chromosome 1 and the BP in 245, the bp_cum becomes 1.245, whereas if a an entry is in chromosome 2 and also bp is 245, the bp_cum becomes 2.245 so I can have the values on the x-axis within the domain of that specific chromosome. I am also considering actually developing a new column, so each instance of non unique instance of chr and bp get a their own count number e.g. chr 1 bp 245 would get bp_id 1, chr 1 bp 555 would get bp_id 2, chr 2 bp 245 would get bp_id 1 again, and then the final ids would be 1.1, 1.2, 2.1 etc. – zerberus Jun 06 '22 at 15:00
  • Whey you say the "graph doesn't look correct" what does that mean exactly? What do you expect the graph to look like in this case? What exactly are you trying to change? – MrFlick Jun 06 '22 at 15:00
  • Although the data above is only limited to the first 22 entries, the graph highlights in total 1101 entries ranging from chromosome 1 to 4. What I would like to happen is that each individual SNP is equally distributed, meaning the gaps are also equal on the x-axis, and that then the chromosome number is labelled. I am aware that some chromosomes will have more SNP's than others. – zerberus Jun 06 '22 at 15:05
  • 1
    @zerberus in that case, just change `bp_sum` to a factor and run your plot again. – Allan Cameron Jun 06 '22 at 15:19
  • It did work, is there a possibility to add a gap between the bp_sum ranges for chromosome 1 vs chromosome 2, although they are factored. I imagine subsetting would be an option, but then I would create 22 different graphs. – zerberus Jun 06 '22 at 15:36

0 Answers0