11

I have several SNP IDs (i.e., rs16828074, rs17232800, etc...), I want to their coordinates in a Hg19 genome from UCSC genome website.

I would prefer using R to accomplish this goal. How to do that?

zx8754
  • 52,746
  • 12
  • 114
  • 209
user1938809
  • 1,135
  • 1
  • 9
  • 12
  • Can you provide more information? Are you working with FASTA files? Can't you simply run a multiple sequence alignment? – sreisman Nov 29 '13 at 20:43
  • I am not working with FASTA files but I can though. I believe there are many ways to accomplish this but I am asking the easiest way to do it. Thanks. – user1938809 Dec 04 '13 at 00:41
  • Any Python3 options for this? I found cruzdb but that seems to work only with Python2. – Krrr Sep 03 '21 at 11:35

3 Answers3

11

Here is a solution using the Bioconductor package biomaRt. It is a slightly corrected and reformatted version of the previously posted code.

library(biomaRt) # biomaRt_2.30.0

snp_mart = useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")

snp_ids = c("rs16828074", "rs17232800")
snp_attributes = c("refsnp_id", "chr_name", "chrom_start")

snp_locations = getBM(attributes=snp_attributes, filters="snp_filter", 
                      values=snp_ids, mart=snp_mart)

snp_locations
#    refsnp_id chr_name chrom_start
# 1 rs16828074        2   232318754
# 2 rs17232800       18    66292259

Users are encouraged to read the comprehensive biomaRt vignette and experiment with the following biomaRt functions:

listFilters(snp_mart)
listAttributes(snp_mart)
attributePages(snp_mart)
listDatasets(snp_mart)
listMarts()
zx8754
  • 52,746
  • 12
  • 114
  • 209
bdemarest
  • 14,397
  • 3
  • 53
  • 56
  • May I ask filters="snp_filter" mean? Thanks. – user1938809 Dec 05 '13 at 04:45
  • From page 7 of the vignette: "Filters define a restriction on the query. For example you want to restrict the output to all genes located on the human X chromosome then the filter 'chromosome_name' can be used with value 'X'". With no filter, you would get all the snps in the chosen database. Try the command `listFilters(snp_mart)` to see what filters are possible. – bdemarest Dec 06 '13 at 02:09
  • I've noticed this takes a very very long time if you have a large number of SNPs, e.g. 200,000 ish (say with an Affymetrix 6.0 SNP array and wish positions for all of the SNPS). I wonder if anyone knows a faster way to do this. – Jim Bo Jan 29 '14 at 11:51
  • @JimBo, I have also noticed very slow response from biomaRt databases, even when the output is as few as 2000 lines. It may be worth posting on the bioconductor mailing list (http://www.bioconductor.org/help/mailing-list/mailform/). – bdemarest Jan 30 '14 at 07:08
4

Via Perl you will find it quite easy to build code to query for SNPs.

There is a web browser GUI tool (HERE) for building perl scripts based on which database and dataset you wish to query using Biomart library.

Instructions

  1. Go to http://www.ensembl.org/biomart/martview/ad23fb5685e6aecb59ab12ce73c89731 (for supported Metazoans), or http://biomart.vectorbase.org/biomart/martview/6e274bc00b3c68a131a6947d02039ade (for up to date Vectors of Malaria, e.g. A. gambiae)
  2. Select the database and dataset: enter image description here

  3. Click on the "perl" button to generate perl code for the Biomart API querying, and copy-paste the code into your perl editor - run it with the SNP rsNumbers of your choice.

# An example script demonstrating the use of BioMart API.
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/."
my $action='cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile,'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');
    $query->setDataset("hsapiens_snp");
    $query->addAttribute("refsnp_id");
    $query->addAttribute("refsnp_source");
    $query->addAttribute("chr_name");
    $query->addAttribute("chrom_start");
    $query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();

############################## GET RESULTS ##########################
$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################
hello_there_andy
  • 2,039
  • 2
  • 21
  • 51
3

Using bioconductor's biomaRt R package.

This provides an easy way to send queries to BioMart which fetches information about SNPs given an rsNumber (i.e. rsid).

E.g. to import SNP data for rs16828074 (an rsNumber you listed in the post), use this:

Code:

library(biomaRt)

snp.id <- 'rs16828074'   # an SNP rsNumber like you listed in the post

snp.db <- useMart("snp", dataset="hsapiens_snp")  # select your SNP database

# The SNP data file imported from the HUMAN database:
nt.biomart <- getBM(c("refsnp_id","allele","chr_name","chrom_start",                   
                      "chrom_strand","associated_gene",
                      "ensembl_gene_stable_id"),
                      filters="refsnp",
                      values=snp.id,
                      mart=snp.db)

Let me know how you get on with this (via comments) since I assume some basic coding and package importing ability in my answer here.

Aknowledgement/s:

goes to Jorge Amigo (for his post in Biostars)

zx8754
  • 52,746
  • 12
  • 114
  • 209
hello_there_andy
  • 2,039
  • 2
  • 21
  • 51
  • 1
    I am getting the following error when I ran your code. Error in getBM(c("refsnp_id", "allele", "chr_name", "chrom_start", "chrom_strand", : Invalid filters(s): refsnp Please use the function 'listFilters' to get valid filter names – user1938809 Dec 03 '13 at 22:30
  • 1
    look at the post by @bdemarest, he seemed to have got there before me ^^, I will post another answer based on perl programming language, this is the most recommended language for SNP querying I believe – hello_there_andy Dec 05 '13 at 02:11
  • 1
    You meant to say thanks to "Jorge Amigo" not "Zach Stednick". And how is your post different from accepted answer by "bdemarest" ? – zx8754 Jan 25 '17 at 08:30
  • @zx8754 you are correct, on both counts, my apologies. I have edited correct aknowledgement. Why did I have the same answer to "bdemarest"? Originally I didn't realise the syntax was in R, only now do I realise this. Please flag my post as duplicate – hello_there_andy Jan 25 '17 at 20:56
  • 1
    Thanks, let it stay, I like the link to biostars :) – zx8754 Jan 25 '17 at 21:02