Questions tagged [google-genomics]

Google Genomics provides an API for working with genomic data on Google's infrastructure.

39 questions
2
votes
3 answers

Google Cloud/BigQuery/Genomics data location

Some of our company's work requires that data in the Cloud be stored in the US. For Google Cloud, I can specify bucket locations to US locations. https://cloud.google.com/storage/docs/bucket-locations But for BigQuery and Google Genomics, there's no…
1
vote
1 answer

How to run the latest Docker version of Nextclade on COVID input fasta?

I try to run the following Shell code: sudo docker run -it --rm nextstrain/nextclade:latest nextclade run --dataset-name 'sars-cov-2' - -output-all covid19.fasta # fasta file has the data I want to process Nothing could be processed, changed…
player777
  • 131
  • 4
1
vote
3 answers

using awk to print header name and a substring

I tried using this code to print the header of a gene name and then pull a substring based on its location, but it doesn't work: >output_file cat input_file | while read row; do echo $row > temp geneName=`awk '{print $1}' tmp` …
Ziv Attia
  • 57
  • 7
1
vote
1 answer

TPM for TCGA and GTEX RNA

How do I convert TCGA RNA normalized_count to the TPM values as calculated for GTEx? Right now the TPM values on GTEx are dramatically smaller than the values of TCGA. The tables that I am looking at are on BigQuery…
eilalan
  • 669
  • 4
  • 20
1
vote
1 answer

BigQuery with limit 10 or fewer extraction value returns correct results, changing limit or adding extraction return null

The following is an issue with genomic data: I use the following query on the pgp data in big query: http://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/pgp_public_data.html (used one sample id for…
1
vote
0 answers

Can't access silver-wall-555.TuteTable.hg19 table anymore

I was able to access silver-wall-555.TuteTable.hg19 on Friday (17th of February), but can't access it since 20th of February. Was it removed from Google BigQuery? Lots of examples are based on silver-wall-555.TuteTable.hg19.
Tomas
  • 675
  • 1
  • 9
  • 18
1
vote
2 answers

Google Genomics jobs never ending

We were using Google Genomics ReadGroupSets to store our alignment data (BAM files) and it was running amazingly, until yesterday... Yesterday (08/29/2016) our import jobs (Method: readgroupsets.import) started the "running" status, but until now…
1
vote
3 answers

How to compress a list of files into a single gzip file using elasticluster, grid-engine-tools, and google cloud

I want to start by thanking you all for your help ahead of time, as this will help clear up a detail left out on the readthedocs.io guide. What I need is to compress several files into a single gzip, however, the guide shows only how to compress a…
Howard Davis
  • 312
  • 4
  • 10
1
vote
1 answer

Is it possible to use the discovery module from the Google apiclient in Cloud Datalab?

I have a simple python script that does something like this: from apiclient import discovery from oauth2client.client import GoogleCredentials ggSvc = discovery.build ( 'genomics', 'v1', credentials=credentials ) body = { "readGroupSetIds":…
SheRey
  • 305
  • 1
  • 5
  • 15
1
vote
2 answers

Google Genomics API

Trying to use Google Genomics, following instructions found here: https://developers.google.com/genomics/ Trying to set up OAuth Client id (Section 4: Authenticate), from the GoogleCloud console, step c tells me: "On the APIs & auth tab, select APIs…
0
votes
0 answers

Run of Multiple fastq files for fastqc analysis

I would like to run the following code for multiple fastq files in a folder. In a folder I have different fastq files; first I have to read one file and perform the required operations by activating miniconda, then store results in a separate file.…
Luffy
  • 1
0
votes
1 answer

T-test comparing multiple columns to other columns

I am relatively new to R and need some help with my data analysis. In the attached table, Master Protein Accession column consists of a list of proteins that are increased or decreased in the cortex(C) under three conditions, i.e., control (C),…
0
votes
2 answers

gcloud beta lifesciences, JSON pipeline-file rather than options

I am trying to run gcloud beta lifesciences because the Genomics API is deprecated. There have been so many changes between the Genomics API and the Life Sciences API. I ran one of my analysis steps in Google Cloud using beta lifesciences. Here is what I found. (1)…
0
votes
2 answers

Additional 500 GB persistent disk attached by default

I am trying to run a workflow on GCP using Nextflow. The problem is, whenever an instance is created to run a process, it has two disks attached. The first boot-disk (default 10GB) and an additional 'google-pipelines-worker' disk (default 500GB).…
DUDANF
  • 2,618
  • 1
  • 12
  • 42
0
votes
1 answer

Google DeepVariant pipeline on GRCh37 WGS with exome model not finishing

I have an hg19-aligned BAM that I wish to generate a DeepVariant VCF for. I used samtools to extract the header and ensured that the hg19 reference FASTA index includes the same contigs and locations. My original goal was to run only an exome model…