Questions tagged [hail]

Hail is an open-source, general-purpose, Python-based data analysis library with additional data types and methods for working with genomic data.

11 questions
2 votes, 0 answers

java.io.IOException: Stream is closed! Error in HDInsight with ADLS Gen 2

I originally posted this on the Microsoft Q&A system at this link, but it doesn't appear to have been acknowledged or addressed, and I thought there might be better feedback here on SO. I am currently using Hail for the pyspark library to perform varying…
EagleByte
2 votes, 2 answers

spark-submit error: Invalid maximum heap size: -Xmx4g --jars, but enough memory on the system

I am running a spark job: spark-submit --master spark://ai-grisnodedev1:7077 --verbose --conf spark.driver.port=40065 --driver-memory 4g --jars /opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar --conf…
Nikita Vlasenko
1 vote, 0 answers

Unable to find the internal logging class: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class

I am trying to build a Spark cluster on the DNAnexus platform. I tried creating a Spark context from a JupyterLab notebook: import pyspark sc = pyspark.SparkContext() spark = pyspark.sql.SparkSession(sc) I get the following error stack trace: Py4JJavaError:…
1 vote, 2 answers

Parse .bgen files using HAIL without loading data on a single node

I am trying to parse genomic data delivered in .bgen format into a Spark DataFrame using Hail. The file is 150 GB and won't fit on a single node of my cluster. I am wondering whether there are streaming commands/ways to parse the data…
Sylvi0202
1 vote, 1 answer

Combine multiple VCF files into one large VCF file

I have a set of VCF files grouped by ethnicity, such as American Indian, Chinese, European, etc. Under each ethnicity, I have 100+ files. Currently, I have computed the variant QC metrics such as call_rate, n_het, etc. for one file as shown…
The Great
0 votes, 0 answers

How to install VariantSpark in Google Colab - TypeError: SparkBackend__init__() got an unexpected keyword argument 'gcs_requester_pays_project'

I am unable to import VariantSpark 0.5.2 into a Google Colab notebook running Python 3.9.16, with Hail version 0.2.112 and Apache Spark version 3.3.2. Here is the pip install: pip install variant-spark Looking in indexes: https://pypi.org/simple,…
0 votes, 1 answer

Access a different type of preset target location in Luigi

I have a Luigi pipeline. There is a file where Google Cloud is set as the target location: https://github.com/macarthur-lab/hail-elasticsearch-pipelines/blob/d6e9dedbce929c04c294c54095663ba94a4de3f0/luigi_pipeline/lib/hail_tasks.py#L37 Now, there is…
Nikita Vlasenko
0 votes, 1 answer

hail.utils.java.FatalError: IllegalStateException: unread block data

I am trying to run a basic script on a Spark cluster that takes in a file, converts it, and outputs it in a different format. The Spark cluster at the moment consists of 1 master and 1 slave, both running on the same node. The full command is: nohup…
Nikita Vlasenko
0 votes, 1 answer

Using ipython on a different Linux account: command gets stuck

I installed miniconda3 on one Linux account, then created an environment py37, installed all the needed packages, and was able to launch ipython from the second account and import the package I wanted: hail. For that I changed all of the…
Nikita Vlasenko
0 votes, 2 answers

Problems getting Hail 0.2 working on Azure Databricks

Hello, can anyone help with Hail 0.2 on Azure Databricks? After pip install, lots of problems came up: can't find the Java package, import hail.plot, hl.init(). According to…
Chevady Ju
-1 votes, 1 answer

Run Luigi task that depends on another task

I have one task SeqrMTToESTask that depends on another one called SeqrVCFToMTTask. You can see the full code here: https://github.com/macarthur-lab/hail-elasticsearch-pipelines/blob/master/luigi_pipeline/seqr_loading.py Now, I ran the first task…
Nikita Vlasenko