Hail is an open-source, general-purpose, Python-based data analysis library with additional data types and methods for working with genomic data.
Questions tagged [hail]
11 questions
2
votes
0 answers
java.io.IOException: Stream is closed! Error in HDInsight with ADLS Gen 2
I originally posted this on the Microsoft Q&A system at this link, but it doesn't appear to have been acknowledged or addressed, so I thought there might be better feedback here on SO.
I am currently using Hail for the pyspark library to perform varying…

EagleByte
- 143
- 1
- 8
2
votes
2 answers
spark-submit error: Invalid maximum heap size: -Xmx4g --jars, but enough of memory on the system
I am running a spark job:
spark-submit --master spark://ai-grisnodedev1:7077 --verbose --conf spark.driver.port=40065 --driver-memory 4g
--jars /opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar
--conf…

Nikita Vlasenko
- 4,004
- 7
- 47
- 87
1
vote
0 answers
Unable to find the internal logging class: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
I am trying to build a Spark cluster on the DNAnexus platform.
I tried creating a Spark context from a JupyterLab notebook.
import pyspark
sc = pyspark.SparkContext()
spark = pyspark.sql.SparkSession(sc)
I get the following error stack trace.
Py4JJavaError:…

Abhishek Shakya
- 71
- 5
1
vote
2 answers
Parse .bgen files using HAIL without loading data on a single node
I am trying to parse genomic data delivered in .bgen format into a Spark DF using Hail. The file is 150 GB and won't fit on a single node of my cluster.
I am wondering whether there are streaming commands/ways to parse the data…

Sylvi0202
- 901
- 2
- 9
- 13
1
vote
1 answer
Combine multiple VCF files into one large VCF file
I have a list of VCF files from specific ethnicities such as American Indian, Chinese, European, etc.
Under each ethnicity, I have 100+ files.
Currently, I have computed the variant QC metrics such as call_rate, n_het, etc. for one file as shown…

The Great
- 7,215
- 7
- 40
- 128
0
votes
0 answers
How to install VariantSpark to Google Colab - TypeError: SparkBackend__init__() got an unexpected keyword argument 'gcs_requester_pays_project'
I am unable to import VariantSpark 0.5.2 into a Google Colab notebook running Python 3.9.16, with Hail version 0.2.112 and Apache Spark version 3.3.2.
Here is the pip install:
pip install variant-spark
Looking in indexes: https://pypi.org/simple,…
0
votes
1 answer
Access different type of preset target location in Luigi
I have a luigi pipeline. There is a file where Google Cloud is set as a target location:
https://github.com/macarthur-lab/hail-elasticsearch-pipelines/blob/d6e9dedbce929c04c294c54095663ba94a4de3f0/luigi_pipeline/lib/hail_tasks.py#L37
Now, there is…

Nikita Vlasenko
- 4,004
- 7
- 47
- 87
0
votes
1 answer
hail.utils.java.FatalError: IllegalStateException: unread block data
I am trying to run a basic script on a Spark cluster that takes in a file, converts it, and outputs it in a different format. The Spark cluster currently consists of 1 master and 1 slave, both running on the same node. The full command is:
nohup…

Nikita Vlasenko
- 4,004
- 7
- 47
- 87
0
votes
1 answer
Using ipython on a different linux account: command gets stuck
I installed miniconda3 on one linux account, then I created an environment py37, installed all the needed packages and was able to launch ipython from the second account and import the package I wanted to import: hail. For that I changed all of the…

Nikita Vlasenko
- 4,004
- 7
- 47
- 87
0
votes
2 answers
Problems with Hail 0.2 on Azure Databricks
Hello, can anyone help with Hail 0.2 on Azure Databricks?
After pip install, lots of problems came up:
can't find Java package, import hail.plot, hl.init()
According to…

Chevady Ju
- 43
- 9
-1
votes
1 answer
Run Luigi task that depends on another task
I have one task SeqrMTToESTask that depends on another one called SeqrVCFToMTTask. You can see the full code here:
https://github.com/macarthur-lab/hail-elasticsearch-pipelines/blob/master/luigi_pipeline/seqr_loading.py
Now, I ran the first task…

Nikita Vlasenko
- 4,004
- 7
- 47
- 87