
I'm using the PyVCF Python package to parse VCF files. This is the command I use to read a VCF file:

vcf_reader = vcf.Reader(open('file.vcf', 'r'))

Before going downstream, I would like to check that vcf_reader has data, i.e., that file.vcf has records inside. How can I check this?

Sorry, I'm really new to Python and just getting started. I've searched on Google but without success.

Thank you.

cucurbit
  • `vcf.Reader` returns an iterator, so you can use stock approaches to checking if an iterator is empty (as seen in the question I've marked this a duplicate of). – FThompson Apr 24 '15 at 09:30
  • Thank you Vulcan, I'll try the solution proposed in that post. :) – cucurbit Apr 24 '15 at 09:38
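
The "stock approach" mentioned in the comment above can be sketched as follows. This is a minimal sketch, not PyVCF's own API: `peek` and `_EMPTY` are hypothetical names introduced here, and the sketch only assumes the reader behaves like an ordinary Python iterator, as the comment states. It pulls the first item and, if there is one, chains it back in front of the rest so nothing is lost.

```python
from itertools import chain

# Sentinel (hypothetical name) so that a legitimate None item is not mistaken for "empty".
_EMPTY = object()


def peek(iterable):
    """Return (is_empty, iterator); the returned iterator still yields every item."""
    iterator = iter(iterable)
    first = next(iterator, _EMPTY)  # consume one item, or get the sentinel if exhausted
    if first is _EMPTY:
        return True, iter(())
    # Put the consumed item back in front of the remaining ones.
    return False, chain([first], iterator)
```

With PyVCF this would presumably be used as `is_empty, records = peek(vcf_reader)`, followed by a loop over `records` instead of `vcf_reader`. Note that `records` is a plain iterator, so anything other than iteration still has to go through the original reader object.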

1 Answer


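You could use os.stat to check that the file is not empty before you do anything:

from os import stat

if stat("file.vcf").st_size != 0:
    # the file has at least one byte, so hand it to the reader
    vcf_reader = vcf.Reader(open("file.vcf", "r"))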

Or, iterating over the file object, check for any line not starting with # and reset the file object to the start before calling Reader on it:

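with open('file.vcf', 'r') as f:
    # any() stops at the first line that is not a header/metadata line
    if any(not line.startswith("#") for line in f):
        # rewind, because the check above has consumed part of the file
        f.seek(0)
        vcf_reader = vcf.Reader(f)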

Tested on a file that contains records, not just metadata:

[<vcf.model._Record object at 0x7f2f95c68eb8>, <vcf.model._Record object at 0x7f2f95c7b9b0>, <vcf.model._Record object at 0x7f2f95c7ba58>, <vcf.model._Record object at 0x7f2f95c7bb38>, <vcf.model._Record object at 0x7f2f95c7bf60>]

With a file that contains only metadata, it gives no output.
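
The line-based check above can also be wrapped in a small helper that does not depend on PyVCF at all. This is only a sketch: `has_records` is a hypothetical name, and unlike the snippet above it also skips blank lines, on the assumption that a trailing empty line should not count as a record.

```python
def has_records(path):
    """Return True if the file has at least one non-blank line that is not a '#' header."""
    with open(path, "r") as handle:
        # any() short-circuits on the first data line, so large files are not read in full.
        return any(line.strip() and not line.startswith("#") for line in handle)
```

Because the helper opens and closes the file itself, no seek is needed: if `has_records('file.vcf')` is true, the file can then be opened again and passed to `vcf.Reader` as in the question.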

Padraic Cunningham
  • I'm unfamiliar with the VCF file format, but this will only work in the case that no metadata is present in the file either. In other words, this approach will not work to determine whether the records themselves are present if at least a byte of any metadata exists. – FThompson Apr 24 '15 at 09:32
  • At a glance at the [file format specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), it seems that metadata *does* exist, and thus that this answer is unfortunately incorrect, despite possibly being helpful. – FThompson Apr 24 '15 at 09:33
  • Thanks Padraic. There are some lines starting with '#' in the VCF format, so as @Vulcan has mentioned, this approach will not work. – cucurbit Apr 24 '15 at 09:33
  • @cucurbit, just check the first line or two if necessary and seek to the start of the file again. How many lines are there? – Padraic Cunningham Apr 24 '15 at 09:34
  • @PadraicCunningham Could you be slightly misunderstanding the question? A VCF file has metadata followed by a series of data records, if present. The question is how to check if there are any VCF data records in addition to the basic metadata. – FThompson Apr 24 '15 at 09:36
  • @Vulcan, if there are no records I imagine there will only be lines starting with ## so the logic is still the same – Padraic Cunningham Apr 24 '15 at 09:39