
I am trying to split the multiallelic sites of my VCF using `bcftools norm -m-any`. However, the result does not seem reasonable to me. Here's an example.

Let's say I have this multiallelic site:

REF     ALT     GT1     GT2     GT3
A       C,G     1/2     0/2     0/1

After splitting, I get these two records:

REF     ALT     GT1     GT2     GT3
A       C       1/0     0/0     0/1
A       G       0/1     0/1     0/0
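The recoding in this table follows a simple rule. In the record for a given ALT allele, that allele becomes 1, REF stays 0, and every other ALT allele is collapsed to 0. Here is a minimal Python sketch of that rule. It assumes diploid GT strings with `/` or `|` separators, and `split_gt` is a hypothetical helper written for illustration. It is not bcftools' code or part of any VCF library.

```python
def split_gt(gt: str, alt_index: int) -> str:
    """Recode one multiallelic GT for the biallelic record of ALT `alt_index` (1-based).

    Hypothetical helper mirroring the table above: the kept ALT becomes 1,
    REF stays 0, and any other ALT allele is replaced by 0 (REF).
    """
    sep = "|" if "|" in gt else "/"   # keep phased/unphased separator as-is
    out = []
    for allele in gt.split(sep):
        if allele == ".":
            out.append(".")           # missing stays missing
        elif int(allele) == alt_index:
            out.append("1")           # the ALT kept in this record
        else:
            out.append("0")           # REF, or another ALT collapsed to REF
    return sep.join(out)


# Genotypes of the multiallelic site A -> C,G from the question
site = {"GT1": "1/2", "GT2": "0/2", "GT3": "0/1"}
for alt_index, alt in enumerate(["C", "G"], start=1):
    row = {name: split_gt(gt, alt_index) for name, gt in site.items()}
    print("A", alt, *row.values())
```

Running it prints the two rows of the table above, which makes the problem explicit: the `2` in `1/2` and `0/2` turns into a `0` in the C record.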

So the "unused" ALT allele in each record is simply set to REF. Is there a way to change this behavior? I don't think it's reasonable, at least for my analysis. I would like my result to look more like this:

REF     ALT     GT1     GT2     GT3          GT1     GT2     GT3
A       C       1/.     0/.     0/1    or    ./.     ./.     0/1
A       G       ./1     0/1     0/.          ./.     0/1     ./.

Or something similar. At the very least, I don't want REF where there was previously an ALT.
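Both desired layouts differ from the behavior described above in one way. An allele that belongs to another ALT becomes missing (`.`) instead of REF. This can apply either to that allele alone or to the whole call. Here is a minimal Python sketch, again assuming diploid GT strings with `/` or `|` separators. `split_gt_missing` and its `whole` flag are hypothetical names for illustration, not options of bcftools or vt.

```python
def split_gt_missing(gt: str, alt_index: int, whole: bool = False) -> str:
    """Recode one multiallelic GT for the biallelic record of ALT `alt_index` (1-based),
    marking alleles of any other ALT as missing instead of REF.

    whole=False: only the foreign allele becomes "." (1/2 -> 1/. for C, ./1 for G)
    whole=True:  the whole call becomes missing if any allele is foreign (1/2 -> ./.)
    """
    sep = "|" if "|" in gt else "/"   # keep phased/unphased separator as-is
    foreign = False
    out = []
    for allele in gt.split(sep):
        if allele in (".", "0"):
            out.append(allele)        # missing and REF are kept unchanged
        elif int(allele) == alt_index:
            out.append("1")           # the ALT kept in this record
        else:
            out.append(".")           # allele of another ALT: unknown in this record
            foreign = True
    if whole and foreign:
        out = ["."] * len(out)        # drop the whole call, e.g. ./.
    return sep.join(out)


# Genotypes of the multiallelic site A -> C,G from the question
site = {"GT1": "1/2", "GT2": "0/2", "GT3": "0/1"}
for whole in (False, True):
    for alt_index, alt in enumerate(["C", "G"], start=1):
        calls = [split_gt_missing(gt, alt_index, whole) for gt in site.values()]
        print("A", alt, *calls)
```

On the example site, `whole=False` reproduces the first desired table (`1/. 0/. 0/1` and `./1 0/1 0/.`) and `whole=True` reproduces the second. The per-allele form keeps the known allele of each call, and the whole-call form discards it.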

gernophil

1 Answer


Have you tried `bcftools norm -a`?

You can also check the `--atom-overlaps` option: 'Alleles missing because of an overlapping variant can be set either to missing (.) or to the star allele (*), as recommended by the VCF specification.'

ekerde
  • Yes, this option was implemented quite recently following our request on GitHub. I haven't tried it yet, since it wasn't in the latest release available via conda. I'll try it, but in the meantime I solved the problem differently, using `vt decompose -s`. – gernophil Oct 26 '22 at 11:20
  • Is `bcftools norm` mentioned in any publication? – Death Metal Aug 31 '23 at 17:24