htseq-count does not generate read counts as expected

Question

I have a .gff file that looks like this:

scaffold1       GeneWise        mRNA    227302  283623  80.88   -       .       ID=Mnat_00001;evid_id=ENST00000360911;Shift=0;
scaffold1       GeneWise        CDS     227302  227498  .       -       2       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     230150  230298  .       -       1       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     233743  234426  .       -       1       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     236092  236835  .       -       1       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     238558  238807  .       -       2       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     240781  240970  .       -       0       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     241779  241912  .       -       2       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     249825  250005  .       -       0       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     273368  273452  .       -       1       Parent=Mnat_00001;
scaffold1       GeneWise        CDS     283460  283623  .       -       0       Parent=Mnat_00001;
scaffold1       Cuff    mRNA    316222  341723  1000    +       .       ID=Mnat_00002;evid_id=CCG000108.1;source_id=CUFF.1168.1;
scaffold1       Cuff    CDS     316222  316368  1000    +       0       Parent=Mnat_00002;
scaffold1       Cuff    CDS     322468  322630  1000    +       0       Parent=Mnat_00002;
scaffold1       Cuff    CDS     325120  325274  1000    +       2       Parent=Mnat_00002;
scaffold1       Cuff    CDS     329797  329922  1000    +       0       Parent=Mnat_00002;

I also have .bam files from RNA-seq data, from which I want to generate read counts.

My code is as below:

for i in *sorted.bam; do
   htseq-count $i Mnat_gene_v1.2.gff -f bam -i -mRNA -t CDS -m union -r name --stranded=no > ${i}_count.txt;
done

However, the bash command does not seem to work. The error that I get is:

Error processing GFF file (line 2 of file Mnat_gene_v1.2.gff):
  Feature Mnat_00001 does not contain a 'mRNA' attribute
  [Exception type: ValueError, raised in features.py:387]

Has anyone come across such an error?

You might need to further edit your sample data above. For example, Line 1 looks suspect to me with `- . ID=Mnat_00001;evid_id=ENST00000360911;Shift=0;`. Should some of that be on a separate line? Use the `{}` tool from the Edit menu on mouse-selected text to achieve correct formatting like `code/data/output/errMsgs`. Good luck. — shellter, Mar 21 '23 at 16:31
Otherwise, the error message `Feature Mnat_00001 does not contain a 'mRNA' attribute` seems pretty clear. The only question is where is the `Mnat_00001` "record"? Good luck. — shellter, Mar 21 '23 at 16:32
Dear @shellter, thanks for your comment. I will have to check the original .gff file again, as it does not seem to be properly formatted. Maybe I can start from there. I still do not understand the error, because each gene has only one row with mRNA, while its other rows have CDS in column three. But I will have a second look. — Allan Okwaro, Mar 23 '23 at 10:56
Arg. I should have concluded with "The only question is ***which*** Mnat_00001 record do we pick?" Still, the format of the data is confusing. Is there an open-source project someplace that has a data file that does work for you (just to test your workflow)? Good luck. — shellter, Mar 23 '23 at 16:07
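
One way to see why the error fires: in htseq-count, `-t` selects rows by the feature type in column 3, and `-i` names the column 9 attribute to use as the feature ID. In the excerpt above, the CDS rows carry only a `Parent` attribute, so asking for an attribute called `mRNA` cannot succeed. The sketch below lists which attribute keys a given feature type actually has. It assumes the real file is tab-separated (as GFF requires), and `list_attr_keys` is a hypothetical helper name, not part of HTSeq.

```shell
# list_attr_keys GFF TYPE
# Print the distinct attribute keys (column 9) found on rows whose
# feature type (column 3) equals TYPE. Hypothetical helper for inspection only.
# Assumes tab-separated columns and key=value pairs separated by ';'.
list_attr_keys() {
  awk -F'\t' -v ftype="$2" '
    $3 == ftype {
      n = split($9, pairs, ";")
      for (i = 1; i <= n; i++) {
        if (pairs[i] == "") continue      # skip the empty piece after a trailing ";"
        split(pairs[i], kv, "=")
        print kv[1]                       # keep only the key, drop the value
      }
    }' "$1" | LC_ALL=C sort -u
}

# Usage (file name taken from the question):
#   list_attr_keys Mnat_gene_v1.2.gff CDS
```

If this prints only `Parent` for CDS rows, a likely fix is to replace `-i -mRNA` with `-i Parent` in the loop, so that CDS rows are grouped by their parent transcript ID. This is untested against the full file, so treat it as a starting point.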
