General feature format (GFF) is a file format used for describing genes and other features of DNA, RNA, and protein sequences.
Questions tagged [gff]
26 questions
2
votes
3 answers
Changing column values in a tab-delimited file with awk without changing values in other columns
My file looks like this:
1-0039.1 EMBL transcript 1 1524 . + . transcript_id "1-0039.1.2"; gene_id "1-0039.1.2"; gene_name "dnaA"
1-0039.1 EMBL CDS 1 1524 . + 0 …
2
votes
1 answer
Iterating through Pandas dataframe and dictionary items
Here's a tough one.
Problem Introduction:
I'm working with two different files: a GFF3, which is basically a nine-column TSV, and a FASTA, which is a text file.
After importing the GFF3 file with gffpandas, it looks like this:
…

Iacopo Passeri
- 145
- 10
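
For readers unfamiliar with the nine-column layout mentioned in the question above, here is a minimal sketch in plain Python. It assumes a standard tab-separated GFF3 feature line; the function name `parse_gff3_line` and the sample line in the usage note are hypothetical and are not taken from the question, and this is not how gffpandas does it internally.

```python
# Column names follow the GFF3 specification.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name.

    Hypothetical helper for illustration only.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # start and end are 1-based integer coordinates in GFF3.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record
```

Calling `parse_gff3_line("chr1\texample\tgene\t10\t50\t.\t+\t.\tID=gene0\n")` on this made-up line gives a dict whose `"type"` is `"gene"` and whose `"start"` and `"end"` are the integers 10 and 50.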
2
votes
4 answers
use sed to extract two pieces of text at once from a line
OK, I've found similar answers on SO but my sed / grep / awk fu is so poor that I couldn't quite adapt them to my task. Which is, given this file "test.gff":
accn|CP014704 RefSeq CDS 403 915 . + 0 …

Robot-Scott
- 99
- 8
1
vote
1 answer
Bash code with if/else/fi and awk/perl regexps works outside Snakemake but not inside a Snakemake rule
I am constructing a snakemake workflow that is working well for the most part. One of the steps requires a ready.gtf file containing certain fields of information in its last column, specifically gene_name and transcript_name.
I have thought of some…

apposada
- 13
- 2
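
As an illustration of the check described in the question above (whether the last GTF column carries `gene_name` and `transcript_name`), here is a minimal Python sketch. It assumes the usual GTF attribute style of `key "value";` pairs; the helper names `parse_gtf_attributes` and `missing_attributes` are hypothetical, and this is not the asker's Bash/awk approach.

```python
import re

# Matches one GTF attribute of the form: key "value"
_GTF_ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(attribute_field):
    """Return the key/value pairs of a GTF attribute column as a dict."""
    return dict(_GTF_ATTRIBUTE.findall(attribute_field))


def missing_attributes(attribute_field, required=("gene_name", "transcript_name")):
    """List the required keys that are absent from the attribute column."""
    present = parse_gtf_attributes(attribute_field)
    return [key for key in required if key not in present]
```

For an attribute column such as `transcript_id "1-0039.1.2"; gene_id "1-0039.1.2"; gene_name "dnaA"`, `missing_attributes` reports `transcript_name` as the only missing key.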
1
vote
2 answers
How to print fields containing specific substrings with awk?
Goal: To print fields from an input file to an output file, but with special regard to specific fields split by semicolons (; — see field nine of example input below).
Example Input (input.txt):
NC_051336.1 Gnomon gene 40042 56215 . …

Gawain
- 188
- 1
- 16
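
The goal stated in the question above (keeping only the semicolon-separated parts of field nine that contain given substrings) can be sketched in Python as follows. This assumes tab-separated input with exactly nine fields; the function name `filter_attribute_field` and the sample line in the usage note are hypothetical, since the question's own input is truncated here.

```python
def filter_attribute_field(line, substrings):
    """Keep fields 1-8 unchanged and filter the parts of field 9.

    Only semicolon-separated parts of the ninth field that contain at
    least one of the given substrings are kept. Hypothetical helper.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    kept = [
        part for part in fields[8].split(";")
        if any(sub in part for sub in substrings)
    ]
    return "\t".join(fields[:8] + [";".join(kept)])
```

With the made-up line `chr1<TAB>src<TAB>gene<TAB>1<TAB>100<TAB>.<TAB>+<TAB>.<TAB>ID=gene0;Name=abc;Dbxref=GeneID:1` and the substrings `["ID=", "Name="]`, the ninth field becomes `ID=gene0;Name=abc` and the other eight fields are left untouched.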
1
vote
1 answer
Sed function in shell applied to all .gff files in a directory
I am working with .gff3 files, trying to remove contig sequences at the bottom of many files in a directory. The contig sequences are separated from the rest of the file by a ##FASTA line, and I wish to delete everything below (DNA sequences, FASTA…

Morten
- 49
- 4
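
The task in the question above, dropping the `##FASTA` line and everything after it, could be sketched in Python like this. It is an illustrative alternative to the sed approach the asker mentions, and it assumes the separator appears on a line of its own; the function name `strip_fasta_section` is hypothetical.

```python
def strip_fasta_section(lines):
    """Return the lines that come before the first ##FASTA directive.

    The ##FASTA line itself and everything after it are dropped.
    Hypothetical helper for illustration only.
    """
    kept = []
    for line in lines:
        if line.strip() == "##FASTA":
            break
        kept.append(line)
    return kept
```

To apply this to every file in a directory, one option is to loop over `pathlib.Path(directory).glob("*.gff3")`, read each file's lines, and write the result of `strip_fasta_section` to a new file rather than overwriting the original.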
1
vote
0 answers
Problem in merging a gff file and a csv file in R
I have a GFF file and a CSV file which look like:
# CSV dataframe
file.csv <- read.table(text = "Sample Name Estimate Std.Err P.Adjust
Sample_1 B005300.2.1 0.345930183 0.05662846 1.58E-06
Sample_1 B005230.2.1 0.048159129 0.013862871…

user1567654
- 47
- 5
1
vote
1 answer
Bcbio-gff File creation issue
When creating a file using GFF.write(), I get a new line with "annotation remark" as a source, followed by ASCII encoding of sequence regions:
##gff-version 3
##sequence-region NC_011594.1 1 16779
NC_011594.1 annotation remark 1 16779 . . …

Felix Jaeger
- 13
- 4
1
vote
0 answers
How to convert the output of EMBOSS palindrome into a GFF/BED file (Perl)
I am sorry to ask this kind of stupid question, but I could not find the answer by myself... I learned Perl a while ago and I am a little lost.
I want to convert this kind of output:
Palindromes of: seq1
Sequence length is: 24
Start at position: 1
End…

Papaya
- 107
- 9
0
votes
0 answers
htseq-count does not generate read counts as expected
I have a .gff file which looks like below.
scaffold1 GeneWise mRNA 227302 283623 80.88 - . ID=Mnat_00001;evid_id=ENST00000360911;Shift=0;
scaffold1 GeneWise CDS 227302 227498 . - 2 …

Allan Okwaro
- 1
- 1
0
votes
1 answer
Handling a GFF file from MISA
Replace whole column in BED file with motif length
I was mining STRs using MISA and collected data from the GFF file to make a BED file with 5 columns: Chromosome|Start|End|Motif length|Motif. But the 4th column showed the number of repeats. Example of my…

Bách Nguyễn
- 37
- 4
0
votes
1 answer
Biopython: parsing GFF features to extract CDS
Hello, I'm trying to extract the coding sequences from a FASTA file using a GFF file with the help of Biopython (https://biopython.org/wiki/GFF_Parsing).
I have tried doing what this tutorial describes, but there is something I just don't seem to get…

Robbe
- 35
- 8
0
votes
1 answer
Adding data to a dataframe based on groups
I'm working with bioinformatic data, with a gene in each row and statistics/metadata in the columns. Some genes are from the same organism which is indicated by column "ID", and I grouped the data on this variable.
data <- data %>%
group_by(ID)
I…

Morten
- 49
- 4
0
votes
1 answer
Cell value to column name in pandas
I have the following pandas DataFrame (it's a GFF file):
df = pd.DataFrame.from_dict({'scaffold name': {0: 'Tname16C00001.1',
1: 'Tname16C00001.1',
2: 'Tname16C00001.1',
3: 'Tname16C00001.1',
4: 'Tname16C00001.1',
5: 'Tname16C00001.1',
…

Saraha
- 144
- 1
- 12
0
votes
0 answers
HTSeq-count is returning 0 for every gene, instead of expression value
I'm trying to summarize gene counts using htseq-count, and it's returning 0 counts for every gene. I'm not sure what I'm doing wrong; I think it has to do with the gene flag I'm using.
I've tried using the GTF for Arabidopsis TAIR 10, which I got…

sdshinghal
- 41
- 1
- 6