Questions tagged [gff]

General feature format is a file format used for describing genes and other features of DNA, RNA and protein sequences.
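
As a rough illustration of that description, here is a minimal sketch of reading one GFF3 feature line, assuming the usual nine tab-separated columns and `key=value` pairs in the last column. The `GffFeature` type, the `parse_gff3_line` helper, and the sample line are hypothetical and not taken from any question on this page.

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    """One feature line of a GFF3 file (hypothetical helper type)."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: dict


def parse_gff3_line(line: str) -> GffFeature:
    """Split a single GFF3 feature line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields

    # Column 9 holds semicolon-separated key=value pairs in GFF3.
    attributes = {}
    for pair in attrs.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key] = value

    return GffFeature(
        seqid, source, ftype,
        int(start), int(end),
        None if score == "." else float(score),  # "." means no value
        strand,
        None if phase == "." else int(phase),
        attributes,
    )


# Made-up example line, for illustration only.
example = "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=demo"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["ID"])
```

GTF files use a different attribute syntax (`key "value";`), so the attribute loop above would need to be adapted for them.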


26 questions
2
votes
3 answers

Changing column values in a tab-delimited file with awk without changing values in other columns

My file looks like this: 1-0039.1 EMBL transcript 1 1524 . + . transcript_id "1-0039.1.2"; gene_id "1-0039.1.2"; gene_name "dnaA" 1-0039.1 EMBL CDS 1 1524 . + 0 …
2
votes
1 answer

Iterating through Pandas dataframe and dictionary items

Here's a tough one. Problem introduction: I'm working with two different files: a GFF3, which is basically a nine-column TSV, and a FASTA, which is a text file. Now, importing the GFF3 file with gffpandas, it looks like this: …
2
votes
4 answers

use sed to extract two pieces of text at once from a line

OK, I've found similar answers on SO but my sed / grep / awk fu is so poor that I couldn't quite adapt them to my task. Which is, given this file "test.gff": accn|CP014704 RefSeq CDS 403 915 . + 0 …
1
vote
1 answer

Bash code with if/else/fi and awk/perl regexps works outside Snakemake but not inside a Snakemake rule

I am constructing a Snakemake workflow that is working well for the most part. One of the steps requires a ready.gtf file containing certain fields of information in its last column, specifically gene_name and transcript_name. I have thought of some…
apposada
1
vote
2 answers

How to print fields containing specific substrings with awk?

Goal: To print fields from an input file to an output file, but with special regard to specific fields split by semicolons (; — see field nine of example input below). Example Input (input.txt): NC_051336.1 Gnomon gene 40042 56215 . …
Gawain
1
vote
1 answer

Sed function in shell applied to all .gff files in a directory

I am working with .gff3 files, trying to remove the contig sequences at the bottom of many files in a directory. The contig sequences are separated from the rest of the file by a ##FASTA line, and I wish to delete everything below (DNA sequences, FASTA…
Morten
1
vote
0 answers

Problem in merging a gff file and a csv file in R

I have a gff file and a csv file which looks like: # CSV dataframe file.csv <- read.table(text = "Sample Name Estimate Std.Err P.Adjust Sample_1 B005300.2.1 0.345930183 0.05662846 1.58E-06 Sample_1 B005230.2.1 0.048159129 0.013862871…
1
vote
1 answer

Bcbio-gff file creation issue

When creating a file using GFF.write(), I get a new line with "annotation remark" as the source, followed by ASCII encoding of sequence regions: ##gff-version 3 ##sequence-region NC_011594.1 1 16779 NC_011594.1 annotation remark 1 16779 . . …
1
vote
0 answers

How to convert output of Emboss:Palindrome into a gff/bed file (Perl)

I am sorry to ask this kind of stupid question, but I could not find it by myself... I learned Perl a while ago and I am a little lost. I want to convert this kind of output: Palindromes of: seq1 Sequence length is: 24 Start at position: 1 End…
Papaya
0
votes
0 answers

htseq-count does not generate read counts as expected

I have a .gff file which looks like below. caffold1 GeneWise mRNA 227302 283623 80.88 - . ID=Mnat_00001;evid_id=ENST00000360911;Shift=0; scaffold1 GeneWise CDS 227302 227498 . - 2 …
0
votes
1 answer

Handling gff file from MISA

Replace whole column in BED file with motif length. I was mining STRs using MISA and collected data from the gff file to make a BED file with 5 columns: Chromosome|Start|End|Motif length|Motif. But the 4th column showed the number of repeats. Example of my…
0
votes
1 answer

Biopython parsing over gff features to extract CDS

Hello, I'm trying to extract the coding sequences from a FASTA file using a GFF file with the help of Biopython (https://biopython.org/wiki/GFF_Parsing). I have tried doing what this tutorial describes, but there is something I just don't seem to get…
Robbe
0
votes
1 answer

Adding data to a dataframe based on groups

I'm working with bioinformatic data, with a gene in each row and statistics/metadata in the columns. Some genes are from the same organism, which is indicated by the column "ID", and I grouped the data on this variable. data <- data %>% group_by(ID) I…
Morten
0
votes
1 answer

Cell value to column name in pandas

I have the following pandas dataframe (it's a gff file): df = pd.DataFrame.from_dict({'scaffold name': {0: 'Tname16C00001.1', 1: 'Tname16C00001.1', 2: 'Tname16C00001.1', 3: 'Tname16C00001.1', 4: 'Tname16C00001.1', 5: 'Tname16C00001.1', …
Saraha
0
votes
0 answers

HTSeq-count is returning 0 for every gene instead of an expression value

I'm trying to summarize gene counts using htseq-count, and it's returning 0 counts for every gene. I'm not sure what I'm doing wrong; I think it has to do with the gene flag I'm using. I've tried using the GTF for Arabidopsis TAIR 10: which I got…
sdshinghal