Highest Voted 'gff' Questions

2

votes

3 answers

Changing column values in a tab-delimited file with awk without changing values in other columns

My file looks like this: 1-0039.1 EMBL transcript 1 1524 . + . transcript_id "1-0039.1.2"; gene_id "1-0039.1.2"; gene_name "dnaA" 1-0039.1 EMBL CDS 1 1524 . + 0 …

awk bioinformatics gff

asked Apr 06 '23 at 21:47

Maryam Ahmadi Jeshvaghane

23
3

2

votes

1 answer

Iterating through Pandas dataframe and dictionary items

Here's a tough one. Problem Introduction: I'm working with two different files: a GFF3, which is basically a "9 columns" TSV, and a FASTA, which is a text file. Now, importing the GFF3 file with gffpandas it looks like this: …

pandas dataframe numpy fasta gff

asked Oct 13 '22 at 21:44

Iacopo Passeri

145
10

2

votes

4 answers

use sed to extract two pieces of text at once from a line

OK, I've found similar answers on SO but my sed / grep / awk fu is so poor that I couldn't quite adapt them to my task. Which is, given this file "test.gff": accn|CP014704 RefSeq CDS 403 915 . + 0 …

bash text sed grep gff

asked Sep 05 '16 at 00:20

Robot-Scott

99
8

1

vote

1 answer

Bash code with if/else/fi and awk/perl regexps works outside snakemake but not inside a snakemake rule

I am constructing a snakemake workflow that is working well for the most part. One of the steps requires a ready.gtf file containing certain fields of information in its last column, specifically gene_name and transcript_name. I have thought of some…

regex awk snakemake gtfs gff

asked Mar 08 '23 at 12:28

apposada

13
2

1

vote

2 answers

How to print fields containing specific substrings with awk?

Goal: To print fields from an input file to an output file, but with special regard to specific fields split by semicolons (; — see field nine of example input below). Example Input (input.txt): NC_051336.1 Gnomon gene 40042 56215 . …

bash awk sed bioinformatics gff

asked Jun 23 '22 at 21:39

Gawain

188
1
16

1

vote

1 answer

Sed function in shell applied to all .gff files in a directory

I am working with .gff3 files trying to remove contig sequences in the bottom of many files in a directory. The contig sequences are separated from the rest of the file with a ##FASTA, and I wish to delete everything below (DNA sequences, FASTA…

shell sed bioinformatics gff

asked Mar 01 '21 at 07:53

Morten

49
4
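The task described in the question above (dropping the embedded sequence section from every GFF3 file in a directory) can be sketched in Python rather than sed. This is only an illustration under stated assumptions: the function names `strip_fasta_section` and `strip_directory` are hypothetical, the `*.gff3` glob pattern is a guess at the file naming, and the sketch assumes the `##FASTA` directive line itself should be removed along with everything after it.

```python
from pathlib import Path


def strip_fasta_section(gff_text: str) -> str:
    """Return the GFF3 text up to, but not including, the ##FASTA line.

    Assumption: the directive sits on its own line and should be dropped too.
    """
    kept = []
    for line in gff_text.splitlines(keepends=True):
        if line.strip() == "##FASTA":
            break  # everything from here on is sequence data
        kept.append(line)
    return "".join(kept)


def strip_directory(directory: str, pattern: str = "*.gff3") -> None:
    """Rewrite every matching file in `directory` without its FASTA section.

    Files are overwritten in place, so work on a copy of the data first.
    """
    for path in Path(directory).glob(pattern):
        path.write_text(strip_fasta_section(path.read_text()))
```

Calling `strip_directory("annotations")` would process a hypothetical `annotations` folder; files without a `##FASTA` line are written back unchanged.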

1

vote

0 answers

Problem in merging a gff file and a csv file in R

I have a gff file and a csv file which looks like: # CSV dataframe file.csv <- read.table(text = "Sample Name Estimate Std.Err P.Adjust Sample_1 B005300.2.1 0.345930183 0.05662846 1.58E-06 Sample_1 B005230.2.1 0.048159129 0.013862871…

r csv bioinformatics bioconductor gff

asked Nov 18 '20 at 19:08

user1567654

47
5

1

vote

1 answer

Bcbio-gff File creation issue

When creating a file using GFF.write(), I get a new line with "annotation remark" as a source, followed by ASCII encoding of sequence regions: ##gff-version 3 ##sequence-region NC_011594.1 1 16779 NC_011594.1 annotation remark 1 16779 . . …

python bioinformatics biopython gff

asked Apr 23 '20 at 11:12

Felix Jaeger

13
4

1

vote

0 answers

How to convert output of Emboss:Palindrome into gff/bed file (perl)

I am sorry to ask this kind of stupid question but I could not find it by myself... I learned perl a while ago and I am a little lost. I want to convert this kind of output: Palindromes of: seq1 Sequence length is: 24 Start at position: 1 End…

perl palindrome emboss gff

asked Nov 25 '19 at 10:42

Papaya

107
9

0

votes

0 answers

htseq-count does not generate read counts as expected

I have a .gff file which looks like this: scaffold1 GeneWise mRNA 227302 283623 80.88 - . ID=Mnat_00001;evid_id=ENST00000360911;Shift=0; scaffold1 GeneWise CDS 227302 227498 . - 2 …

linux rna-seq gff

asked Mar 21 '23 at 12:59

Allan Okwaro

1
1

0

votes

1 answer

Handling gff file from MISA

Replace whole column in BED file with motif length. I was mining STRs using MISA and collected data from the gff file to make a BED file with 5 columns: Chromosome|Start|End|Motif length|Motif. But the 4th column showed the number of repeats. Example of my…

bash bioinformatics dna-sequence gff

asked Nov 16 '22 at 14:25

Bách Nguyễn

37
4

0

votes

1 answer

Biopython parsing over gff features to extract CDS

Hello I'm trying to extract the coding sequences from a fasta file using a gff file with the help of biopython (https://biopython.org/wiki/GFF_Parsing) I have tried doing what this tutorial describes but there is something I just don't seem to get…

python biopython fasta gff

asked May 16 '22 at 17:43

Robbe

35
8

0

votes

1 answer

Adding data to a dataframe based on groups

I'm working with bioinformatic data, with a gene in each row and statistics/metadata in the columns. Some genes are from the same organism which is indicated by column "ID", and I grouped the data on this variable. data <- data %>% group_by(ID) I…

r dplyr file-import gff

asked Mar 02 '21 at 12:33

Morten

49
4

0

votes

1 answer

Cell value to column name in pandas

I have the following pandas dataframe (it's a gff file): df = pd.DataFrame.from_dict({'scaffold name': {0: 'Tname16C00001.1', 1: 'Tname16C00001.1', 2: 'Tname16C00001.1', 3: 'Tname16C00001.1', 4: 'Tname16C00001.1', 5: 'Tname16C00001.1', …

python pandas dataframe gff

asked Mar 28 '20 at 09:51

Saraha

144
1
12

0

votes

0 answers

HTSeq-count is returning 0 for every gene, instead of expression value

I'm trying to summarize gene count using htseq-count; and it's returning 0 counts at every gene. I'm not sure what I'm doing wrong; I think it has to do with the gene flag I'm using. I've tried using the GTF for Arabidopsis TAIR 10: which I got…

bioinformatics gff

asked Mar 04 '20 at 02:14

sdshinghal

41
1
6
