
For a bioinformatics project, I would like to read a .BED file into a pandas DataFrame, but I have no idea how to do it or which tools are required. Nothing I found online applied to my setup: Windows 10 with Python 3.7 (Anaconda distribution).

Any help would be appreciated.

Timur Shtatland
JazzyJazz

2 Answers


According to https://software.broadinstitute.org/software/igv/BED:

A BED file (.bed) is a tab-delimited text file that defines a feature track.

According to http://genome.ucsc.edu/FAQ/FAQformat#format1, it contains up to 12 fields (columns) and may include comment lines starting with the word 'track'. The following is a minimal program to read such a BED file into a pandas DataFrame.

import pandas as pd

df = pd.read_csv('so58178958.bed', sep='\t', comment='t', header=None)
header = ['chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand', 'thickStart', 'thickEnd', 'itemRgb', 'blockCount', 'blockSizes', 'blockStarts']
df.columns = header[:len(df.columns)]

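This is just a very simple code snippet that uses 't' as the comment character, so lines starting with a 't' (such as 'track' lines) are skipped entirely. That is safe for the 'chrom' field, as its entries should start with either a 'c', an 's' or a digit. Be aware, though, that pandas also cuts a line short at the first 't' found anywhere in it, so the snippet only works reliably when no data field (for example 'name') contains a lowercase 't'.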
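If your data fields may contain a 't', one alternative is to drop the header lines yourself before handing the rest to pandas. The following is only a sketch under some assumptions: `read_bed` and `BED_COLUMNS` are names made up for this example, header lines are assumed to start with 'track', 'browser' or '#', and the file is assumed to contain at least one data line.

```python
import io

import pandas as pd

# Standard BED column names, in order (a file may use only the first few).
BED_COLUMNS = ['chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand',
               'thickStart', 'thickEnd', 'itemRgb', 'blockCount',
               'blockSizes', 'blockStarts']


def read_bed(source):
    """Read a BED file (path or open text handle) into a DataFrame.

    Hypothetical helper: skips blank lines and lines starting with
    'track', 'browser' or '#', then parses the rest as tab-separated data.
    """
    if hasattr(source, 'read'):
        text = source.read()
    else:
        with open(source) as handle:
            text = handle.read()

    # Keep only data lines; header detection is an assumption of this sketch.
    data_lines = [line for line in text.splitlines()
                  if line.strip()
                  and not line.startswith(('track', 'browser', '#'))]

    # Force the first column to str so chromosome names like '1' stay text.
    df = pd.read_csv(io.StringIO('\n'.join(data_lines)), sep='\t',
                     header=None, dtype={0: str})
    df.columns = BED_COLUMNS[:len(df.columns)]
    return df


# Small in-memory example; note the 't' characters inside the name field.
demo_text = ('track name=demo\n'
             'chr1\t100\t200\tutr_a\t0\t+\n'
             'chr2\t300\t400\ttest\t5\t-\n')
demo_df = read_bed(io.StringIO(demo_text))
print(demo_df)
```

With a real file you would call it as `read_bed('so58178958.bed')`. Reading the whole file into memory first keeps the sketch short, but may not be a good fit for very large BED files.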

Stef

If you use pyranges, the columns of the resulting DataFrame are named and given appropriate data types.

import pyranges as pr

df = pr.read_bed("your.bed", as_df=True)

It also has readers for untidy bioinformatics formats such as GTF and GFF3.

The Unfun Cat