Reading BED files into pandas dataframe (windows)

Question

For a bioinformatics project, I would like to read a .BED file into a pandas dataframe, but I have no clue how to do it or what tools/programs are required. Nothing I found on the internet was really applicable to me, as I am working on Windows 10 with Python 3.7 (Anaconda distribution).

Any help would be appreciated.

Stef · Accepted Answer · 2019-10-01T07:14:57.293

According to https://software.broadinstitute.org/software/igv/BED:

A BED file (.bed) is a tab-delimited text file that defines a feature track.

According to http://genome.ucsc.edu/FAQ/FAQformat#format1, it contains up to 12 fields (columns) and possibly comment lines starting with the word 'track'. The following is a minimal program to read such a BED file into a pandas dataframe.

import pandas as pd

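# Tab-separated, no header row; lines beginning with 't' (e.g. 'track ...') are treated as comments
df = pd.read_csv('so58178958.bed', sep='\t', comment='t', header=None)
# The 12 column names from the BED specification; keep only as many as the file actually has
header = ['chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand', 'thickStart', 'thickEnd', 'itemRgb', 'blockCount', 'blockSizes', 'blockStarts']
df.columns = header[:len(df.columns)]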

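This is just a very simple code snippet that treats all lines starting with a 't' as comments. That is safe at the start of a line, since all 'chrom' field entries should start with either a 'c', an 's' or a digit. Note, however, that the pandas comment option also discards the remainder of any line from the first 't' it encounters, so the snippet only works as intended if no field in the data lines contains a lowercase 't'.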
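If your data lines may contain a 't' (for example in the 'name' field), a somewhat more defensive variant is to filter out the header lines yourself before handing the rest to pandas. The following is only a sketch: read_bed and is_header_line are hypothetical helper names, and it assumes that header lines are exactly those whose first word is 'browser' or 'track', or that start with '#'.

```python
import io

import pandas as pd

# Column names from the UCSC BED description (up to 12 fields).
BED_COLUMNS = [
    'chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand',
    'thickStart', 'thickEnd', 'itemRgb', 'blockCount', 'blockSizes', 'blockStarts',
]


def is_header_line(line):
    """Assumption: header lines start with 'browser', 'track' or '#'."""
    if line.startswith('#'):
        return True
    words = line.split(None, 1)
    return bool(words) and words[0] in ('browser', 'track')


def read_bed(path):
    """Hypothetical helper: read a BED file into a DataFrame, skipping header lines."""
    with open(path) as handle:
        # Keep only non-empty lines that are not header/comment lines.
        data_lines = [line for line in handle
                      if line.strip() and not is_header_line(line)]
    # Parse the remaining lines; read 'chrom' as text so names like '1' stay strings.
    df = pd.read_csv(io.StringIO(''.join(data_lines)), sep='\t',
                     header=None, dtype={0: str})
    # Name as many columns as the file actually contains.
    df.columns = BED_COLUMNS[:len(df.columns)]
    return df
```

With this in place, df = read_bed('so58178958.bed') should give the same column names as the snippet above, without cutting lines off at the first 't'.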

score 1 · Answer 2 · answered Apr 22 '20 at 08:22

If you use pyranges, the resulting dataframe gets named columns with appropriate data types.

import pyranges as pr

df = pr.read_bed("your.bed", as_df=True)

It also has readers for other untidy bioinformatics formats such as GTF and GFF3.

The Unfun Cat
