I have a genome sequencing file in the following format:
chromosome name (string) | location (int) | readings (int)
Data for all chromosomes are stored in a single file, and I wish to:
- split the file into individual per-chromosome data files;
- convert chromosome names, e.g. 'chr1', 'x', to integers.
How can I do that with pandas? This is how I currently read the file:
import pandas as pd
# Tab-separated file with no header row, so the columns are labelled 0, 1, 2
df = pd.read_csv('sample.txt', delimiter='\t', header=None)
The resulting data frame looks like this:
0 chr1 3000573 0
1 chr1 3000574 3
2 chr2 3000725 1
3 chr2 3000726 4
4 chr3 3000900 1
5 chr3 3000901 0
I can also reindex the data frame by the chromosome labels chr1, chr2, ...
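
For reference, here is a minimal sketch of the kind of result I am after, not a definitive solution. It assumes the three columns can be named `chrom`, `location` and `readings` (my own labels), that numeric names simply drop the `chr` prefix, and that non-numeric names such as 'x' get codes from a hypothetical `SPECIAL` table (23 and 24 are placeholders and would need adjusting per organism). The helper names `chrom_to_int` and `write_per_chromosome`, and the `<name>.txt` output naming, are also just illustrative. The sample rows are inlined so the snippet runs without `sample.txt`.

```python
import io
from pathlib import Path

import pandas as pd

# Sample rows copied from the question (tab-separated, no header line).
SAMPLE = (
    "chr1\t3000573\t0\n"
    "chr1\t3000574\t3\n"
    "chr2\t3000725\t1\n"
    "chr2\t3000726\t4\n"
    "chr3\t3000900\t1\n"
    "chr3\t3000901\t0\n"
)

# ASSUMPTION: integer codes for non-numeric chromosomes; adjust as needed.
SPECIAL = {"x": 23, "y": 24}


def chrom_to_int(name):
    """Map 'chr1' -> 1, 'chr2' -> 2, ...; look up names like 'x' in SPECIAL."""
    label = str(name).lower()
    if label.startswith("chr"):
        label = label[3:]
    return int(label) if label.isdigit() else SPECIAL[label]


def write_per_chromosome(frame, out_dir):
    """Write one tab-separated file per chromosome and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # sort=False keeps chromosomes in order of first appearance.
    for name, group in frame.groupby("chrom", sort=False):
        path = out_dir / f"{name}.txt"  # hypothetical naming scheme
        group.to_csv(path, sep="\t", header=False, index=False)
        paths.append(path)
    return paths


# Column names are my own labels; the file itself has no header.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", header=None,
                 names=["chrom", "location", "readings"])

# Add an integer chromosome id alongside the original name.
df["chrom_id"] = df["chrom"].map(chrom_to_int)

# In-memory split: one sub-frame per chromosome name.
per_chrom = {name: group for name, group in df.groupby("chrom", sort=False)}
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by `'sample.txt'`, and `write_per_chromosome(df, 'some_output_dir')` would produce the individual files.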