I have a table of data for a variety of genomic positions. Each position is represented as a 3-tuple ('chromosome', 'strand', position), and I've turned these tuples into a MultiIndex. My goal is to look up various information about each position (for example, the gene name) and add it to the table. I can do the lookup itself with pybedtools.
```python
import pandas as pd

df = pd.DataFrame(data={'A': range(1, 8), 'B': range(1, 8), 'C': range(1, 8)},
                  index=pd.MultiIndex.from_tuples([('chrom1', '-', 1234), ('chrom1', '+', 5678),
                                                   ('chrom1', '+', 9876), ('chrom2', '+', 13579),
                                                   ('chrom2', '+', 8497), ('chrom2', '-', 98765),
                                                   ('chrom2', '-', 76856)]))
df.index.rename(['chrom', 'strand', 'abs_pos'], inplace=True)
```
```
                      A  B  C
chrom  strand abs_pos
chrom1 -      1234    1  1  1
       +      5678    2  2  2
              9876    3  3  3
chrom2 +      13579   4  4  4
              8497    5  5  5
       -      98765   6  6  6
              76856   7  7  7
```
My issue is with adding a column to a DataFrame that has a MultiIndex. Without a MultiIndex this seems straightforward (see "pandas - add new column to dataframe from dictionary").
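For comparison, here is a minimal sketch of the single-level case, assuming the dictionary keys are exactly the index labels. The frame, the labels `pos1` to `pos3` and the name `gene_by_pos` are hypothetical. `Index.map` accepts a dict and looks each label up in it:

```python
import pandas as pd

# Hypothetical single-level index: one plain label per row, no MultiIndex
df = pd.DataFrame({'A': [1, 2, 3]}, index=['pos1', 'pos2', 'pos3'])
gene_by_pos = {'pos1': 'geneA', 'pos2': 'geneB', 'pos3': 'geneC'}

# Index.map with a dict replaces each index label by its dict value
df['gene'] = df.index.map(gene_by_pos)
print(df)
```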
I have a dictionary of lookup information whose 3-tuple keys correspond to the entries of the MultiIndex. How can I add this data as a new column?
```python
gene_d = {('chrom1', '-', 1234): 'geneA', ('chrom1', '+', 5678): 'geneB',
          ('chrom1', '+', 9876): 'geneC', ('chrom2', '+', 13579): 'geneD',
          ('chrom2', '+', 8497): 'geneE', ('chrom2', '-', 98765): 'geneF',
          ('chrom2', '-', 76856): 'geneG'}
```
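One possible approach, offered as a sketch rather than the definitive answer: iterating over a MultiIndex yields one plain tuple per row, so each entry can be used directly as a key into `gene_d`. This sketch rebuilds the example frame with `names=` passed to `from_tuples` instead of the separate rename call, and assumes every position has an entry in `gene_d` (a missing one would become `None`):

```python
import pandas as pd

df = pd.DataFrame(
    data={'A': range(1, 8), 'B': range(1, 8), 'C': range(1, 8)},
    index=pd.MultiIndex.from_tuples(
        [('chrom1', '-', 1234), ('chrom1', '+', 5678), ('chrom1', '+', 9876),
         ('chrom2', '+', 13579), ('chrom2', '+', 8497), ('chrom2', '-', 98765),
         ('chrom2', '-', 76856)],
        names=['chrom', 'strand', 'abs_pos']))

gene_d = {('chrom1', '-', 1234): 'geneA', ('chrom1', '+', 5678): 'geneB',
          ('chrom1', '+', 9876): 'geneC', ('chrom2', '+', 13579): 'geneD',
          ('chrom2', '+', 8497): 'geneE', ('chrom2', '-', 98765): 'geneF',
          ('chrom2', '-', 76856): 'geneG'}

# Iterating the MultiIndex gives tuples like ('chrom1', '-', 1234),
# which match the dict keys; dict.get returns None for unknown positions.
df['gene'] = [gene_d.get(key) for key in df.index]
print(df)
```

This adds `gene` as an ordinary column next to A, B and C, in the same row order as the index.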
I've tried `map`, but I can't figure out how to make it work with a MultiIndex to get the following result (with the gene name as an additional index level):
```
                            A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA 1  1  1
       +      5678    geneB 2  2  2
              9876    geneC 3  3  3
chrom2 +      13579   geneD 4  4  4
              8497    geneE 5  5  5
       -      98765   geneF 6  6  6
              76856   geneG 7  7  7
```
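If the gene name should end up as an index level, as in the desired output, one minimal sketch is to add it as a column first (here with the same list-comprehension lookup, which is an assumption, not the only option) and then move it into the index with `DataFrame.set_index(..., append=True)`. With `append=True` the existing three levels are kept and `gene` becomes the innermost one:

```python
import pandas as pd

df = pd.DataFrame(
    data={'A': range(1, 8), 'B': range(1, 8), 'C': range(1, 8)},
    index=pd.MultiIndex.from_tuples(
        [('chrom1', '-', 1234), ('chrom1', '+', 5678), ('chrom1', '+', 9876),
         ('chrom2', '+', 13579), ('chrom2', '+', 8497), ('chrom2', '-', 98765),
         ('chrom2', '-', 76856)],
        names=['chrom', 'strand', 'abs_pos']))

gene_d = {('chrom1', '-', 1234): 'geneA', ('chrom1', '+', 5678): 'geneB',
          ('chrom1', '+', 9876): 'geneC', ('chrom2', '+', 13579): 'geneD',
          ('chrom2', '+', 8497): 'geneE', ('chrom2', '-', 98765): 'geneF',
          ('chrom2', '-', 76856): 'geneG'}

# Step 1: look up each (chrom, strand, abs_pos) tuple and store it as a column
df['gene'] = [gene_d.get(key) for key in df.index]

# Step 2: move the column into the index as a fourth level;
# append=True keeps the existing levels instead of replacing them
df = df.set_index('gene', append=True)
print(df)
```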