I've got a dataframe that contains mostly NaNs, but also dictionaries in certain entries. My goal is to expand those dictionaries into columns of the dataframe, keeping their values at their respective indices. This is what a small part of the dataframe looks like:
                                                                  _id  _score
query
chrM:g.146T>C                                                     NaN     NaN
chrM:g.11723C>T                                                   NaN     NaN
chrM:g.11813A>G                                                   NaN     NaN
chrM:g.12140T>A                                                   NaN     NaN
...                                                               ...     ...
chr1:g.11976370T>G  {u'ref': u'T', u'alleles': [{u'allele': u'T', ...     NaN
chr1:g.12007164A>G                                                NaN     NaN
chr1:g.12007165A>G                                                NaN     NaN
So far, I've just managed to pick the keys of each dict and add columns named with those keys:
import pandas

for col in data1.columns:
    non_null = data1[col].dropna()
    # only look at columns whose first non-NaN entry is a dict
    if len(non_null) and isinstance(non_null.iloc[0], dict):
        new_cols = list(non_null.iloc[0].keys())
        # add the dict's keys as new (still empty) columns
        data1 = pandas.concat([data1, pandas.DataFrame(columns=new_cols)])
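That only gives me the empty columns, though. Here is a rough sketch of how I imagine the values could then be put onto their rows (just a guess, assuming every dict is flat and its keys don't clash with existing column names; data1 and _id are from the example above):

import pandas

# sketch: for every column whose entries are dicts, spread the dict keys into columns
for col in list(data1.columns):
    non_null = data1[col].dropna()
    if len(non_null) and isinstance(non_null.iloc[0], dict):
        # one row per dict, reusing the original index so values stay on their indices
        expanded = pandas.DataFrame(list(non_null.values), index=non_null.index)
        data1 = data1.join(expanded)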
Any help or hints on how to do this in an efficient and readable way will be much appreciated.
**EDIT:** this code:
y = pandas.Series((dbsnp.iloc[0]))
print y
does, however, retrieve something somewhat useful:
allele_origin unspecified
alleles [{u'allele': u'G'}, {u'allele': u'A'}]
alt A
... ...
rsid rs201327123
vartype snp
dtype: object
I'll try working from here; other input is much appreciated.
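Presumably the same idea can be applied to the whole column rather than to a single row, something along these lines (again just a guess on my part, assuming dbsnp is the dict-valued column):

import pandas

# turn each dict in the column into a Series; stacked together they form a frame
expanded = dbsnp.dropna().apply(pandas.Series)
print expanded.head()

If that's right, joining expanded back onto the original dataframe on its index should put the values on their respective rows.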