I am trying to write a dictionary to a .mat file using scipy.io.savemat(), but when I do, the contents change!
Here is the array I wish to assign to the dictionary key "Genes":
vectorizeddf.index.values.astype(np.str_)
Which prints as
array(['44M2.3', 'A0A087WSV2', 'A0A087WT57', ..., 'tert-rmrp_human',
       'tert-terc_human', 'wisp3 varinat'],
      dtype='<U44')
Then I do
genedict = {"Genes": vectorizeddf.index.values.astype(np.str_),
            "X": vectorizeddf.values,
            "ID": vectorizeddf.columns.values.astype(np.str_)}
import scipy.io as sio
sio.savemat("goa_human.mat", genedict)
But when I load the dictionary using
goadict = sio.loadmat("goa_human.mat")
My strings get padded with spaces!
>>> goadict['Genes']
array(['44M2.3 ',
       'A0A087WSV2 ',
       'A0A087WT57 ', ...,
       'tert-rmrp_human ',
       'tert-terc_human ',
       'wisp3 varinat '],
      dtype='<U44')
Which is far from ideal. On the other hand, when I access
genedict['ID']
I get
array(['GO:0000002', 'GO:0000003', 'GO:0000009', ..., 'GO:2001303',
       'GO:2001306', 'GO:2001311'],
      dtype='<U10')
Which is the original format of the array before saving. It seems to me that the issue is in the dtype, but I did my best to cast both of them as strings. I am not sure why one is <U44 and the other is <U10. How might I resolve this?
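For reference, here is a minimal NumPy-only sketch of what I suspect is happening. It assumes the width in the dtype is just the length of the longest string in each array, and that the saved strings come back padded to that width because MATLAB character arrays are rectangular. The arrays below are hypothetical stand-ins for my real data, and the padding is simulated with np.char.ljust rather than an actual savemat/loadmat round trip.

```python
import numpy as np

# Hypothetical stand-ins for the real index and column values.
genes = np.array(["44M2.3", "A0A087WSV2", "tert-terc_human"]).astype(np.str_)
ids = np.array(["GO:0000002", "GO:0000003"]).astype(np.str_)

# A fixed-width unicode dtype stores 4 bytes per character, so
# itemsize // 4 is the length of the longest string in the array.
gene_width = genes.dtype.itemsize // 4
id_width = ids.dtype.itemsize // 4

# Simulate the padding seen after loading: every string is
# right-padded with spaces to the common width.
padded = np.char.ljust(genes, gene_width)

# Possible workaround: strip the trailing spaces after loading.
stripped = np.char.rstrip(padded)

print(gene_width, id_width)
print(padded.tolist())
print(stripped.tolist())
```

If this is right, the "ID" array only looks unaffected because all of its strings already have the same length, so there is nothing to pad. Stripping would also remove any trailing spaces that were part of the original strings, so it is only safe when the original values have none.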
Thank you!