I know how to read binary files in Python using NumPy's np.fromfile() function. The issue is that when I do so, the resulting array contains exceedingly large numbers, on the order of 10^100 or so, along with random nan and inf values.
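For reference, this is roughly how I'm reading the files (a minimal sketch; the file name is a placeholder, and I'm relying on np.fromfile()'s default float64 dtype):

```python
import numpy as np

# Read the raw binary straight into an array. "sample.exe" stands in for
# one of my files; with no dtype specified, np.fromfile() interprets the
# bytes as float64.
data = np.fromfile("sample.exe")

# This is where the huge values and the random nan/inf entries show up.
print(data.min(), data.max())
print(np.isnan(data).sum(), np.isinf(data).sum())
```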
I need to apply machine learning algorithms to this dataset, and I cannot work with the data in this state. I cannot normalise the dataset because of the nan values.
I've tried np.nan_to_num(), but that doesn't seem to work: after applying it, my min and max values are roughly 3e-38 and 3e+38 respectively, so I still could not normalise the data.
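This is roughly what that attempt looks like (a sketch; the array is just whatever came out of np.fromfile() above):

```python
import numpy as np

# Replace nan with 0 and +/-inf with the largest finite values
# representable by the array's dtype.
clean = np.nan_to_num(data)

# The range is still enormous (about 3e-38 to 3e+38 in my case),
# so min-max normalisation still doesn't give usable values.
print(clean.min(), clean.max())
```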
Is there any way to scale this data down? If not, how should I deal with this?
Thank you.
EDIT:
Some context: I'm working on a malware classification problem. My dataset consists of live malware binaries; they are files of type .exe, .apk, etc. My idea is to store these binaries as a NumPy array, convert them to a grayscale image, and then perform pattern analysis on the image. A sketch of that pipeline is below.
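To make the intended pipeline concrete, this is roughly what I have in mind (a sketch; the file name, the one-byte-per-pixel interpretation, and the image width of 256 are my own placeholder choices):

```python
import numpy as np

# Read the binary one byte at a time (placeholder file name).
raw = np.fromfile("sample.exe", dtype=np.uint8)

# Pad to a multiple of a fixed width and reshape into a 2D array so the
# bytes can be viewed as a grayscale image (width of 256 is arbitrary).
width = 256
pad = (-len(raw)) % width
image = np.pad(raw, (0, pad), constant_values=0).reshape(-1, width)

# Each pixel is now a value in 0-255, one byte of the original file.
```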