I have several large dataframes built up from a vehicle log. Because only one message can be present on the CAN bus (the vehicle communication protocol) at any given time, each row of the log holds values for just one message and NaN everywhere else.
This is a simplified dataframe without any interpolation:
time  messageA1  messageA2  messageA3  messageB1  messageB2  messageC1  messageC2
0     1          2          1          NaN        NaN        NaN        NaN
1     NaN        NaN        NaN        NaN        NaN        3          2
2     NaN        NaN        NaN        3          7          NaN        NaN
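For reference, the toy frame above can be rebuilt with something like this (the column names and values are only the simplified example, not real log data):

```python
import numpy as np
import pandas as pd

# Rebuild the simplified example frame from the table above
df = pd.DataFrame(
    {
        "time":      [0, 1, 2],
        "messageA1": [1, np.nan, np.nan],
        "messageA2": [2, np.nan, np.nan],
        "messageA3": [1, np.nan, np.nan],
        "messageB1": [np.nan, np.nan, 3],
        "messageB2": [np.nan, np.nan, 7],
        "messageC1": [np.nan, 3, np.nan],
        "messageC2": [np.nan, 2, np.nan],
    }
)

print(df.dtypes)  # every message column comes out as float64 because of the NaNs
```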
This can continue for millions of rows, with NaN values making up about 95% of the entire dataframe. I have read that when a NaN/Null/None value is stored in a dataframe, it is held as a float64 value.
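To get a feel for the cost, a rough sketch like the one below measures the footprint of a dense, mostly-NaN frame (the 1,000,000-row shape, the msg0..msg7 column names, and the exact 95% ratio are placeholders, not my real data):

```python
import numpy as np
import pandas as pd

# Stand-in for the real log: 1,000,000 rows, 8 columns, roughly 95% NaN
rng = np.random.default_rng(0)
data = rng.random((1_000_000, 8))
data[rng.random(data.shape) < 0.95] = np.nan
big = pd.DataFrame(data, columns=[f"msg{i}" for i in range(8)])

# If NaNs really are stored as float64, every cell should cost 8 bytes
# regardless of its value, and memory_usage() should reflect that.
print(big.memory_usage(deep=True))                    # bytes per column
print(big.memory_usage(deep=True).sum() / 1e6, "MB")  # total
```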
My questions:
- Is a float64 value allocated for every NaN value?
- If yes, is this done in a memory-efficient way?
- Will a large dataframe that is 95% NaN values be inefficient in terms of processing performance?