0

I am having an excel file of 8000 rows and 10 columns and date as index as shown below

                col1 col2 col3 col4 col5  col6  col7  col8   col9   col10
    Date                                                                
    1996-03-01  0.0  6.3  3.6  9.4  86.0  34.0  34.3  17.5   NaN     NaN
    1996-03-02  0.0  5.3  1.1  8.5  95.0  48.0  34.5  20.8   NaN     NaN
            ...  ...  ...  ...   ...   ...   ...   ...   ...     ...
    2015-12-30  0.0  3.6  NaN  8.4  92.0  25.0  32.6  16.4   NaN     NaN
    2015-12-31  0.0  3.4  NaN  8.6  92.0  41.0  31.7  17.4   NaN     NaN

In the above dataset there are missing values in col9 and col10.

Now my task is that where ever values are present in col 9 and col 10, I want to pick the entire row and save it in a matrix which i have done as shown below.

    col1 col2 col3 col4 col5 col6 col7 col8  col9   col10
    0.0  0.0  0.0  0.0  0.0  0.0  4.2  22.3  20.0  21.2  
    0.0  0.0  0.0  0.0  0.0  0.0  8.0  26.8  21.4  20.5 
    ...  ...  ...  ...   ...   ...   ...   ...   ...      
    0.0  0.0  6.0  3.5  8.9  9.8  7.5  25.6  22.4  27.6 

And the I other challenge is that I also want to keep the corresponding date also. How I do that?
Since date is a timestamp and rest all are float type. I can't insert the timestamp to a particular row column of a matrix.

MwamiTovi
  • 2,425
  • 17
  • 25
imran khan
  • 105
  • 5

1 Answers1

0

When you say matrix in python, I'm assuming you mean a df. If you don't and you mean an actual matrix of numpy arrays you can easily derive one from a df. Instead of copying your values from the df, just drop the required rows and store the remaining in a new df.

new_df = df.dropna(inplace=True,subset=["col9","col10"])  

Subset is the columns you want to consider in your drop na .(you can also pass column index/number in the list) ^ this will store your data, including the dates in a new df with the rows dropped.

To convert this new_df to numpy matrix, you can easily load it into a dictionary using:

DataFrame.to_dict(self, orient='dict', into=<class 'dict'>)[source] 

method and then from this, load your numpy mat (ref link : python dict to numpy structured array)

NOTE: if the dates are giving you extra trouble, then convert them to float in the new_df itself using :

def datetime_to_float(d):
    return d.timestamp()
Alper Kucukkomurler
  • 1,706
  • 2
  • 13
  • 19
Kriticoder
  • 103
  • 1
  • 6