1

I have excel file with data list like:

column 1 column 2 column 3
 1          2        2
            3        5
                     6

I want to have a list with arrays for each column.

    sampling = pd.read_excel(file_path, sheetname=0, index_row=1)
    print(sampling)
    array = []
    for i in range(0,3):
        array2 = sampling['column '+ str(i+1)].tolist()
        array.append(array2)

    print(array)        

   column 1  column 2  column 3
0       1.0       2.0         2
1       NaN       3.0         5
2       NaN       NaN         6
[[1.0, nan, nan], [2.0, 3.0, nan], [2, 5, 6]]

How to get only non Nan values? I want to have such result

[[1.0], [2.0, 3.0], [2, 5, 6]] 
Ilya M
  • 13
  • 3

3 Answers3

1

Create lists with missing values and then remove them:

L = [x.dropna().tolist() for _, x in df.items()]
print (L)
[[1.0], [2.0, 3.0], [2, 5, 6]]

Or:

L = [[y for y in x if pd.notna(y)] for _, x in df.items()]
print (L)
[[1.0], [2.0, 3.0], [2, 5, 6]]

If want use your solution only add Series.dropna for remove missing values:

array = []
for i in range(0,3):
    array2 = sampling['column '+ str(i+1)].dropna().tolist()
    array.append(array2)

print(array)        
[[1.0], [2.0, 3.0], [2, 5, 6]]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

pandas.DataFrame.dropna

(Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)

This will drop NaN values from a pandas DataTable object.

Edit: Alternatively, you can check out this question which shows a different method.

Sciborg
  • 416
  • 1
  • 4
  • 15
  • 1
    As far as I understand it is not what I wanted. I want to get all non Nan values so this mean something like this ```[[1.0], [2.0, 3.0], [2, 5, 6]]``` and using dropna I'll lose columns with data. – Ilya M Jan 12 '20 at 09:35
  • My bad, I didn't fully understand what you were looking for. It looks like @jezrael's answer does what you want. – Sciborg Jan 12 '20 at 09:38
0

Very simple, add one line:

filtered_sampling = sampling[sampling[['column '+ str(i+1)]].notnull().all(1)]

into your loop. Then use filtered_sampling.

To conclude:

array = []
for i in range(0,3):

    filtered_sampling = sampling[sampling[['column '+ str(i+1)]].notnull().all(1)]
    array2 = filtered_sampling['column '+ str(i+1)].tolist()
    array.append(array2)

print(array) 
isydmr
  • 649
  • 7
  • 16