Get data from column that are not nan values

Question

I have excel file with data list like:

column 1 column 2 column 3
 1          2        2
            3        5
                     6

I want to have a list with arrays for each column.

    sampling = pd.read_excel(file_path, sheetname=0, index_row=1)
    print(sampling)
    array = []
    for i in range(0,3):
        array2 = sampling['column '+ str(i+1)].tolist()
        array.append(array2)

    print(array)        

   column 1  column 2  column 3
0       1.0       2.0         2
1       NaN       3.0         5
2       NaN       NaN         6
[[1.0, nan, nan], [2.0, 3.0, nan], [2, 5, 6]]

How to get only non Nan values? I want to have such result

[[1.0], [2.0, 3.0], [2, 5, 6]]

What's your expected output? – Grzegorz Skibinski Jan 12 '20 at 09:30 — Grzegorz Skibinski, Jan 12 '20 at 09:30

jezrael · Accepted Answer · 2020-01-12T09:39:30.237

Create lists with missing values and then remove them:

L = [x.dropna().tolist() for _, x in df.items()]
print (L)
[[1.0], [2.0, 3.0], [2, 5, 6]]

Or:

L = [[y for y in x if pd.notna(y)] for _, x in df.items()]
print (L)
[[1.0], [2.0, 3.0], [2, 5, 6]]

If want use your solution only add Series.dropna for remove missing values:

array = []
for i in range(0,3):
    array2 = sampling['column '+ str(i+1)].dropna().tolist()
    array.append(array2)

print(array)        
[[1.0], [2.0, 3.0], [2, 5, 6]]

score 0 · Answer 2 · answered Jan 12 '20 at 09:30

0

pandas.DataFrame.dropna

(Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)

This will drop NaN values from a pandas DataTable object.

Edit: Alternatively, you can check out this question which shows a different method.

answered Jan 12 '20 at 09:30

Sciborg

416
1
4
15

1

As far as I understand it is not what I wanted. I want to get all non Nan values so this mean something like this ```[[1.0], [2.0, 3.0], [2, 5, 6]]``` and using dropna I'll lose columns with data. – Ilya M Jan 12 '20 at 09:35
My bad, I didn't fully understand what you were looking for. It looks like @jezrael's answer does what you want. – Sciborg Jan 12 '20 at 09:38

score 0 · Answer 3 · answered Jan 12 '20 at 09:37

Very simple, add one line:

filtered_sampling = sampling[sampling[['column '+ str(i+1)]].notnull().all(1)]

into your loop. Then use filtered_sampling.

To conclude:

array = []
for i in range(0,3):

    filtered_sampling = sampling[sampling[['column '+ str(i+1)]].notnull().all(1)]
    array2 = filtered_sampling['column '+ str(i+1)].tolist()
    array.append(array2)

print(array)

Get data from column that are not nan values

3 Answers3