
Continuing from my previous question (linked; the details are explained there), I have now obtained an array. I don't yet know how to use this array, but that is for a later question. The point of this question is that there are NaN values in the 63 x 2 array that I created, and I want the rows with NaN values deleted so that I can use the data (once I ask another question on how to graph it and export it as x, y arrays).

Here's what I have. This code works.

import pandas as pd

df = pd.read_csv("~/Truncated raw data hcl.csv")

data1 = [df.iloc[:, [0, 1]]]

A sample of the .csv file is available at the link.

I tried inputting

data1.dropna()

but it didn't work.

I want the NaN values/rows to drop so that I'm left with a 28 x 2 array. (I am using the first column with actual values as an example).

Thank you.

Ilyankor
    Try `data1 = df.iloc[:, [0, 1]]` first and then `data1.dropna()`. You were putting the dataframe in the list by using extra `[ ]` – Sheldore Dec 25 '18 at 00:51

2 Answers

0

Try

import pandas as pd

df = pd.read_csv("~/Truncated raw data hcl.csv")

data1 = df.iloc[:, [0, 1]]
cleaned_data = data1.dropna()

You were probably getting an exception like "AttributeError: 'list' object has no attribute 'dropna'". That's because your data1 was not a pandas DataFrame but a list, and inside that list was a DataFrame.
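To make the difference concrete, here is a minimal sketch. The small DataFrame and its column names are made up for illustration and only stand in for the CSV from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: a time column and one trial with gaps.
df = pd.DataFrame({
    "time": [0.0, 0.5, 1.0, 1.5],
    "trial 1": [23.2, 23.2, np.nan, np.nan],
})

wrapped = [df.iloc[:, [0, 1]]]   # extra brackets: a list holding one DataFrame
selected = df.iloc[:, [0, 1]]    # no extra brackets: the DataFrame itself

print(type(wrapped).__name__)    # list, which has no dropna method
print(type(selected).__name__)   # DataFrame

# dropna() returns a new DataFrame; `selected` itself is left unchanged,
# so the result has to be assigned to a name.
cleaned = selected.dropna()
print(cleaned.shape)
```

Note that calling `selected.dropna()` without assigning the result would appear to do nothing, because the original object is not modified.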

0

The answer has already been given, but I would like to add a few thoughts.

First, import your DataFrame, using the example dataset from your earlier post:

>>> import pandas as pd
>>> df = pd.read_csv("so.csv")
>>> df
    time  1mnaoh trial 1  1mnaoh trial 2  1mnaoh trial 3       ...        5mnaoh trial 1  5mnaoh trial 2  5mnaoh trial 3  5mnaoh trial 4
0    0.0            23.2            23.1            23.1       ...                  23.3            24.3            24.1            24.1
1    0.5            23.2            23.1            23.1       ...                  23.4            24.3            24.1            24.1
2    1.0            23.2            23.1            23.1       ...                  23.5            24.3            24.1            24.1
3    1.5            23.2            23.1            23.1       ...                  23.6            24.3            24.1            24.1
4    2.0            23.3            23.2            23.2       ...                  23.7            24.5            24.7            25.1
5    2.5            24.0            23.5            23.5       ...                  23.8            27.2            26.7            28.1
6    3.0            25.4            24.4            24.1       ...                  23.9            31.4            29.8            31.3
7    3.5            26.9            25.5            25.1       ...                  23.9            35.1            33.2            34.4
8    4.0            27.8            26.5            26.2       ...                  24.0            37.7            35.9            36.8
9    4.5            28.5            27.3            27.0       ...                  24.0            39.7            38.0            38.7
10   5.0            28.9            27.9            27.7       ...                  24.0            40.9            39.6            40.2
11   5.5            29.2            28.2            28.3       ...                  24.0            41.9            40.7            41.0
12   6.0            29.4            28.5            28.6       ...                  24.1            42.5            41.6            41.2
13   6.5            29.5            28.8            28.9       ...                  24.1            43.1            42.3            41.7
14   7.0            29.6            29.0            29.1       ...                  24.1            43.4            42.8            42.3
15   7.5            29.7            29.2            29.2       ...                  24.0            43.7            43.1            42.9
16   8.0            29.8            29.3            29.3       ...                  24.2            43.8            43.3            43.3
17   8.5            29.8            29.4            29.4       ...                  27.0            43.9            43.5            43.6
18   9.0            29.9            29.5            29.5       ...                  30.8            44.0            43.6            43.8
19   9.5            29.9            29.6            29.5       ...                  33.9            44.0            43.7            44.0
20  10.0            30.0            29.7            29.6       ...                  36.2            44.0            43.7            44.1
21  10.5            30.0            29.7            29.6       ...                  37.9            44.0            43.8            44.2
22  11.0            30.0            29.7            29.6       ...                  39.3             NaN            43.8            44.3
23  11.5            30.0            29.8            29.7       ...                  40.2             NaN            43.8            44.3
24  12.0            30.0            29.8            29.7       ...                  40.9             NaN            43.9            44.3
25  12.5            30.1            29.8            29.7       ...                  41.4             NaN            43.9            44.3
26  13.0            30.1            29.8            29.8       ...                  41.8             NaN            43.9            44.4
27  13.5            30.1            29.9            29.8       ...                  42.0             NaN            43.9            44.4
28  14.0            30.1            29.9            29.8       ...                  42.1             NaN             NaN            44.4
29  14.5             NaN            29.9            29.8       ...                  42.3             NaN             NaN            44.4
30  15.0             NaN            29.9             NaN       ...                  42.4             NaN             NaN             NaN
31  15.5             NaN             NaN             NaN       ...                  42.4             NaN             NaN             NaN

However, it is good to clean the data first and then process it as desired, so dropping the NA values right at import can be useful. Note that dropna() with no arguments drops every row that has a NaN in any column, which is why only rows 0 to 21 remain here.

>>> df = pd.read_csv("so.csv").dropna()    # <-- dropping the NA here itself
>>> df
    time  1mnaoh trial 1  1mnaoh trial 2  1mnaoh trial 3       ...        5mnaoh trial 1  5mnaoh trial 2  5mnaoh trial 3  5mnaoh trial 4
0    0.0            23.2            23.1            23.1       ...                  23.3            24.3            24.1            24.1
1    0.5            23.2            23.1            23.1       ...                  23.4            24.3            24.1            24.1
2    1.0            23.2            23.1            23.1       ...                  23.5            24.3            24.1            24.1
3    1.5            23.2            23.1            23.1       ...                  23.6            24.3            24.1            24.1
4    2.0            23.3            23.2            23.2       ...                  23.7            24.5            24.7            25.1
5    2.5            24.0            23.5            23.5       ...                  23.8            27.2            26.7            28.1
6    3.0            25.4            24.4            24.1       ...                  23.9            31.4            29.8            31.3
7    3.5            26.9            25.5            25.1       ...                  23.9            35.1            33.2            34.4
8    4.0            27.8            26.5            26.2       ...                  24.0            37.7            35.9            36.8
9    4.5            28.5            27.3            27.0       ...                  24.0            39.7            38.0            38.7
10   5.0            28.9            27.9            27.7       ...                  24.0            40.9            39.6            40.2
11   5.5            29.2            28.2            28.3       ...                  24.0            41.9            40.7            41.0
12   6.0            29.4            28.5            28.6       ...                  24.1            42.5            41.6            41.2
13   6.5            29.5            28.8            28.9       ...                  24.1            43.1            42.3            41.7
14   7.0            29.6            29.0            29.1       ...                  24.1            43.4            42.8            42.3
15   7.5            29.7            29.2            29.2       ...                  24.0            43.7            43.1            42.9
16   8.0            29.8            29.3            29.3       ...                  24.2            43.8            43.3            43.3
17   8.5            29.8            29.4            29.4       ...                  27.0            43.9            43.5            43.6
18   9.0            29.9            29.5            29.5       ...                  30.8            44.0            43.6            43.8
19   9.5            29.9            29.6            29.5       ...                  33.9            44.0            43.7            44.0
20  10.0            30.0            29.7            29.6       ...                  36.2            44.0            43.7            44.1
21  10.5            30.0            29.7            29.6       ...                  37.9            44.0            43.8            44.2
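If the goal is to keep every row where one particular column has a value, `dropna` also accepts a `subset` argument that restricts the NaN check to the listed columns. The sketch below uses a made-up three-column frame (the names `trial A` and `trial B` are placeholders, not columns from the real file) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "trial B" stops recording earlier than "trial A".
df = pd.DataFrame({
    "time": [0.0, 0.5, 1.0, 1.5],
    "trial A": [23.2, 23.2, 23.3, np.nan],
    "trial B": [24.3, np.nan, np.nan, np.nan],
})

# Default behaviour: a row is dropped if ANY column holds NaN.
all_cols = df.dropna()

# Only look at "trial A" when deciding which rows to drop.
one_col = df.dropna(subset=["trial A"])

print(len(all_cols), len(one_col))

# Keep just the two columns of interest from the less aggressive result.
pair = one_col[["time", "trial A"]]
print(pair)
```

With the default call, the shortest column decides how many rows survive; with `subset`, only the column you care about does.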

And lastly, select the columns you want (note that the outer brackets, as in your original code, wrap the resulting DataFrame in a list):

>>> df = [df.iloc[:, [0, 1]]]
# new_df = [df.iloc[:, [0, 1]]]  <-- if you don't want to alter actual dataFrame
>>> df
[    time  1mnaoh trial 1
0    0.0            23.2
1    0.5            23.2
2    1.0            23.2
3    1.5            23.2
4    2.0            23.3
5    2.5            24.0
6    3.0            25.4
7    3.5            26.9
8    4.0            27.8
9    4.5            28.5
10   5.0            28.9
11   5.5            29.2
12   6.0            29.4
13   6.5            29.5
14   7.0            29.6
15   7.5            29.7
16   8.0            29.8
17   8.5            29.8
18   9.0            29.9
19   9.5            29.9
20  10.0            30.0
21  10.5            30.0]

Better Solution:

Looking at the end result, I see you are only concerned with two particular columns, 'time' and '1mnaoh trial 1'. The ideal approach is therefore to use the usecols option, which reduces the memory footprint because only the columns you actually need are read, and then call dropna(), which I believe gives you what you wanted.

>>> df = pd.read_csv("so.csv", usecols=['time', '1mnaoh trial 1']).dropna()
>>> df
    time  1mnaoh trial 1
0    0.0            23.2
1    0.5            23.2
2    1.0            23.2
3    1.5            23.2
4    2.0            23.3
5    2.5            24.0
6    3.0            25.4
7    3.5            26.9
8    4.0            27.8
9    4.5            28.5
10   5.0            28.9
11   5.5            29.2
12   6.0            29.4
13   6.5            29.5
14   7.0            29.6
15   7.5            29.7
16   8.0            29.8
17   8.5            29.8
18   9.0            29.9
19   9.5            29.9
20  10.0            30.0
21  10.5            30.0
22  11.0            30.0
23  11.5            30.0
24  12.0            30.0
25  12.5            30.1
26  13.0            30.1
27  13.5            30.1
28  14.0            30.1
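The same `usecols` plus `dropna()` pattern can be tried without the original file. This is only a sketch: the CSV text and the column names `trial A` and `trial B` are invented, and `io.StringIO` stands in for the file path. It also pulls the two cleaned columns out as NumPy arrays, on the assumption that this is the kind of x, y data the question is after.

```python
import io

import pandas as pd

# Hypothetical CSV content; empty fields are read as NaN.
csv_text = """time,trial A,trial B
0.0,23.2,24.3
0.5,23.2,
1.0,23.3,
1.5,,
"""

# Read only the two columns of interest, then drop rows containing NaN.
df = pd.read_csv(io.StringIO(csv_text), usecols=["time", "trial A"]).dropna()
print(df.shape)

# One-dimensional arrays for each column.
x = df["time"].to_numpy()
y = df["trial A"].to_numpy()
print(x, y)
```

Because `trial B` is never loaded, its missing values cannot cause rows to be dropped.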
Karn Kumar