How to filter a Pandas series?

Question

I want to filter a Pandas Series to remove certain values. This seems like such a simple task, but the preferred answer to the same question doesn't work for me.

Here's my reproducible example:

import numpy as np
import pandas as pd

data = np.array([['','Col1','Col2'],
                ['Row1',1,2],
                ['Row2',3,4]])

myDF = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])

mySeries = myDF.loc[:, "Col1"]
mySeries[mySeries != 1]

I expect the final line to output a single row, containing the value 3, but instead I get:

Row1    1
Row2    3
Name: Col1, dtype: object

What am I doing wrong?

score 3 · Answer 1 · answered Oct 31 '18 at 13:06

Your Series contains strings.

>>> mySeries.tolist()
['1', '3']

You can use

>>> mySeries[mySeries != '1']
Row2    3
Name: Col1, dtype: object

This happens because a NumPy array holds a single data type, so the integers are cast to strings when you create data.
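A minimal sketch of that coercion, rebuilding the Series from the question. The variable names follow the question; mask is a name introduced here only for illustration, and the exact string width NumPy reports may differ between versions, so only the dtype kind is checked.

```python
import numpy as np
import pandas as pd

# Mixing str and int in a single np.array coerces every element to a string.
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
print(data.dtype.kind)  # 'U' stands for a Unicode string dtype

myDF = pd.DataFrame(data=data[1:, 1:],
                    index=data[1:, 0],
                    columns=data[0, 1:])
mySeries = myDF.loc[:, "Col1"]

# The values are the strings '1' and '3'. Neither equals the integer 1,
# so the boolean mask is True for every row and nothing is filtered out.
mask = mySeries != 1
print(mask.tolist())
```

Because the mask is True everywhere, indexing with it returns the whole Series, which matches the output shown in the question.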

If you want the integers, you can use

>>> mySeries = mySeries.astype(int)
>>> mySeries
Row1    1
Row2    3
Name: Col1, dtype: int64

and your original code will work just fine.
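Putting the two steps together, as a rough sketch. The Series is built directly from strings here as a stand-in for the one in the question, and filtered is a name introduced only for this example.

```python
import pandas as pd

# Stand-in for the question's Series: string values, same index and name.
mySeries = pd.Series(['1', '3'], index=['Row1', 'Row2'], name='Col1')

# Convert the strings to integers, then apply the question's original filter.
mySeries = mySeries.astype(int)
filtered = mySeries[mySeries != 1]
print(filtered)
```

After the conversion, only Row2 (value 3) remains in filtered.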

score 2 · Answer 2 · answered Oct 31 '18 at 13:06

mySeries = mySeries.astype(int)
mySeries.loc[mySeries != 1]


Naga kiran


score 2 · Accepted Answer · answered Oct 31 '18 at 13:18

Consider the dtype of the NumPy array you are creating:

data = np.array([['','Col1','Col2'],
                 ['Row1',1,2],
                 ['Row2',3,4]])

print(repr(data))

array([['', 'Col1', 'Col2'],
       ['Row1', '1', '2'],
       ['Row2', '3', '4']],
      dtype='<U4')

Combining strings and integers in a nested list before passing it to np.array creates an array of strings. The dtype '<U4' shows this: U means Unicode string, and 4 is the maximum number of characters per element.

If you pass plain Python lists to pd.DataFrame instead, you avoid the problem: the integers are never coerced to strings, and pandas infers an integer dtype for each column:

data = [['','Col1','Col2'],
        ['Row1',1,2],
        ['Row2',3,4]]

myDF = pd.DataFrame(data=[i[1:] for i in data[1:]],
                    index=[i[0] for i in data[1:]],
                    columns=data[0][1:])
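A quick check of that claim, as a sketch that repeats the list-based construction and then applies the filter from the question. The name filtered is introduced here only for illustration.

```python
import pandas as pd

data = [['', 'Col1', 'Col2'],
        ['Row1', 1, 2],
        ['Row2', 3, 4]]

myDF = pd.DataFrame(data=[row[1:] for row in data[1:]],
                    index=[row[0] for row in data[1:]],
                    columns=data[0][1:])

# Both columns are inferred as integer columns, not strings.
print(myDF.dtypes)

# The original filter now compares integers with an integer.
mySeries = myDF.loc[:, "Col1"]
filtered = mySeries[mySeries != 1]
print(filtered)
```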

score 1 · Answer 4 · answered Oct 31 '18 at 13:06

mySeries = pd.to_numeric(mySeries)

That will fix it: pd.to_numeric converts the string values to numbers, so the comparison with 1 works as expected.
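One practical difference from astype(int), sketched under the assumption that a column may also contain non-numeric strings (the value 'x' below is made up for illustration): pd.to_numeric accepts errors="coerce", which turns unparseable values into NaN instead of raising.

```python
import pandas as pd

# Clean numeric strings: to_numeric infers an integer dtype.
clean = pd.to_numeric(pd.Series(['1', '3'], index=['Row1', 'Row2']))
print(clean[clean != 1])

# Hypothetical dirty input: 'x' cannot be parsed as a number.
dirty = pd.Series(['1', '3', 'x'])

# errors="coerce" replaces unparseable values with NaN,
# so the result is a float Series rather than an exception.
coerced = pd.to_numeric(dirty, errors="coerce")
print(coerced)
```

Calling dirty.astype(int) on the same data would raise a ValueError, so to_numeric with errors="coerce" can be the more forgiving choice when the input is not guaranteed to be clean.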


cardamom

