I have a DataFrame with these columns:

ID                    int64
Key                   int64
Reference            object
sKey                float64
sName               float64
fKey                 int64
cName                object
ints                  int32

I want to create a new DataFrame containing the columns commonName and ints, keeping only the rows where ints is greater than 10. I am doing:

df_greater_10 = df[['commonName', df[df.ints >= 1997]]]

I see the problem lies with the expression df[df.ints >= 1997], as it returns a DataFrame. How can I get just the ints column with values greater than 10?

NRKirby

2 Answers

You can use one of the many available indexers. I would recommend .loc, which takes a boolean mask for the rows and a list of labels for the columns:

df_greater_10 = df.loc[df.ints >= 1997, ['commonName', 'ints']]

or, if you need only the ints column:

df_greater_10 = df.loc[df.ints >= 1997, 'ints']
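As a self-contained sketch of the same selection written with .loc, here is a small frame that borrows only the commonName and ints column names from the question. The data values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "commonName": ["oak", "elm", "ash", "yew"],
    "ints": [1995, 1997, 2001, 1990],
})

# Boolean mask: True for the rows to keep.
mask = df["ints"] >= 1997

# Rows are selected by the mask, columns by a list of labels (DataFrame result).
df_greater = df.loc[mask, ["commonName", "ints"]]

# A single label instead of a list returns a Series.
ints_only = df.loc[mask, "ints"]
```

The only difference between the two calls is the column argument: a list of labels keeps a DataFrame, while a bare label gives back a Series.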

Demo:

In [123]: df = pd.DataFrame(np.random.randint(5, 15, (10, 3)), columns=list('abc'))

In [124]: df
Out[124]:
    a   b   c
0  13  11  14
1  14  10  13
2   7  11   6
3   7  13  12
4   9   9   6
5   7   7   7
6   5   7   8
7   5  11   5
8   9   7   9
9  11  13   7

In [125]: df_greater_10 = df.loc[df.c > 10, ['a','c']]

In [126]: df_greater_10
Out[126]:
    a   c
0  13  14
1  14  13
3   7  12

UPDATE: starting from pandas 0.20.1 the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers, and it was removed entirely in pandas 1.0.

So use df.loc[...] or df.iloc[...] instead of df.ix[...].
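To make the .loc versus .iloc distinction concrete, here is a minimal sketch using the first four rows of the demo frame. It assumes you want the same rows and columns from both indexers; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [13, 14, 7, 7], "b": [11, 10, 11, 13], "c": [14, 13, 6, 12]})

mask = df["c"] > 10

# .loc is label based: a boolean mask for rows, labels for columns.
by_label = df.loc[mask, ["a", "c"]]

# .iloc is position based, so the mask is passed as a plain NumPy array
# and the column labels are translated to integer positions first.
col_positions = df.columns.get_indexer(["a", "c"])
by_position = df.iloc[mask.to_numpy(), col_positions]
```

For filtering by a condition and picking columns by name, .loc is usually the more direct of the two, since no translation to positions is needed.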

MaxU - stand with Ukraine

Not sure why you haven't tried df[df.ints >= 1997]['ints'] first (maybe I am missing something; is your DataFrame very big?). Here's a demo of how it would work:

>>> df = pd.DataFrame({'ints': [1, 2, 3, 10, 11], 'other': ['a', 'b', 'c', 'y', 'z']})
>>> df
   ints other
0     1     a
1     2     b
2     3     c
3    10     y
4    11     z

>>> df[df.ints >= 10]
   ints other
3    10     y
4    11     z
>>> df[df.ints >= 10]['ints']
3    10
4    11

You can get the same result with df['ints'][df['ints'] >= 10] too, which makes it more obvious that you're only interested in the ints column.
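As a quick check that these spellings agree, the sketch below rebuilds the demo frame and compares the two chained forms with a single .loc call. The .loc variant is an addition here, not part of the demo above; it does the row filter and the column selection in one indexing step.

```python
import pandas as pd

df = pd.DataFrame({"ints": [1, 2, 3, 10, 11], "other": ["a", "b", "c", "y", "z"]})

# Filter the rows first, then pick the column.
chained = df[df.ints >= 10]["ints"]

# Pick the column first, then filter it.
column_first = df["ints"][df["ints"] >= 10]

# Do both in one indexing step.
single_call = df.loc[df["ints"] >= 10, "ints"]
```

All three return the same Series when reading. The single .loc call is generally the safer habit if you later want to assign to the selection, because it avoids chained indexing.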

bakkal
  • You're not missing anything, I'm a Python novice. I tried `df_greater_10 = df[['commonName' df[df.ints >= 1997]['ints']]]` But I'm getting `ValueError: setting an array element with a sequence` – NRKirby Jun 11 '16 at 08:46