4

I have a DataFrame with 50 columns and 400 rows, all numeric. I'm trying to display only the columns containing values that fall outside a predefined range (i.e. only show values that aren't between -1 and +3).

So far I have:

df[(df.T > 3).all()]

to display values greater than 3, and I can change the integer to another number of interest. But how can I write something that displays numbers that fall outside a range (i.e. display all columns that have values outside the range of -1 to +3)?

e1v1s
  • 1
    `df[~df.isin(range(-1,3))].dropna(axis=1)`. Avoid naming a column `T`, as `.T` is a DataFrame attribute (the transpose). – Abdou Jan 12 '17 at 16:51

2 Answers

4

You can use `pd.DataFrame.mask`:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(-2, 4, (5, 3)), columns=list('abc'))
print(df)

   a  b  c
0 -2  1  0
1  1  0  0
2  3  1  3
3  0  1 -2
4  0 -2 -2

`mask` replaces the cells where the condition evaluates to True with NaN:

df.mask(df.ge(3) | df.le(-1))

     a    b    c
0  NaN  1.0  0.0
1  1.0  0.0  0.0
2  NaN  1.0  NaN
3  0.0  1.0  NaN
4  0.0  NaN  NaN
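
For comparison, `DataFrame.where` is the complement of `mask`: it keeps the cells where the condition is True and sets the rest to NaN. A minimal sketch of that equivalence, assuming a small hand-written frame (the values are illustrative, not the random data above):

```python
import pandas as pd

# Small illustrative frame (hypothetical values, not the data above)
df = pd.DataFrame({"a": [-2, 1, 3], "b": [1, 0, -2]})

# True where a value is at or beyond the bounds -1 and 3
outside = df.ge(3) | df.le(-1)

masked = df.mask(outside)    # out-of-range cells become NaN
kept = df.where(~outside)    # same result, expressed with where

print(masked.equals(kept))   # True
```

Which of the two reads better is mostly a matter of whether the condition describes what to hide (`mask`) or what to keep (`where`).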

Or the opposite, which keeps only the values outside the range:

df.mask(df.lt(3) & df.gt(-1))

     a    b    c
0 -2.0  NaN  NaN
1  NaN  NaN  NaN
2  3.0  NaN  3.0
3  NaN  NaN -2.0
4  NaN -2.0 -2.0
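
If the goal is to select whole columns rather than blank out cells, the same Boolean frame can be reduced per column with `any` or `all`. This is a sketch under assumptions: the frame is hypothetical, the names `low`, `high`, `cols_any` and `cols_all` are made up for illustration, and the bounds themselves are treated as out of range, as in the first `mask` call above.

```python
import pandas as pd

# Hypothetical data: column 'b' stays strictly inside (-1, 3)
df = pd.DataFrame({"a": [-2, 1, 3], "b": [1, 0, 2], "c": [0, 0, -2]})

low, high = -1, 3                      # range bounds from the question
outside = df.le(low) | df.ge(high)     # True where a value is out of range

cols_any = df.loc[:, outside.any()]    # columns with at least one out-of-range value
cols_all = df.loc[:, outside.all()]    # columns where every value is out of range

print(list(cols_any.columns))  # ['a', 'c']
print(list(cols_all.columns))  # []
```

`outside.any()` probably matches "columns that have values outside the range" most closely; `outside.all()` is the stricter reading.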
piRSquared
1

You could call `stack` to stack all the columns into a single Series, use `between` to generate a Boolean mask for the range, `unstack` it back to the original shape, invert the mask with `~`, and then call `dropna(axis=1)`:

In [193]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[193]:
          a         b         c
0  0.088639  0.275458  0.837952
1  1.395237 -0.582110  0.614160
2 -1.114384 -2.774358  2.119473
3  1.050008 -1.195167 -0.343875
4 -0.006156 -2.028601 -0.071448

In [198]:
df[~df.stack().between(0.1,1).unstack()].dropna(axis=1)

Out[198]:
          a
0  0.088639
1  1.395237
2 -1.114384
3  1.050008
4 -0.006156

So here only column 'a' has all of its values outside the range 0.1 to 1.

Prior to the `dropna`, you can see that the other columns contain values inside the range, and those cells become NaN:

In [199]:
df[~df.stack().between(0.1,1).unstack()]

Out[199]:
          a         b         c
0  0.088639       NaN       NaN
1  1.395237 -0.582110       NaN
2 -1.114384 -2.774358  2.119473
3  1.050008 -1.195167 -0.343875
4 -0.006156 -2.028601 -0.071448

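By default, the left and right bounds are included. If this isn't required, pass `inclusive='neither'` to `between` (older pandas versions take `inclusive=False`; the boolean form was deprecated in pandas 1.3).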
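
To see how the endpoint handling changes the mask, here is a small sketch. It assumes pandas 1.3 or later (where `inclusive` takes a string), uses hypothetical data, and applies `between` column by column with `apply` instead of `stack`/`unstack`; the names `closed` and `opened` are illustrative only.

```python
import pandas as pd

# Hypothetical data: column 'b' sits exactly on the bounds in two rows
df = pd.DataFrame({"a": [0.05, 1.4, -1.1], "b": [0.1, 0.5, 1.0]})

# Default: both endpoints count as inside the range
closed = df.apply(lambda s: s.between(0.1, 1))

# Assumes pandas >= 1.3: endpoints count as outside the range
opened = df.apply(lambda s: s.between(0.1, 1, inclusive="neither"))

print(closed["b"].tolist())  # [True, True, True]
print(opened["b"].tolist())  # [False, True, False]
```

Either Boolean frame can then be inverted with `~` and used in the same way as the mask in the answer.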

EdChum
  • How do you get the `In [199]:` and `Out[199]:` elements as part of your output please? I asked this question here http://stackoverflow.com/questions/41520415/copy-data-from-jupyter-notebook – nipy Jan 12 '17 at 20:08
  • 1
    @ade1e OK, I think I know the problem: since some version of IPython, copying and pasting fails to include the In/Out prompts in the text selection. Let me check my IPython/Jupyter version. – EdChum Jan 12 '17 at 20:11
  • 1
    @ade1e I'm running IPython `3.1.0-cbccb68`. I tried to upgrade to a newer version, noticed that it prevented this behaviour, and so stopped upgrading. – EdChum Jan 12 '17 at 20:26