
I am very new to Python programming and couldn't figure out how to write this part of my code. I would appreciate it if someone could help me.

I have a DataFrame with 3 attributes (4000 records): x1, x2, and class (binary).

First I made a scatter plot and saw that x1 ranges from 3 to 13 and x2 ranges from 3 to 8.

I want to split the data into ranges, for example:

if 2.5< x1 < 3.5 and 3.5< x2 < 4.5 ---> df1

if 3.5 <=x1 < 4.5 and 4.5<=x2 < 5.5 ---> df2

if ....
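A minimal sketch of the first two conditions written as boolean masks, assuming the columns are named `x1`, `x2` and `class` as described above. The five-row DataFrame is made-up data for illustration only, not the real 4000-record set.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 4000-record DataFrame
df = pd.DataFrame({
    "x1":    [3.0, 3.5, 4.0, 5.2, 12.9],
    "x2":    [4.0, 4.5, 5.0, 6.1, 7.8],
    "class": [0,   1,   1,   0,   1],
})

# 2.5 < x1 < 3.5 and 3.5 < x2 < 4.5  (strict on both ends)
df1 = df[(df["x1"] > 2.5) & (df["x1"] < 3.5) & (df["x2"] > 3.5) & (df["x2"] < 4.5)]

# 3.5 <= x1 < 4.5 and 4.5 <= x2 < 5.5  (closed on the left, open on the right)
df2 = df[(df["x1"] >= 3.5) & (df["x1"] < 4.5) & (df["x2"] >= 4.5) & (df["x2"] < 5.5)]

print(df1)
print(df2)
```

Here the row with x1 = 3.5 lands in `df2`, not `df1`, because the first condition is strict on both ends. With a mix of strict and non-strict bounds, a point sitting exactly on 2.5 would fall into no range at all, so it is worth picking one convention and using it for every range.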

Anshuman Dikhit
dekaz
  • I think you can use [`groupby`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) where the groups are formed by [`pd.cut`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html) – ALollz Apr 08 '19 at 18:14
  • Do you know how to use masks? `df1 = df[(df.x1 > 2.5) & (df.x1 < 3.5) & (df.x2 > 3.5) & (df.x2 < 4.5) ]` – Tarifazo Apr 08 '19 at 18:20
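The `groupby`/`pd.cut` suggestion can be sketched roughly as follows. The bin edges (1-unit-wide cells starting at 2.5 for both columns) are an assumption read off the example ranges and the observed 3 to 13 and 3 to 8 spans; `right=False` makes every bin closed on the left and open on the right, like `3.5 <= x1 < 4.5`. The toy data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the real 4000-record DataFrame
df = pd.DataFrame({
    "x1":    [3.0, 3.6, 4.0, 5.2, 12.9],
    "x2":    [4.0, 4.5, 5.0, 6.1, 7.8],
    "class": [0,   1,   1,   0,   1],
})

# Assumed bin edges: 1-unit-wide cells covering x1 in [3, 13] and x2 in [3, 8]
x1_edges = np.arange(2.5, 14.5, 1.0)   # 2.5, 3.5, ..., 13.5
x2_edges = np.arange(2.5, 9.5, 1.0)    # 2.5, 3.5, ..., 8.5

# right=False -> each bin is [left, right), e.g. 3.5 <= x1 < 4.5
x1_bin = pd.cut(df["x1"], bins=x1_edges, right=False)
x2_bin = pd.cut(df["x2"], bins=x2_edges, right=False)

# One sub-DataFrame per non-empty (x1 cell, x2 cell) pair;
# observed=True skips the empty combinations of the categorical bins
groups = {key: part for key, part in df.groupby([x1_bin, x2_bin], observed=True)}

for (x1_cell, x2_cell), part in groups.items():
    print(x1_cell, x2_cell, len(part))
```

Each dictionary key is a pair of `pd.Interval` objects, so instead of hand-writing `df1`, `df2`, … for every range you get one entry per grid cell that actually contains data.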

1 Answer


As Mstaino points out, boolean masks are the right tool for selecting a range of values in one or more columns.

Since you are new to Python (and therefore to pandas), it helps to break this down into two steps.

First, create a boolean mask for each column's range; second, combine the generated masks with a logical AND (`&`).
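To make the second step concrete: pandas masks are combined with the element-wise operators `&` (and) and `|` (or), and each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`. Python's `and` keyword tries to collapse each mask to a single `True`/`False`, which pandas refuses. The three-row frame is made-up data for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x1": [3.0, 4.0, 5.0], "x2": [4.0, 5.0, 6.0]})

# Step 1: one boolean mask per range
in_x1 = (df["x1"] >= 3.5) & (df["x1"] < 4.5)
in_x2 = (df["x2"] >= 4.5) & (df["x2"] < 5.5)

# Step 2: element-wise logical AND of the masks
both = in_x1 & in_x2
print(both.tolist())  # [False, True, False]
print(df[both])

# Using Python's `and` instead raises ValueError
# ("The truth value of a Series is ambiguous ...")
try:
    in_x1 and in_x2
except ValueError as err:
    print("ValueError:", err)
```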

Here is an MCVE (minimal, complete, verifiable example) that you can run and re-run to see which rows fall inside both masked ranges.

import pandas as pd
import numpy as np

# 200 random rows; 'Class' is random noise standing in for the binary column
df = pd.DataFrame(np.random.randn(200, 3), columns=['x1', 'x2', 'Class'])

# Step 1: one boolean mask per column range
mask1 = (df.x1 > -.4) & (df.x1 < .6)
mask2 = (df.x2 > -.4) & (df.x2 < .5)

# What do the masks look like in context?
df['mask1'] = mask1
df['mask2'] = mask2
print(df.head())

# Step 2: combine the masks with & so only rows inside both ranges are kept
df1 = df[mask1 & mask2]

# Show a few of the selected rows (fewer if the selection has under 4 rows)
print(df1.sample(n=min(4, len(df1))))
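The same selection can also be written with `DataFrame.query`, which accepts chained comparisons such as `a < x < b`; whether that reads better than explicit masks is largely a matter of taste. A minimal sketch on the same kind of random data (seeded here so reruns match), checking that both approaches pick the same rows:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so reruns give the same frame
df = pd.DataFrame(rng.standard_normal((200, 3)), columns=["x1", "x2", "Class"])

# Explicit boolean masks, same ranges as mask1 and mask2 in the answer
masked = df[(df.x1 > -.4) & (df.x1 < .6) & (df.x2 > -.4) & (df.x2 < .5)]

# Equivalent query string using chained comparisons
queried = df.query("-0.4 < x1 < 0.6 and -0.4 < x2 < 0.5")

print(masked.equals(queried))
```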
Rich Andrews