I have some code with which I want to remove the values below the lower boundary from each column of a data set of 15578089 rows x 18 columns, but I do not know how to apply it to the data frame. I tried df = df.drop(df.abs()<=LowerBoundary, axis=1), but this seemed to drop every column. My code is below. I will also upload images of the data columns and description.

import pandas as pd
import random as r
import numpy as np


df = pd.read_csv('D:\\Project\\database\\3-Last\\LastCombineHalf.csv')

Q1 = df.quantile(0.25)  # first quartile of each column
Q3 = df.quantile(0.75)  # third quartile of each column
IQR = Q3 - Q1

#UpperBoundary = Q3 + (IQR * 1.5)
LowerBoundary = Q1 - (IQR * 1.5)
df.drop(df.abs()<=LowerBoundary,axis=0)
df.to_csv('D:\\Project\\database\\3-Last\\Removal.csv', index=False)
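
For reference, `DataFrame.drop` expects index or column labels rather than a boolean frame, so a boolean mask is one way to express "remove rows with a value below the lower boundary". The following is only a minimal sketch on a made-up two-column frame (the column names `a` and `b` and their values are hypothetical, and it assumes every column is numeric); it drops a row if any of its columns falls below that column's boundary, which may or may not be the intended rule.

```python
import pandas as pd

# Hypothetical stand-in for the real 18-column file.
df = pd.DataFrame({
    "a": [10.0, 11.0, 12.0, 11.5, 10.5, -50.0],
    "b": [1.0, 1.2, 0.9, 1.1, 1.0, 1.05],
})

q1 = df.quantile(0.25)           # first quartile per column
q3 = df.quantile(0.75)           # third quartile per column
iqr = q3 - q1
lower_boundary = q1 - 1.5 * iqr  # one boundary per column

# True where a value lies below its own column's lower boundary.
below = df.lt(lower_boundary, axis="columns")

# Keep only the rows in which no column is below its boundary.
filtered = df[~below.any(axis=1)]
print(filtered)
```

Note that the result has to be assigned (here to `filtered`) before it is written out with `to_csv`; filtering does not modify `df` in place.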

Thank you.

AliY
  • Are you applying the IQR across a lot of columns? It's hard to say without seeing your input data and expected output. – Umar.H Jan 20 '20 at 00:53
  • Sorry, the data file is 2 GB and I'm not sure if I can share it here. Would it be better if I create a dummy CSV file? – AliY Jan 20 '20 at 00:56
  • Yes, you can replicate your issue with 5-10 rows quite easily; just show what you're trying to do. Read: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Umar.H Jan 20 '20 at 01:00
  • Does this answer your question? [How to Remove outlier from DataFrame using IQR?](https://stackoverflow.com/questions/50461349/how-to-remove-outlier-from-dataframe-using-iqr) – RubenB Jan 20 '20 at 02:34
  • It does look exactly like what I want, but I'm getting a memory error even when I pass `low_memory=True` – AliY Jan 20 '20 at 04:04
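
On the memory error mentioned in the last comment: as far as I understand, `low_memory` only affects how `read_csv` parses the file internally, and the whole frame is still held in memory afterwards. One possible workaround, sketched below under assumptions, is to compute the quartiles one column at a time with `usecols` and then filter the file in pieces with `chunksize`. The helper names `lower_boundaries` and `filter_in_chunks`, the demo file names, and the demo data are all hypothetical, and the sketch assumes every column is numeric.

```python
import os
import tempfile

import pandas as pd


def lower_boundaries(path, k=1.5):
    """Compute Q1 - k * IQR per column, loading one column at a time."""
    columns = pd.read_csv(path, nrows=0).columns  # header only
    bounds = {}
    for col in columns:
        s = pd.read_csv(path, usecols=[col])[col]
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        bounds[col] = q1 - k * (q3 - q1)
    return pd.Series(bounds)


def filter_in_chunks(src, dst, bounds, chunksize=1_000_000):
    """Write to dst only the rows of src with no value below its column bound."""
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        keep = ~chunk.lt(bounds, axis="columns").any(axis=1)
        # Overwrite on the first chunk, append afterwards; header only once.
        chunk[keep].to_csv(dst, mode="w" if first else "a",
                           header=first, index=False)
        first = False


# Small made-up demo file standing in for the real CSV.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "demo.csv")
dst = os.path.join(tmpdir, "removal_demo.csv")
pd.DataFrame({
    "a": [10.0, 11.0, 12.0, 11.5, 10.5, -50.0],
    "b": [1.0, 1.2, 0.9, 1.1, 1.0, 1.05],
}).to_csv(src, index=False)

bounds = lower_boundaries(src)
filter_in_chunks(src, dst, bounds, chunksize=2)
result = pd.read_csv(dst)
print(result)
```

The boundaries are computed from the full columns first, so every chunk is filtered against the same thresholds; computing quartiles per chunk instead would give different results.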

0 Answers