I need to remove rows from a pandas DataFrame based on the values in specific columns.

import numpy as np
import pandas as pd
import copy


### Choose the columns we will use
usecols = [str(x) for x in ["dered_r","u_g_color","g_r_color","r_i_color","i_z_color","diff_u",\
           "diff_g1","diff_i","diff_z","class"]]

### Load the data 
dataset = pd.read_csv('STAR_data.csv',index_col=0, usecols=usecols)
dataset = dataset.append(pd.read_csv("GALAXY_data.csv",index_col=0, usecols=usecols))
dataset = dataset.append(pd.read_csv("QSO_data.csv",index_col=0, usecols=usecols))

### Fix the data
dataset = dataset[dataset["dered_r"] > -9999] 
dataset = dataset[(dataset["g_r_color"] > -10)]
dataset = dataset[(dataset["g_r_color"] < 10)]

I want to remove any row whose value in the dered_r column is -9999 or smaller. For g_r_color I want to keep only the rows whose value is between -10 and 10. The code I wrote gives this error:

Traceback (most recent call last):
 File "C:\Users\ABRA\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2890, in get_loc
   return self._engine.get_loc(key)
 File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
 File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
 File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
 File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'dered_r'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "data_extractor.py", line 19, in <module>
   dataset = dataset[dataset["dered_r"] > -9999]
 File "C:\Users\ABRA\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2975, in __getitem__
   indexer = self.columns.get_loc(key)
 File "C:\Users\ABRA\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2892, in get_loc
   return self._engine.get_loc(self._maybe_cast_indexer(key))
 File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
 File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
 File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
 File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'dered_r'
Olca Orakcı
  • It seems there's no `dered_r` column. Your code is fine. Could you post your `dataset.columns` output? – Juan C Aug 07 '19 at 18:37
  • Have you already checked this [post](https://stackoverflow.com/questions/51954781/pandas-dataframe-column-of-lists-remove-a-specific-value) or this other [post](https://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value)? They are similar questions. – MrCorote Aug 07 '19 at 18:38
  • Juan C, you were right. I think I made it the index column by mistake. Now I have fixed it and everything is working. Thanks a lot :) – Olca Orakcı Aug 07 '19 at 19:31
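
For later readers, here is a minimal sketch of what the comments point to. It assumes `dered_r` is the first of the selected columns in the CSV files, which the thread suggests but does not show. The in-memory `csv_text` is a made-up stand-in for the real files, and the names `broken`, `fixed`, and `cleaned` are only illustrative. With `index_col=0`, that first column becomes the index, so `dataset["dered_r"]` raises `KeyError`. Dropping `index_col` keeps it as a regular column, and the filters then work.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the CSV files (the real files are not shown).
csv_text = """dered_r,g_r_color,class
17.2,0.5,STAR
-9999.0,0.3,STAR
18.1,25.0,GALAXY
19.4,-0.7,QSO
"""
usecols = ["dered_r", "g_r_color", "class"]

# As in the question: index_col=0 turns the first column into the index,
# so "dered_r" is no longer available as a column label.
broken = pd.read_csv(io.StringIO(csv_text), index_col=0, usecols=usecols)
print(broken.index.name)            # dered_r
print("dered_r" in broken.columns)  # False

# Without index_col, "dered_r" stays a regular column.
fixed = pd.read_csv(io.StringIO(csv_text), usecols=usecols)

# Combine the three conditions into one boolean mask.
mask = (
    (fixed["dered_r"] > -9999)
    & (fixed["g_r_color"] > -10)
    & (fixed["g_r_color"] < 10)
)
cleaned = fixed[mask]
print(cleaned)
```

If the index had been intentional, the same filter could be written against `broken.index` instead. Note also that `DataFrame.append` was removed in pandas 2.0, so on current versions the three files would be combined with `pd.concat([...])` instead.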
