I have the following `ct_data` DataFrame:

imjp_number,imct_id
182467224,'ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A'
307291224,'__gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw'
214278175,'mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5'
tes123456,'tMyM0un_ptsHHC-lET6tkQ|87538a4436af47a7a9b8b9bc2b3ec5ba'
Not Found,'pXJjGxufodMVq5FBSzHc2A'

I am applying the logic below, but it is not working:

ct_data['imjp_number']  = ct_data.loc[ct_data['imjp_number'].apply(lambda x: isinstance(x,int)), 'imjp_number']

Please suggest the best way to keep only the rows of `ct_data` whose `imjp_number` is an integer value, removing the 'tes123456' and 'Not Found' rows.

  • `ct_data[ct_data.imjp_number.str.isdigit()]` . [https://pandas.pydata.org/docs/user_guide/text.html#text-string-methods](https://pandas.pydata.org/docs/user_guide/text.html#text-string-methods) – wwii Feb 26 '21 at 03:10
  • Does [Remove non-numeric rows in one column with pandas](https://stackoverflow.com/questions/33961028/remove-non-numeric-rows-in-one-column-with-pandas) answer your question? Or [Filtering string/float/integer values in pandas dataframe columns](https://stackoverflow.com/questions/45338209/filtering-string-float-integer-values-in-pandas-dataframe-columns)? – wwii Feb 26 '21 at 03:15
  • @wwii your logic works, but I have to remove the entire tes123456 and Not Found rows, for both imjp_number and imct_id, so that the two columns keep matching indexes – Dharmendra Yadav Feb 26 '21 at 03:23
  • `ct_data.imjp_number.str.isdigit()` creates a boolean Series. `ct_data[ct_data.imjp_number.str.isdigit()]` will *return* a DataFrame without those two rows. [https://pandas.pydata.org/docs/user_guide/indexing.html#boolean-indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#boolean-indexing) – wwii Feb 26 '21 at 03:26
  • @wwii It still contains the tes123456 and Not Found rows, and the imct_id column does not appear after print(ct_data) – Dharmendra Yadav Feb 26 '21 at 03:29

1 Answer

>>> print(df.to_string()) 
  imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
3   tes123456     tMyM0un_ptsHHC-lET6tkQ|87538a4436af47a7a9b8b9bc2b3ec5ba
4   Not Found                                      pXJjGxufodMVq5FBSzHc2A

>>> print(df.imjp_number.str.isdigit().to_string())
0     True
1     True
2     True
3    False
4    False

>>> print(df[df.imjp_number.str.isdigit()].to_string())
  imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
>>>
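The session above can also be written as a short script. This is a minimal sketch, assuming every `imjp_number` value is stored as a string (which is what the `.str.isdigit()` output suggests); the DataFrame is rebuilt by hand from the sample rows, and the names `int_mask` and `digit_mask` are only illustrative. It also shows why the `isinstance(x, int)` test in the question selects nothing, and that the filtered result has to be assigned back to a name to be kept.

```python
import pandas as pd

# Sample data rebuilt from the question. Storing imjp_number as strings is an
# assumption about how the column was originally loaded.
ct_data = pd.DataFrame({
    "imjp_number": ["182467224", "307291224", "214278175", "tes123456", "Not Found"],
    "imct_id": [
        "ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A",
        "__gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw",
        "mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5",
        "tMyM0un_ptsHHC-lET6tkQ|87538a4436af47a7a9b8b9bc2b3ec5ba",
        "pXJjGxufodMVq5FBSzHc2A",
    ],
})

# The values are str objects, so a type check for int is False on every row.
int_mask = ct_data["imjp_number"].apply(lambda x: isinstance(x, int))

# str.isdigit() looks at the characters rather than the Python type.
digit_mask = ct_data["imjp_number"].str.isdigit()

# Boolean indexing returns a new DataFrame with whole rows removed;
# assign it back (and optionally renumber the index) to keep the result.
ct_data = ct_data[digit_mask].reset_index(drop=True)
print(ct_data.to_string())
```

Because the mask is applied to the whole DataFrame rather than to one column, `imct_id` is filtered together with `imjp_number` and both columns stay aligned.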

Alternatively, using the approach from the second question I linked to in the comments (`pd.to_numeric` with `errors='coerce'`):

>>> print(df.to_string())
  imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
3   tes123456     tMyM0un_ptsHHC-lET6tkQ|87538a4436af47a7a9b8b9bc2b3ec5ba
4   Not Found                                      pXJjGxufodMVq5FBSzHc2A
>>>
>>> print(pd.to_numeric(df.imjp_number, errors='coerce').to_string())
0    182467224.0
1    307291224.0
2    214278175.0
3            NaN
4            NaN
>>>
>>> print(pd.to_numeric(df.imjp_number, errors='coerce').notnull().to_string())
0     True
1     True
2     True
3    False
4    False
>>>
>>> print(df[pd.to_numeric(df.imjp_number, errors='coerce').notnull()].to_string())
  imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
>>>
>>> df = df[pd.to_numeric(df.imjp_number, errors='coerce').notnull()]              
>>> print(df.to_string())                                                           
  imjp_number                                                     imct_id
0   182467224     ed3baabac3ce4d86801d8490ea474963|pXJjGxufodMVq5FBSzHc2A
1   307291224  __gde66a472fe104ab381456ee059751d9d|Qujk8BKa0XkkpJMCstCYBw
2   214278175     mbKKbkKpiTsIAyCE8y07rw|e8133ceeca654d169532b4ad4de661d5
>>>
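After filtering this way, `imjp_number` still holds strings. If real integers are wanted, the coerced values can be reused. The following is a rough sketch under the same assumption as the session above (string input); the names `numeric`, `keep`, and `clean` are hypothetical, and the `imct_id` values are shortened for brevity.

```python
import pandas as pd

# Small frame with the same imjp_number values as the question
# (imct_id values shortened for brevity).
df = pd.DataFrame({
    "imjp_number": ["182467224", "307291224", "214278175", "tes123456", "Not Found"],
    "imct_id": ["a|b", "c|d", "e|f", "g|h", "i"],
})

# Non-numeric strings become NaN, so the result is a float Series.
numeric = pd.to_numeric(df["imjp_number"], errors="coerce")
keep = numeric.notna()

# Keep only the numeric rows, then store the values as integers.
clean = df[keep].copy()
clean["imjp_number"] = numeric[keep].astype("int64")
print(clean.to_string())
```

Note that the two filters are not equivalent in general: `pd.to_numeric` also accepts strings such as `"1.5"` or `"-3"`, which `str.isdigit()` rejects, so the `astype("int64")` step is only safe when the surviving values are whole numbers, as they are in this sample.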
wwii