-2

How to remove cells in a DataFrame that contain no number in Python?

I am trying to remove cells from my DataFrame that contain only characters.

I want to remove cells like Farnet and make it null.

I checked some links such as this, but they didn't answer my question, as I want to manipulate individual cells.

f.a

3 Answers

2

I believe you need:

import numpy as np
import pandas as pd

df = pd.DataFrame({0:['a','DT8510','AFT1',np.nan],
                   1:['a','DT8510','u','as1']})
print (df)
        0       1
0       a       a
1  DT8510  DT8510
2    AFT1       u
3     NaN     as1

import re

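# raw string so that \d reaches the regex engine unchanged
d = re.compile(r'\d')
# keep a cell if its string form contains a digit, otherwise set it to NaN
# (on pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
df = df.applymap(lambda x: x if d.search(str(x)) else np.nan)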
print (df)
        0       1
0     NaN     NaN
1  DT8510  DT8510
2    AFT1     NaN
3     NaN     as1

Another solution:

df = df.where(df.apply(lambda x: x.astype(str).str.contains(r'\d')))
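
Since an explanation is requested in the comments, here is a rough step-by-step sketch of what the one-liner does, using the sample frame from above. The names `mask` and `filtered` are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: ['a', 'DT8510', 'AFT1', np.nan],
                   1: ['a', 'DT8510', 'u', 'as1']})

# Boolean mask: True where the cell, viewed as a string, contains a digit.
# astype(str) turns NaN into the text 'nan', which contains no digit.
mask = df.apply(lambda col: col.astype(str).str.contains(r'\d'))

# where() keeps the cells whose mask value is True and fills the rest with NaN.
filtered = df.where(mask)
print(filtered)
```

The mask is built column by column because the `.str` accessor is only available on a Series, not on a whole DataFrame.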
jezrael
  • Thanks a lot, your solution worked, but can you please explain it more and also help me remove the cells that have fewer than 4 digits in them? – f.a Dec 06 '18 at 14:34
  • @f.a - so `'AFT1'` should be removed because it has only one digit? – jezrael Dec 06 '18 at 14:37
  • @f.a - then change `\d` to `\d{4}` – jezrael Dec 06 '18 at 14:40
  • Thanks a lot. One more question: I am actually trying to extract patterns like T1370, i.e. a string that starts with B, H, N, T, or E followed by four digits, omitting anything that comes before or after it, and to extract as many as can be found in one cell. – f.a Dec 06 '18 at 14:52
  • @f.a - use `'^[BHNTE]{1}\d{4}$'` – jezrael Dec 06 '18 at 15:00
  • Thanks. The problem is that it removes the first cell, UH1744, entirely, but I would like to remove only the U from that string. Besides, I want E1780T8126 to be split into separate codes, not removed entirely. – f.a Dec 06 '18 at 15:50
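
The last request in the comments (keep H1744 from UH1744, split E1780T8126) could be approached by finding every match inside a cell instead of anchoring the pattern with `^` and `$`. The following is only a sketch under assumptions: the sample frame is made up from the values mentioned in the comments, `extract_codes` is a hypothetical helper, and joining the matches with a comma is one arbitrary choice of output format.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data built from the values mentioned in the comments.
df = pd.DataFrame({0: ['UH1744', 'Farnet', 'E1780T8126'],
                   1: ['T1370', 'AFT1', np.nan]})

# One of B, H, N, T, E followed by four digits that are not followed by a fifth.
pattern = re.compile(r'[BHNTE]\d{4}(?!\d)')

def extract_codes(cell):
    """Return all matching codes joined by a comma, or NaN if there are none."""
    codes = pattern.findall(str(cell))
    return ','.join(codes) if codes else np.nan

# Apply the helper to every cell, column by column.
result = df.apply(lambda col: col.map(extract_codes))
print(result)
```

Without the anchors, the pattern matches anywhere in the string, so the leading U in UH1744 is simply skipped and both codes in E1780T8126 are found.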
0

A similar solution to @jezrael's:

import pandas as pd
import numpy as np
df = pd.DataFrame(data={'A':['1','textonly'],'B':['textandnum2','2']})
for column in df.columns:
    # .loc avoids chained assignment; \d also matches 0, which [1-9] misses
    df.loc[~df[column].str.contains(r'\d', na=False), column] = np.nan
df
erncyp
0

I think you can use a regex to find the cells that do not contain a number.

The expression ^([^0-9]*)$ matches every cell that contains no digits.

df = df.replace(r'^([^0-9]*)$', np.nan, regex=True)

This replaces all the cells without numbers with NaN; you can then use dropna to remove the rows that contain those cells.

df = df.dropna()
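
As a small illustration of the two steps, here is a sketch on a made-up two-row frame (the data and the names `cleaned` and `rows_kept` are assumptions, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'A': ['1', 'textonly'], 'B': ['textandnum2', '2']})

# Cells made up entirely of non-digit characters become NaN.
cleaned = df.replace(r'^[^0-9]*$', np.nan, regex=True)
print(cleaned)

# dropna() drops every row that contains at least one NaN, not single cells.
rows_kept = cleaned.dropna()
print(rows_kept)
```

Note that `df.replace(..., inplace=True)` returns `None`, so assigning its result back to `df` leads to the `'NoneType' object has no attribute 'dropna'` error; either assign the result without `inplace`, as here, or use `inplace=True` without assigning.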

I hope this helps.

sufi
  • It seems the output of `df.replace` is not a `DataFrame`, since none of the code below works and I receive the error `AttributeError: 'NoneType' object has no attribute 'dropna'`; `df.to_csv("s1.csv", sep=',')` does not work either. – f.a Dec 06 '18 at 14:24
  • Now you can try it. It should work. I have updated the regex and removed inplace=True. – sufi Dec 06 '18 at 14:54