
I have a file like:

@HWI
ABCDE
+
@HWI7
EFSA
+
???=AF
GTEY@JF
GVTAWM

I want to keep only the purely alphabetic lines (that is, remove every line that contains a symbol or a digit).

I tried:

import numpy as np
arr = np.genfromtxt(f, dtype=str)

for line in np.nditer(arr):
    if np.core.defchararray.isupper(line) and not '@?=;?+' in line:
        print line

but it gives:

@HWI
ABCDE
@HWI7
EFSA
???=AF
GTEY@JF
GVTAWM

and I am expecting:

ABCDE
EFSA
GVTAWM

I want to do this with numpy, not with regular expressions or similar tools.

George

2 Answers


This is my solution:

import numpy as np

arr = np.genfromtxt('text.txt', dtype=str)

test = np.char.isalpha(arr)  # Boolean mask: True where the string contains only letters

print(arr[test])  # Index with the mask to keep only the purely alphabetic strings

Don't loop over a numpy array with if! Boolean indexing does that for you ;)

I get:

['ABCDE' 'EFSA' 'GVTAWM']
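
As a self-contained sketch of the same masking idea, assuming the lines are already in memory instead of being read from text.txt (the names keep_alpha and lines are illustrative, not from the question):

```python
import numpy as np

def keep_alpha(values):
    """Return only the entries that consist entirely of letters."""
    arr = np.asarray(values, dtype=str)
    mask = np.char.isalpha(arr)  # elementwise equivalent of str.isalpha
    return arr[mask]             # boolean indexing keeps the True positions

# Sample data copied from the question
lines = ["@HWI", "ABCDE", "+", "@HWI7", "EFSA", "+", "???=AF", "GTEY@JF", "GVTAWM"]
result = keep_alpha(lines)
print(result)
```

If lowercase lines should be rejected as well, the mask could be combined with np.char.isupper(arr) using the & operator.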
Essex

With numpy:

numpy also provides isalpha() and isnumeric() functions (under np.char); they are described in the numpy documentation.
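
A small sketch of how the two functions differ, using made-up sample strings (the samples array is an assumption for illustration only):

```python
import numpy as np

samples = np.array(["ABCDE", "12345", "@HWI7", "abc"])

alpha_mask = np.char.isalpha(samples)      # True where every character is a letter
numeric_mask = np.char.isnumeric(samples)  # True where every character is numeric

print(samples[alpha_mask])
print(samples[numeric_mask])
```

Both calls return boolean arrays of the same shape as the input, so either one can be used directly as an index.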

Without numpy, you could try this regex:

re.sub(r'[^\w]', ' ', s)

where s is your string; [^\w] matches any character that is not alphanumeric or an underscore, and each match is replaced with a space.
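
To make the behaviour of that substitution concrete, here is a minimal sketch applied to one line from the question (the variable names are illustrative). Note that it cleans characters inside a string; it does not drop whole lines:

```python
import re

s = "GTEY@JF"
cleaned = re.sub(r'[^\w]', ' ', s)  # every non-word character becomes a space
print(cleaned)
```

Digits and underscores count as word characters, so a line such as @HWI7 would keep its 7.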

An example on Stackoverflow

xandermonkey