remove symbols from string

Question

I have a file like:

@HWI
ABCDE
+
@HWI7
EFSA
+
???=AF
GTEY@JF
GVTAWM

I want to keep only the lines made up entirely of letters (that is, drop every line that contains a symbol).

I tried:

import numpy as np
arr = np.genfromtxt(f, dtype=str)

for line in np.nditer(arr):
    if np.core.defchararray.isupper(line) and not '@?=;?+' in line:
        print line

but it gives:

@HWI
ABCDE
@HWI7
EFSA
???=AF
GTEY@JF
GVTAWM

and I am expecting:

ABCDE
EFSA
GVTAWM

I want to do this with numpy, not with regex or similar tools.

Possible duplicate of [How to remove symbols from a string with Python?](http://stackoverflow.com/questions/875968/how-to-remove-symbols-from-a-string-with-python) — xandermonkey, Jul 19 '16 at 14:13

Essex · Accepted Answer · 2016-07-19T14:40:18.067

1

This is my solution:

import numpy as np

arr = np.genfromtxt('text.txt', dtype=str)

# Build a boolean mask: True where an element contains only letters, False otherwise
test = np.char.isalpha(arr)

# Index arr with the mask so that only the letter-only values are printed
print(arr[test])

Don't use an if statement with numpy! Boolean indexing does that for you ;)

I get:

['ABCDE' 'EFSA' 'GVTAWM']
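
For anyone who wants to try the masking idea without a file on disk, here is a minimal sketch. It assumes the lines from the question are already in memory as a numpy string array; the names lines, mask and letters_only are made up for illustration.

```python
import numpy as np

# Sample data copied from the question, held in memory instead of read from a file.
lines = np.array(['@HWI', 'ABCDE', '+', '@HWI7', 'EFSA', '+',
                  '???=AF', 'GTEY@JF', 'GVTAWM'])

# Element-wise test: True only where every character of the element is a letter.
mask = np.char.isalpha(lines)

# Boolean indexing keeps the elements where the mask is True.
letters_only = lines[mask]
print(letters_only)  # ['ABCDE' 'EFSA' 'GVTAWM']
```

Note that isupper() only inspects the cased characters of a string, which is why '@HWI' passed the check in the question even though it contains a symbol.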

edited Jul 19 '16 at 14:40

answered Jul 19 '16 at 14:33

Essex


score 0 · Answer 2 · edited May 23 '17 at 12:15


With numpy:

numpy also has isalpha() and isnumeric() functions, which test each element of a string array. They are described in the numpy documentation.
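
As a rough illustration of how these two element-wise checks differ, here is a small sketch. The sample values are invented for the example, and it assumes Python 3, where numpy stores these strings as unicode.

```python
import numpy as np

# Made-up sample values: letters only, digits only, mixed, and one with a symbol.
samples = np.array(['ABCDE', '12345', 'HWI7', '@HWI'])

# True where every character is a letter.
alpha = np.char.isalpha(samples)

# True where every character is numeric.
numeric = np.char.isnumeric(samples)

print(alpha)    # [ True False False False]
print(numeric)  # [False  True False False]
```

Only isalpha() matches the goal in the question, since a line such as 'HWI7' should be rejected too.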

Without numpy, you could try this regex:

import re
re.sub(r'[^\w]', ' ', s)

where s is your string and [^\w] matches any character that is not alphanumeric or an underscore; each match is replaced with a space.
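
To show what that substitution does in practice, here is a short sketch using two strings from the question. The variable names are illustrative only.

```python
import re

# Two of the lines from the question that contain symbols.
s = 'GTEY@JF'
t = '???=AF'

# Every character that is not a letter, digit or underscore becomes a space.
cleaned_s = re.sub(r'[^\w]', ' ', s)
cleaned_t = re.sub(r'[^\w]', ' ', t)

print(repr(cleaned_s))  # 'GTEY JF'
print(repr(cleaned_t))  # '    AF'
```

This blanks out the symbols inside each string rather than discarding the whole line, so on its own it does not give the output the question expects.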

An example on Stackoverflow

edited May 23 '17 at 12:15


answered Jul 19 '16 at 14:13

xandermonkey


I am using numpy. I don't want to use regex – George Jul 19 '16 at 14:14

Alright, you should specify the requirements in the question. – xandermonkey Jul 19 '16 at 14:15
