Delete string between two symbols, if both symbols appear in the string

Question

I want to delete the substring between a '+' and an '@' symbol, together with the '+' itself, if the '+' exists.

d = {'1' : 'dsjlskdgj+fdfsd@test.com', '2' : 'qwioept@test.com', '3' : 'dccnvmxcv+fas@test.com', '4':'dqlt@test.com'}

test_frame = pd.Series(d)

test_frame
Out[6]: 
1    dsjlskdgj+fdfsd@test.com
2            qwioept@test.com
3      dccnvmxcv+fas@test.com
4               dqlt@test.com
dtype: object

So, the result should be:

s = {'1' : 'dsjlskdgj@test.com', '2' : 'qwioept@test.com', '3' : 'dccnvmxcv@test.com', '4':'dqlt@test.com'}

test_frame_result = pd.Series(s)

test_frame_result
Out[10]: 
1    dsjlskdgj@test.com
2      qwioept@test.com
3    dccnvmxcv@test.com
4         dqlt@test.com
dtype: object

I tried using split, but it fails because only some of the lines contain a '+'.

Is there an elegant solution that avoids looping through all the lines? The original dataset has quite a lot of them.

Thanks!

If you don't "loop through all the lines" how can you process all of them? — user202729, Feb 06 '18 at 15:24
Does [this](https://stackoverflow.com/questions/4444477/how-to-tell-if-a-string-contains-a-certain-character-in-javascript) solve your problem "only some lines contain a +"? — user202729, Feb 06 '18 at 15:24
Ad first comment: if I only wanted the first 5 letters I could do that without looping through: test_frame_result.str[:5] — maxtenzin, Feb 06 '18 at 15:27
What about [this](https://stackoverflow.com/questions/26577516/pandas-test-if-string-contains-one-of-the-substrings-in-a-list)? Also implicitly the slice operator is (most likely) implemented using loops. Just that a loop in C is (often) faster than a loop in a higher level language. — user202729, Feb 06 '18 at 15:28

score 1 · Accepted Answer · answered Feb 06 '18 at 15:59

Is this sufficient?

import pandas as pd

d = {'1': 'dsjlskdgj+fdfsd@test.com',
     '2': 'qwioept@test.com',
     '3': 'dccnvmxcv+fas@test.com',
     '4': 'dqlt@test.com'}

test_frame = pd.Series(d)
print(test_frame)

# Select only the entries that contain a literal '+'
found = test_frame[test_frame.str.contains(r'\+')]
# Drop the '+' and everything after it up to (but not including) the '@'
test_frame[found.index] = found.str.replace(r'\+[^@]*', "", regex=True)
print(test_frame)

Output:

(Before)

1    dsjlskdgj+fdfsd@test.com
2            qwioept@test.com
3      dccnvmxcv+fas@test.com
4               dqlt@test.com
dtype: object

(After)

1    dsjlskdgj@test.com
2      qwioept@test.com
3    dccnvmxcv@test.com
4         dqlt@test.com
dtype: object
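
As a side note, the filtering step may not be strictly needed: a regex replacement leaves strings that do not match the pattern untouched, so the same pattern can be applied to the whole Series in one call. The following is a minimal sketch under that assumption; the name `cleaned` is only illustrative, and `regex=True` is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

d = {'1': 'dsjlskdgj+fdfsd@test.com',
     '2': 'qwioept@test.com',
     '3': 'dccnvmxcv+fas@test.com',
     '4': 'dqlt@test.com'}
test_frame = pd.Series(d)

# Remove the '+' and everything after it up to (not including) the '@'.
# Entries without a '+' do not match the pattern and pass through unchanged.
cleaned = test_frame.str.replace(r'\+[^@]*', '', regex=True)
print(cleaned)
```

This returns a new Series and leaves `test_frame` as it was, so assign the result back if an in-place update is wanted.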

Glad it was helpful. — Dmitry Duplyakin, Feb 06 '18 at 16:02

score 0 · Answer 2 · answered Feb 06 '18 at 15:59

Found a solution - probably not the most elegant though:

import pandas as pd

test_frame = pd.DataFrame({'email':['dsjlskdgj+fdfsd@test.com','qwioept@test.com','dccnvmxcv+fas@test.com','dqlt@test.com']})

test_frame
Out[22]: 
                      email
0  dsjlskdgj+fdfsd@test.com
1          qwioept@test.com
2    dccnvmxcv+fas@test.com
3             dqlt@test.com

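# Rows whose email contains a literal '+'
mask = test_frame.email.str.contains(r'\+')
# Split those emails at the first '+': column 0 is the part before it, column 2 the part after it
parts = test_frame.loc[mask, 'email'].str.partition('+')
# Keep the part before the '+', then re-attach '@' and everything after the '@'
test_frame.loc[mask, 'email'] = parts[0] + '@' + parts[2].str.partition('@')[2]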

test_frame
Out[24]: 
                email
0  dsjlskdgj@test.com
1    qwioept@test.com
2  dccnvmxcv@test.com
3       dqlt@test.com
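
If the title's condition is taken literally (delete only when both '+' and '@' appear), a regex with a lookahead is one possible way to express it. This is a hedged sketch, not something either answer above proposes: the Series `emails`, the name `strict`, and the sample values `'no+at-sign'` and `'a+b+c@test.com'` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, including one string with a '+' but no '@'
emails = pd.Series(['dsjlskdgj+fdfsd@test.com',
                    'qwioept@test.com',
                    'no+at-sign',
                    'a+b+c@test.com'])

# (?=@) is a lookahead: the match must be followed by '@',
# but the '@' itself is not part of what gets removed.
strict = emails.str.replace(r'\+[^@]*(?=@)', '', regex=True)
print(strict)
```

With this pattern a string that has a '+' but no '@' is left alone, whereas the plain pattern `\+[^@]*` would cut it off at the '+'.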
