How to remove a portion of a string with variable length

Question

I have a DataFrame in which one column contains strings that look like:

Received value 126;AOC;H3498XX from 602
Received value 101;KYL;0IMMM0432 from 229

I want to drop (or replace with nothing) the part between the second semicolon and " from" so that it looks like

Received value 126;AOC; from 602

But the part I want to drop has a varying, unpredictable length (always a combination of A-Z and 0-9). The semicolons and the word "from" will always be there for reference.

I'm trying to use regex by studying this link: https://docs.python.org/3/library/re.html

import re
for row in df['column']:
    row = re.sub(';[A-Z0-9] from', '; from', row)

I think the [A-Z0-9] fails to incorporate the varying length aspect I want.
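
The suspicion above can be checked directly. The following is a minimal sketch, using the two sample strings from the question, that compares the pattern without a quantifier against the same pattern with `+` added (the variable names `rows`, `unchanged`, and `cleaned` are illustrative only):

```python
import re

rows = [
    "Received value 126;AOC;H3498XX from 602",
    "Received value 101;KYL;0IMMM0432 from 229",
]

# Without a quantifier, [A-Z0-9] must match exactly one character,
# so ";H3498XX from" never matches and each string comes back unchanged.
unchanged = [re.sub(r";[A-Z0-9] from", "; from", row) for row in rows]

# With "+", the class matches one or more characters, whatever the length.
cleaned = [re.sub(r";[A-Z0-9]+ from", "; from", row) for row in rows]

for row in cleaned:
    print(row)
```

Separately from the pattern, rebinding `row` inside a `for` loop only changes the loop variable, so the DataFrame column itself would stay as it was even with a working pattern.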

`print(s.replace(s.split(";",2)[-1].split()[0],''))`? – DirtyBit Apr 10 '19 at 12:52

DirtyBit · Accepted Answer · 2019-04-10T13:11:02.207

2

An example using str.replace() with str.split():

s = ['126;AOC;H3498XX from 602', '101;KYL;0IMMM0432 from 229']

for elem in s:
    print(elem.replace(elem.split(";",2)[-1].split()[0],''))

OUTPUT:

126;AOC; from 602
101;KYL; from 229

EDIT:

The same would work with the following example as well:

s = ['Received value 126;AOC;H3498XX from 602', 'Received value 101;KYL;0IMMM0432 from 229']

for elem in s:
    print(elem.replace(elem.split(";",2)[-1].split()[0],''))

OUTPUT:

Received value 126;AOC; from 602
Received value 101;KYL; from 229
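
One caveat worth keeping in mind: `str.replace()` removes every occurrence of the extracted token, not only the one after the second semicolon. For the sample data this makes no difference, but a short code could also appear inside the numbers. The sketch below uses a hypothetical input (`tricky`) and a hypothetical helper (`drop_third_field`) that rebuilds the string from its split parts instead; it assumes the code is always followed by a single space.

```python
def drop_third_field(text):
    """Remove the token that follows the second semicolon (hypothetical helper)."""
    # Split into at most three pieces: before the 1st ';', between, and the rest.
    head, middle, rest = text.split(";", 2)
    # The rest looks like "H3498XX from 602"; discard everything up to the first space.
    _, _, tail = rest.partition(" ")
    return f"{head};{middle}; {tail}"


# Hypothetical row whose code ("12") also occurs inside "126" and "612".
tricky = "Received value 126;AOC;12 from 612"

# The replace-based approach strips "12" everywhere in the string.
naive = tricky.replace(tricky.split(";", 2)[-1].split()[0], "")
print(naive)

# Rebuilding from the split parts touches only the third field.
print(drop_third_field(tricky))
```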

edited Apr 10 '19 at 13:11

answered Apr 10 '19 at 12:54

DirtyBit


This works perfectly when I use print(), but I want the output to stay in the column of my DataFrame. When I try `for elem in s: s['column'] = elem.replace....` it doesn't give me the expected output. Do you know how to keep the output in that DataFrame column? – Eric N. Apr 11 '19 at 12:16
@EricN. you could iterate through the desired rows and replace the values: https://stackoverflow.com/questions/25478528/updating-value-in-iterrow-for-pandas – DirtyBit Apr 11 '19 at 12:17
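
As an alternative to iterating over rows, the replacement can be written back to the column in one step. This is a minimal sketch, not the answerer's method: it assumes pandas is installed, uses a hypothetical frame `df` with the placeholder column name `column` from the question, and relies on `Series.str.replace` with `regex=True`.

```python
import pandas as pd

# Hypothetical frame; "column" mirrors the placeholder name used in the question.
df = pd.DataFrame({
    "column": [
        "Received value 126;AOC;H3498XX from 602",
        "Received value 101;KYL;0IMMM0432 from 229",
    ]
})

# Vectorised regex replacement; assigning the result back keeps it in the column.
df["column"] = df["column"].str.replace(r";[A-Z0-9]+ from", "; from", regex=True)

print(df["column"].tolist())
```

Passing `regex=True` explicitly avoids depending on the default, which has differed between pandas versions.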

score 1 · Answer 2 · answered Apr 10 '19 at 13:02

Use pattern (Received value \d+;[A-Z]+;)\w+(\s.*?)

Ex:

import re

s = ["Received value 126;AOC;H3498XX from 602", "Received value 101;KYL;0IMMM0432 from 229"]

for i in s:
    print( re.sub(r"(Received value \d+;[A-Z]+;)\w+(\s.*?)", r"\1", i) )

Output:

Received value 126;AOC;from 602
Received value 101;KYL;from 229
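
Note that the output above has no space before "from", whereas the question asks for `; from`. The lazy second group `(\s.*?)` captures only the single whitespace character, and the replacement `\1` discards it. A small variation, offered here as a sketch rather than as the answerer's intent, is to emit both groups:

```python
import re

s = [
    "Received value 126;AOC;H3498XX from 602",
    "Received value 101;KYL;0IMMM0432 from 229",
]

pattern = r"(Received value \d+;[A-Z]+;)\w+(\s.*?)"

# "\1\2" puts back the prefix and the captured whitespace,
# so only the variable-length code (\w+) is removed.
result = [re.sub(pattern, r"\1\2", i) for i in s]

for line in result:
    print(line)
```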
