Locate string index then reverse find regex and delete

Question

I have a question similar to one posted previously, Python Reverse Find in String.

Here is a sample of my very long string:

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019'''

Update: 1/02/2020

I am grouping data into lists before putting it into a dataframe. I don't want any data associated with 'incomplete n/a'. Do I need to delete part of the string, or is there a regex function that recognises 'incomplete n/a' and groups on its position?

I would like two outputs:

ONE: this list, t1L = ['1281674 ', '1281640 ', '1276160 ']. Note that it does not include 1331626.

TWO: this string, split or redefined so that it does not contain 1331626, for example:

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending'''

Thanks for any help.

I have updated the post to explain more clearly. How can I stop regex from grouping before reaching `'incomplete n/a'`? — izzleee, Jan 30 '20 at 12:24
In your second output, `1331626` **is** present, but in the text you say you want to remove it. What do you **really** want? — Toto, Jan 31 '20 at 17:14

n1tr0xs · Accepted Answer · 2020-01-29T04:35:28.967

1

I think this code works for your problem:

    import re
    new_str = t1[:t1.find(re.findall(r'\d{7}', t1[:t1.find('incomplete n/a')])[-1])]
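The one-liner above can be unpacked into steps, which also makes it easy to get the list of IDs. This is only a sketch: the names `marker_pos`, `id_matches`, `cut` and `ids` are made up for illustration, and it assumes that 'incomplete n/a' occurs in the string and that at least one 7-digit ID precedes it. It cuts at the match position rather than calling `t1.find` on the ID text, so a repeated ID earlier in the string would not move the cut.

```python
import re

t1 = ('1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending '
      '1331626 31/12/2019 - 31/01/2020 incomplete n/a '
      '1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019')

# Position of the marker (assumption: it is present, so find() is not -1).
marker_pos = t1.find('incomplete n/a')

# Every 7-digit ID that appears before the marker, as match objects.
id_matches = list(re.finditer(r'\b\d{7}\b', t1[:marker_pos]))

# The last of those IDs belongs to the incomplete record: cut the string there.
cut = id_matches[-1].start()
new_str = t1[:cut]

# All earlier IDs are the ones to keep.
ids = [m.group() for m in id_matches[:-1]]

print(new_str)
print(ids)
```

Note that `new_str` keeps the trailing space after the last 'pending'; call `new_str.rstrip()` if that matters.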

edited Jan 29 '20 at 04:35

answered Jan 29 '20 at 04:29


This code generates a new string but no list. Any other ideas for getting the list as output too? – izzleee Jan 31 '20 at 22:30

Toto · Answer 2 · 2020-02-01T10:50:31.807

You need 2 regexes to get 2 lists:

import re

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019'''
clean = re.sub(r'\b\d{7}\b(?=(?:(?!\b\d{7}\b).)*incomplete n/a).*?$', '', t1)
print(clean)
res = re.findall(r'(\b\d{7}\b)', clean)
print(res)

Output:

1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 
['1281674', '1281640', '1276160']
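To make the substitution regex easier to follow, here is the same pattern written with `re.VERBOSE` and one comment per part. It is a sketch rather than a replacement for the code above: `PATTERN` and the shortened `sample` string are hypothetical, made up only to keep the example small. In verbose mode unescaped whitespace is ignored, so the space in the marker is written as `[ ]`.

```python
import re

PATTERN = re.compile(r'''
    \b\d{7}\b                 # a 7-digit ID
    (?=                       # that is followed by
        (?:(?!\b\d{7}\b).)*   #   any characters that do not start another 7-digit ID
        incomplete[ ]n/a      #   and then the marker text
    )
    .*?$                      # then everything up to the end of the string
''', re.VERBOSE)

# Hypothetical, shortened sample (not the asker's data).
sample = '1111111 a st pending 2222222 b rd pending 3333333 incomplete n/a 4444444 c ave'

clean = PATTERN.sub('', sample)
res = re.findall(r'\b\d{7}\b', clean)

print(repr(clean))
print(res)
```

The lookahead only succeeds for the ID directly in front of 'incomplete n/a', because the inner negative lookahead refuses to step over another 7-digit ID. Since `.` does not match newlines by default, input that spans several lines would probably need `re.DOTALL` as well.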


score 0 · Answer 3 · answered Jan 29 '20 at 05:02

You can try the code below, which uses a loop and conditions.

    import re

    t1 = '1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1314832 '

    # Remember the most recent 7-digit ID and stop at the 'incomplete' marker.
    result = None
    for t in t1.split(" "):
        if re.match(r"\d{7}", t):
            result = t
        if 'incomplete' in t:
            break

    # The ID attached to the 'incomplete n/a' record (here 1331626).
    print(result)
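The loop above only finds the ID attached to 'incomplete n/a'. One possible way to get both requested outputs is to split the string into one record per ID and keep the records that come before the first incomplete one. This is a sketch under assumptions: `records`, `kept` and `clean` are illustrative names, every record is assumed to start with a 7-digit ID, and splitting on a zero-width lookahead needs Python 3.7 or later.

```python
import re
from itertools import takewhile

t1 = ('1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending '
      '1331626 31/12/2019 - 31/01/2020 incomplete n/a '
      '1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019')

# Split in front of every 7-digit ID; drop the empty piece before the first ID.
records = [r for r in re.split(r'(?=\b\d{7}\b)', t1) if r]

# Keep records up to, but not including, the first one marked incomplete.
kept = list(takewhile(lambda r: 'incomplete n/a' not in r, records))

# Output ONE: the leading ID of each kept record.
t1L = [r.split(' ', 1)[0] for r in kept]

# Output TWO: the kept records joined back into one string.
clean = ''.join(kept)

print(t1L)
print(clean)
```

Because `kept` is already a list of per-record strings, it could also be passed on for building the dataframe without rejoining it first.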
