0

I have a question similar to the one posted previously in "Python Reverse Find in String".

Here is a sample of my very long string:

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019'''

Update: 1/02/2020

I am grouping the data into lists before putting it into a DataFrame. I don't want any data associated with 'incomplete n/a'. Do I need to delete part of the string, or is there a regex function that can recognise 'incomplete n/a' and group on its position?

I would like two outputs:

ONE: this list, t1L = ['1281674 ', '1281640 ', '1276160 ']. Notice that it does not include 1331626.

TWO: this string, split or redefined so that it does not contain 1331626, for example:

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending'''

Thanks for any help.

izzleee

3 Answers

1

I think this code works for your problem (it needs import re):

    new_str = t1[:t1.find(re.findall(r'\d{7}', t1[:t1.find('incomplete n/a')])[-1])]
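The one-liner above can be hard to follow, so here is a minimal sketch of the same idea in separate steps. The function name `strip_incomplete` and its `marker` parameter are hypothetical, added for illustration. It also assumes that everything from the incomplete record onward should be dropped, as in the expected output in the question, and it adds a guard for the case where the marker is absent.

```python
import re

def strip_incomplete(text, marker='incomplete n/a'):
    """Return (text before the incomplete record, list of IDs before it)."""
    marker_pos = text.find(marker)
    if marker_pos == -1:
        # No marker found: keep the whole string and all of its IDs.
        return text, re.findall(r'\b\d{7}\b', text)

    # All 7-digit IDs that start before the marker, with their positions.
    matches = list(re.finditer(r'\b\d{7}\b', text[:marker_pos]))
    if not matches:
        # The marker is present but no ID precedes it.
        return '', []

    # The last ID before the marker belongs to the incomplete record,
    # so cut the string where that ID starts and drop it from the list.
    cut = matches[-1].start()
    return text[:cut], [m.group() for m in matches[:-1]]


t1 = ('1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending '
      '1331626 31/12/2019 - 31/01/2020 incomplete n/a '
      '1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019')

new_str, t1L = strip_incomplete(t1)
print(t1L)      # ['1281674', '1281640', '1276160']
print(new_str)
```

Using the match position from `re.finditer` avoids a second `find` on the ID text, which could land on an earlier occurrence if the same ID appeared twice.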

n1tr0xs
1

You need 2 regexes to get the 2 outputs, one to clean the string and one to extract the list:

import re

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019'''
clean = re.sub(r'\b\d{7}\b(?=(?:(?!\b\d{7}\b).)*incomplete n/a).*?$', '', t1)
print(clean)
res = re.findall(r'(\b\d{7}\b)', clean)
print(res)

Output:

1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 
['1281674', '1281640', '1276160']
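For readers who want to see how the substitution pattern is built, here is a sketch of the same regex written with `re.VERBOSE` and commented piece by piece. The variable names are illustrative. In verbose mode whitespace in the pattern is ignored, so the space in the marker is written as `[ ]`.

```python
import re

t1 = ('1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending '
      '1331626 31/12/2019 - 31/01/2020 incomplete n/a '
      '1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019')

pattern = re.compile(r'''
    \b\d{7}\b                  # a 7-digit ID
    (?=                        # lookahead: the ID must be followed by
        (?:(?!\b\d{7}\b).)*    #   characters that never start another 7-digit ID
        incomplete[ ]n/a       #   and then the marker text
    )
    .*?$                       # consume everything up to the end of the string
''', re.VERBOSE)

# Remove the incomplete record and everything after it.
clean = pattern.sub('', t1)
# Collect the IDs that remain.
res = re.findall(r'\b\d{7}\b', clean)

print(clean)
print(res)   # ['1281674', '1281640', '1276160']
```

The lookahead is what ties the marker to the nearest preceding ID: because the inner run may not cross another 7-digit number, only the ID of the incomplete record can match.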

Demo & explanation

Toto
0

You can try the code below, which uses a loop and conditions.

    import re
    t1 = '1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1314832 '

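    result = None
    for t in t1.split(" "):
        # Remember the most recent 7-digit ID seen so far.
        if re.match(r"\d{7}", t):
            result = t
        # Stop at the marker; result is then the ID of the incomplete record.
        if 'incomplete' in t:
            break

    print(result)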
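The loop above only reports the ID attached to the incomplete record. A possible extension that produces both requested outputs is sketched below. The function name `split_before_incomplete` is hypothetical, and the sketch assumes the tokens are separated by single spaces and that everything from the incomplete record onward is discarded.

```python
import re

def split_before_incomplete(text):
    """Return (IDs of complete records, text up to the incomplete record)."""
    ids = []               # 7-digit IDs seen so far
    kept = []              # tokens kept so far
    last_id_index = None   # index in `kept` where the latest ID starts

    for token in text.split(' '):
        if token == 'incomplete':
            # The latest ID belongs to the incomplete record: drop it
            # together with every token collected after it, then stop.
            if last_id_index is not None:
                kept = kept[:last_id_index]
                ids.pop()
            break
        if re.fullmatch(r'\d{7}', token):
            last_id_index = len(kept)
            ids.append(token)
        kept.append(token)

    return ids, ' '.join(kept)


t1 = ('1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending '
      '1331626 31/12/2019 - 31/01/2020 incomplete n/a 1314832 ')

t1L, trimmed = split_before_incomplete(t1)
print(t1L)       # ['1281674', '1281640', '1276160']
print(trimmed)
```

If the marker never appears, the loop runs to the end and the whole string is kept.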
Akash senta