Good afternoon. How can I use Python's `re` module on Excel data to extract only the last part, `PK2/6700094735`, from a value such as `#0000002947 _ _ 0 _ PK2/6700094735`? In other words, it should find the first `PK` and return that `PK` together with everything to the right of it.

My code:

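    # Collapse runs of underscores into a single underscore.
    while '__' in opis:
        opis = opis.replace('__', '_')

    order = ""  # default, so the check below never hits an undefined name
    try:
        order = opis.split('#')[1].split('_')[1]
    except IndexError:
        pass
    if not order:
        try:
            order = opis.split('_')[-1]
        except IndexError:
            if df[i]['KAT FI'] == 'Kat ZKP':
                # My regex attempt; note that findall returns a list.
                order = re.findall(r"PK+\w", opis)
            else:
                order = ""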

Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing, as well as your expected output for the __provided__ data. See [mre] and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for more information. – Henry Ecker Mar 07 '22 at 20:35

1 Answer

I assume your text will generally have the same format as the example you gave:

#0000002947 _ _ 0 _ PK2/6700094735

The following code extracts the last part, `PKX/XXXXXXXXXX`, where each `X` is a digit:

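    import re

    your_text = "#0000002947 _ _ 0 _ PK2/6700094735"
    # Group 1 captures the trailing "PK<digit>/<10 digits>" part.
    regex_pattern = r"#\d{10}\s_\s_\s\d\s_\s(PK\d/\d{10})"
    match = re.search(regex_pattern, your_text)
    # re.search returns None when the text does not match the pattern.
    last_group = match.group(1) if match else None
    assert last_group == 'PK2/6700094735'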

Hope it helps. You may need to adjust the regex to fit your general problem. If you share more examples, I can update the pattern accordingly.
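If the part before `PK` varies, a looser pattern may be enough. The sketch below assumes the only stable feature is that the wanted part starts at the first `PK` and runs to the end of the text, as the question describes. The helper name `extract_pk` and the second sample string are invented for illustration and are not from your data.

```python
import re

# Matches from the first "PK" to the end of the line.
PK_TAIL = re.compile(r"PK.*")


def extract_pk(opis):
    """Return the text from the first 'PK' onwards, or "" if there is no 'PK'."""
    match = PK_TAIL.search(opis)
    # Strip trailing whitespace that the greedy ".*" may have captured.
    return match.group(0).strip() if match else ""


samples = [
    "#0000002947 _ _ 0 _ PK2/6700094735",  # example from the question
    "#0000002947 _ _ 0 _",                 # hypothetical row without a PK part
]
for text in samples:
    print(repr(extract_pk(text)))
```

Because `re.search` scans left to right, the match always starts at the first `PK` in the text. If other data can follow the order number, a stricter tail such as `PK\d/\d+` may be a better fit.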

Artur Pschybysz