-2

Available formats

I am new to Python. Can anyone help me with the following scenario?

I have texts/descriptions from which I need to extract the word "PO" and the digits following it using Python.

I tried extracting the digits, but without success.

The formats are as follows:

Additional Funnel Ireland (50% Deposit) - PO 12345
Monthly Retainer (PO00011223)
PO0000054321: 3 months: August, September, October
Monthly Retainer PYB (PO 11236)
Additional Funnel Czech Republic (50%) - PO is 78901

  • Does this answer your question? [How to extract the substring between two markers?](https://stackoverflow.com/questions/4666973/how-to-extract-the-substring-between-two-markers) – Ktoto Aug 25 '20 at 10:12
  • What is the logic supposed to be? Digits at the end? "PO" followed by anything followed by digits? The rules are unclear. – Thierry Lathuille Aug 25 '20 at 10:30
  • You can try a regex: match = re.match(r"^(.*)(PO)(.*?)(\d+)$", line); print(match.group(2), match.group(4)) – Shivam Seth Aug 25 '20 at 10:34
  • @ThierryLathuille - I want "PO" together with all the digits that follow it, wherever it appears in the text. I hope my question is clear now; I have also listed the available formats. – Abhishek Mishra Aug 25 '20 at 10:39
  • @ShivamSeth will this work with all the formats I have included in my question? – Abhishek Mishra Aug 25 '20 at 11:10
  • Yes, I tried it with most of the patterns, but it only handles integers at the end of the line, not floats. This is basic code: you need to loop through each line or enable the MULTILINE flag, and you should add a None check after the match in case nothing matches. I would have posted the entire code, but your question is not accepting answers. – Shivam Seth Aug 25 '20 at 11:13

4 Answers

0

If the format is always the same, you can split the whole string on whitespace and take the third-from-last and the last elements:

txt = "Additional funnel Czech Rep(50%) - PO is 12345"
splt = txt.split()

print(splt[-3], splt[-1])
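
As a rough check of that "always the same format" condition, here is a small sketch that runs the same positional split on two of the sample lines from the question. The helper name po_by_position is made up for illustration; it suggests the approach fits the "PO is 12345" layout but not the "PO 12345" one.

```python
# Hypothetical helper wrapping the positional split shown above.
def po_by_position(text):
    parts = text.split()
    if len(parts) < 3:
        return None  # too short to have a third-from-last element
    return parts[-3], parts[-1]

# Layout "... PO is 78901": the third-from-last token is "PO".
print(po_by_position("Additional Funnel Czech Republic (50%) - PO is 78901"))

# Layout "... PO 12345": the third-from-last token is "-", not "PO".
print(po_by_position("Additional Funnel Ireland (50% Deposit) - PO 12345"))
```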
S.D.
0

If the text always ends with a string like PO 12345, you can select its last 8 characters using the slice [-8:].

Example :

a = 'code is 1234'
print(a[-4:])

This prints 1234.

0

If your data always looks like you posted, e.g.:

Additional Funnel Ireland (50% Deposit) - PO 12345
Monthly Retainer (PO00011223)
PO0000054321: 3 months: August, September, October
Monthly Retainer PYB (PO 11236)
Additional Funnel Czech Republic (50%) - PO is 78901

You can use a regular expression to extract the string:

import re
s = "Additional Funnel Czech Republic (50%) - PO is 78901"
res = ''.join(re.search(r'(PO)[\sA-Za-z]*(\d+)', s).groups())  # 'PO78901'
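
To illustrate how this pattern behaves on every format listed in the question, here is a minimal sketch. The names samples, PO_PATTERN and extract_po are introduced only for this example, and the None check for lines without a match is an added assumption, not part of the one-liner above.

```python
import re

# Sample lines copied from the question.
samples = [
    "Additional Funnel Ireland (50% Deposit) - PO 12345",
    "Monthly Retainer (PO00011223)",
    "PO0000054321: 3 months: August, September, October",
    "Monthly Retainer PYB (PO 11236)",
    "Additional Funnel Czech Republic (50%) - PO is 78901",
]

# "PO", then optional letters/whitespace (e.g. " is "), then the digits.
PO_PATTERN = re.compile(r"(PO)[\sA-Za-z]*(\d+)")

def extract_po(text):
    """Return 'PO' joined with its digits, or None when nothing matches."""
    match = PO_PATTERN.search(text)
    if match is None:
        return None
    return "".join(match.groups())

for line in samples:
    print(extract_po(line))
```

Because re.search scans the whole string, the position of "PO" in the line does not matter. Note that the pattern is case-sensitive, so the lowercase "po" inside "Deposit" is not picked up.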

For the format in your previous post, the old solution was:

s = "Additional Funnel Ireland(50% deposit) - PO 12345"
splitted = s.split(' - ')[-1].split()
res = splitted[0]+splitted[-1]

This first extracts the last part of the string (by splitting on ' - '), which is the part you are interested in. It then splits again (on whitespace) to get rid of any intermediate text.

Stefan
0

The following is the easiest way to extract the data.

Logic: use the str.find method to look up the index of "PO" in the string. Let's assume x is the index of "PO" and txt is the text:

extracted_string = txt[x:]

Then remove the " is " by replacing it with an empty string.

Code:

txt = "Additional funnel Czech Rep(50%) - PO is 12345"
index=txt.find("PO")
extracted_string=txt[index:]
print(extracted_string.replace(" is ", ""))

Output

PO12345
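
The same find-based idea can be stretched to the other formats from the question by collecting the first run of digits after "PO" instead of replacing " is ". This is only a sketch: the function name po_with_find is hypothetical, and unlike a regex restricted to letters and spaces it skips any non-digit characters between "PO" and the number.

```python
def po_with_find(text):
    """Return 'PO' plus the first run of digits after it, or None."""
    start = text.find("PO")
    if start == -1:
        return None  # no "PO" in the text
    digits = []
    for ch in text[start + 2:]:
        if ch.isdigit():
            digits.append(ch)
        elif digits:
            break  # the digit run has ended
    if not digits:
        return None  # "PO" was found but no digits follow it
    return "PO" + "".join(digits)

print(po_with_find("Additional funnel Czech Rep(50%) - PO is 12345"))
print(po_with_find("PO0000054321: 3 months: August, September, October"))
```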
Omkar Arora