I have the following string:

word = '10-7-2022 product_id:10 order_id:2 cost:500'

I need to extract only the values of product_id and order_id, that is, the numbers after the ':'. How can I do this?

wovano
  • Are there spaces between the date and product_id, and between product_id and order_id? – Prabhas Kumar Dec 07 '22 at 12:33
  • Use the `str.split` method. `word` is a sequence of space-separated strings; following the date, you have a sequence of `:`-separated key/value pairs. – chepner Dec 07 '22 at 12:36
  • You can achieve it by using `split`, refer here https://stackoverflow.com/questions/3475251/split-a-string-by-a-delimiter-in-python – chipsy Dec 07 '22 at 12:36
  • If the id is always in the same position (in this example, right after the first `product_id:`), use `split`. Otherwise, a regex might be better. – MSH Dec 07 '22 at 12:39
  • What did you try already? Are you familiar with [`str.split()`](https://docs.python.org/3.3/library/stdtypes.html#str.split) and/or [regular expressions](https://docs.python.org/3.3/library/re.html)? – wovano Dec 07 '22 at 13:14
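
The `str.split` approach suggested in the comments could look roughly like the sketch below. It assumes the date is always the first space-separated token and that every later token has the form `key:value`; the names `date`, `pairs`, and `fields` are only illustrative.

```python
word = '10-7-2022 product_id:10 order_id:2 cost:500'

# The first space-separated token is the date; the rest are "key:value" pairs.
date, *pairs = word.split()

# Split each pair on the first ':' only, so a value containing ':' stays intact.
fields = dict(pair.split(':', 1) for pair in pairs)

product_id = fields['product_id']
order_id = fields['order_id']
print(product_id, order_id)  # 10 2
```

Note that the values are still strings; wrap them in `int()` if numbers are needed.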

1 Answer

Another option is to use a regex with named groups:

import re

# Each lookbehind anchors its named group directly after the "key:" prefix.
reg = re.compile(r"(?P<product_id>(?<=product_id:)\d+).*(?P<order_id>(?<=order_id:)\d+)")
word = '10-7-2022 product_id:10 order_id:2 cost:500'
result = reg.search(word)
print(result.group('product_id'))  # 10
print(result.group('order_id'))    # 2

If not every string contains product_id: or order_id:, you can write a more general regex and check whether there is a result (`search` returns `None` when nothing matches).
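
A minimal sketch of what such a more general regex might look like, assuming every field has the form `name:digits`. The pattern `pair_re` and the helper `extract_ids` are hypothetical names, not part of the original answer.

```python
import re

# Hypothetical general pattern: any "name:digits" pair, not only the two known keys.
pair_re = re.compile(r"(\w+):(\d+)")

def extract_ids(text):
    """Return a dict of all name:number pairs found in text (empty if none)."""
    return dict(pair_re.findall(text))

fields = extract_ids('10-7-2022 product_id:10 order_id:2 cost:500')
# dict.get returns None for an absent key, so missing fields are easy to test for.
print(fields.get('product_id'))  # 10
print(fields.get('client_id'))   # None
```

This trades the fixed field order of the pattern above for a lookup by key, so strings that lack one of the ids simply produce a smaller dict.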

EDIT

To also get the date, assuming it is always in d-m-y order, with or without leading zeros:

import re

# Same pattern as before, with a leading named group for the d-m-y date.
reg = re.compile(r"(?P<date>\d+-\d+-\d+).*(?P<product_id>(?<=product_id:)\d+).*(?P<order_id>(?<=order_id:)\d+)")
word = '10-7-2022 product_id:10 order_id:2 cost:500'
result = reg.search(word)
print(result.group('date'))  # 10-7-2022

You can also get a datetime object:

from datetime import datetime
datetime.strptime(result.group('date'), '%d-%m-%Y')
# returns datetime.datetime(2022, 7, 10, 0, 0)
3dSpatialUser
  • @PrabhasKumar Yes, that is true, but as we said before, using a regex makes it possible to use more general expressions, for example when the string later also contains a `client_id:`. If you know the exact input, there are faster methods, but whether the performance difference is significant depends on the amount of data you want to process. In my opinion both methods have their use cases. – 3dSpatialUser Dec 07 '22 at 14:13
  • Also, if I may be so blunt, as long as one is using `python`, performance is not the most important thing to keep in mind. I think it is safe to say that when using Python, simplicity is more important. – 3dSpatialUser Dec 07 '22 at 14:15