
I scraped data from a site with BeautifulSoup. The publish time and the update date are in the same tag, so I want to separate them and remove the "Update:" string. I can't figure out how to do it.

The strings I want should be like this: a="4 July 2019 Friday 07:52" b="04.07.2019 07:52"

publishTime=source.find("div", attrs={"class":"textInfo"}).text
print(publishTime.strip())
4 July 2019 Friday 07:52
                                Update: 04.07.2019 07:52
feyZ1g
  • Possible duplicate of [Split Strings into words with multiple word boundary delimiters](https://stackoverflow.com/questions/1059559/split-strings-into-words-with-multiple-word-boundary-delimiters) – norok2 Jul 05 '19 at 09:07
  • Can you share the url? – QHarr Jul 05 '19 at 13:17

2 Answers


You can remove the "Update:" part with a regular expression.

Here is one way to do that:

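import re

# Avoid naming the variable `str`, which would shadow the built-in type.
text = '''
4 July 2019 Friday 07:52
                                Update: 04.07.2019 07:52
'''

# Drop "Update:" and everything after it on that line, then trim whitespace.
changed = re.sub(r'Update:.+', '', text).strip()

print(f'"{changed}"')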

If you run this code, this will print out:

"4 July 2019 Friday 07:52"
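The substitution above keeps only the first value. If both strings are needed, one option is to capture them with a single pattern. This is a minimal sketch, assuming the publish time always comes first and "Update:" appears exactly once; the names `pattern`, `a` and `b` are only illustrative:

```python
import re

text = '''
4 July 2019 Friday 07:52
                                Update: 04.07.2019 07:52
'''

# Group 1: everything before "Update:"; group 2: everything after it.
# re.DOTALL lets "." match across the line break between the two values.
pattern = re.compile(r'(.*?)\s*Update:\s*(.*)', re.DOTALL)
match = pattern.search(text.strip())

a, b = match.group(1), match.group(2)
print(f'a = "{a}"')
print(f'b = "{b}"')
```

Note that `pattern.search` returns `None` when the text contains no "Update:" marker, so real scraped input should probably be checked before calling `group`.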
gil.fernandes

If I understand you correctly, this may be what you're looking for, with no regex:

publishTime = '''
4 July 2019 Friday 07:52
                                Update: 04.07.2019 07:52
'''

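# `names` instead of `vars`, which would shadow the built-in function.
names = ['a', 'b']
# Split on the marker itself; strip() below removes the surrounding whitespace.
vals = publishTime.split('Update:')
for name, val in zip(names, vals):
    sval = val.strip()
    print(f'{name} = "{sval}"')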

Output:

a = "4 July 2019 Friday 07:52"
b = "04.07.2019 07:52"
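If some pages have no update date, `str.partition` may be a slightly safer variant of the same idea, since it always returns three parts. The following is only a sketch; the helper name `split_times` is hypothetical and not part of the original answer:

```python
publish_time = '''
4 July 2019 Friday 07:52
                                Update: 04.07.2019 07:52
'''

def split_times(text):
    """Return (published, updated); updated is '' if "Update:" is absent."""
    # partition() splits at the first "Update:" and never raises:
    # when the marker is missing, the second and third parts are empty.
    before, _sep, after = text.partition('Update:')
    return before.strip(), after.strip()

a, b = split_times(publish_time)
print(f'a = "{a}"')
print(f'b = "{b}"')
```

With this approach a text that lacks the marker simply yields an empty string for `b` instead of a missing value.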
Jack Fleeting