The string I ended up with after scraping 1000 Reuters articles looks like this:
<TEXT>
<TITLE>IF DOLLAR FOLLOWS WALL STREET JAPANESE WILL DIVEST</TITLE>
<AUTHOR> By Yoshiko Mori</AUTHOR>
<DATELINE> TOKYO, Oct 20 - </DATELINE><BODY>If the dollar goes the way of Wall Street,
Japanese will finally move out of dollar investments in a
serious way, Japan investment managers say.
REUTER
</BODY></TEXT>
I want to extract the title, author, dateline and body from this string. I am using the regex below, but it does not work for the body section.
try:
    body = re.search('<BODY>(.*)</BODY>', example_txt).group(1)
except:
    body = 'NA'
This try-except always returns NA for body, but the same approach works for title, author and dateline. Any idea why?
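For reference, here is a minimal reproduction, assuming the sample above is stored in `example_txt`. One likely cause is that `.` does not match newline characters by default in Python's `re` module, and the body is the only field that spans several lines. The sketch below shows the pattern failing without a flag and matching with `re.DOTALL`. The `extract` helper is a hypothetical name introduced only for this illustration.

```python
import re

# Assumption: example_txt holds the sample article shown above.
example_txt = (
    "<TEXT>\n"
    "<TITLE>IF DOLLAR FOLLOWS WALL STREET JAPANESE WILL DIVEST</TITLE>\n"
    "<AUTHOR> By Yoshiko Mori</AUTHOR>\n"
    "<DATELINE> TOKYO, Oct 20 - </DATELINE><BODY>If the dollar goes the way of Wall Street,\n"
    "Japanese will finally move out of dollar investments in a\n"
    "serious way, Japan investment managers say.\n"
    "REUTER\n"
    "</BODY></TEXT>"
)


def extract(tag, text):
    """Hypothetical helper: return the text between <tag> and </tag>, or 'NA'."""
    # re.DOTALL lets '.' match newlines; '.*?' is non-greedy so the match
    # stops at the first closing tag.
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else "NA"


# Without DOTALL, '.' cannot cross a newline, so the multi-line body never matches.
no_flag = re.search(r"<BODY>(.*)</BODY>", example_txt)

title = extract("TITLE", example_txt)
body = extract("BODY", example_txt)
```

Checking for `None` explicitly instead of using a bare `except` also keeps unrelated errors from being silently turned into NA.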
Thanks!