Need help extracting date from text in Python

Question

I have data that comes in every day via Python code, like this:

id="ContentPlaceHolder1_cph_main_cph_main_SummaryGrid">\r\n\t\t<tr class="tr-header">\r\n\t\t\t<th scope="col">&nbsp;</th><th class="right-align" scope="col">Share<br>Price</th><th class="right-align" scope="col">NAV</th><th class="right-align" scope="col">Premium/<br>Discount</th>\r\n\t\t</tr><tr>\r\n\t\t\t<td>Current</td><td class="right-align">$19.14</td><td class="right-align">$21.82</td><td class="right-align">-12.28%</td>\r\n\t\t</tr>

I need to extract the two prices and the percentage value, in this example "$19.14", "$21.82", and "-12.28%", but I am having trouble figuring out how to parse the text and pull them out. Is there a way to do this by looping through and searching for the text before and after each value?

The text before and after is always the same, but the date changes. If this method is not possible, is there another way? Thank you very much!

What do you mean by "the date changes"? Depending on how easy it is to identify you could either use regex or string methods. — mapf, Sep 28 '21 at 18:36
I think I can use beautifulsoup but I still need to extract the actual piece of information and by date changes I mean literally the date will change every day — nbafan249, Sep 28 '21 at 20:16
Does this answer your question? [Extracting data from HTML table](https://stackoverflow.com/questions/11790535/extracting-data-from-html-table) — outis, Sep 29 '21 at 08:03

score 1 · Accepted Answer · answered Sep 28 '21 at 18:50

Here is code that produces the desired output:

from bs4 import BeautifulSoup

markup = """
<div class="row-fluid">
 <div class="span6">
  <p class="as-of-date">
   <span id="ContentPlaceHolder1_cph_main_cph_main_AsOfLabel">
    As of 9/24/2021
   </span>
  </p>
  <div class="table-wrapper">
   <div>
    &lt;table class="cefconnect-table-1 table table-striped" cellspacing="0" cellpadding="5" 
Border="0
   </div>
  </div>
 </div>
</div>

"""

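soup = BeautifulSoup(markup, 'html.parser')
#print(soup.prettify())

# Select the label by its id; strip=True drops the surrounding whitespace and newlines
label = soup.select_one('#ContentPlaceHolder1_cph_main_cph_main_AsOfLabel').get_text(strip=True)
# Remove the "As of " prefix so only the date remains
print(label.replace('As of ', ''))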

Output:

9/24/2021
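
The same selector-based approach can be carried over to the price row shown in the question. The following is only a minimal sketch, assuming the row always has the three `right-align` cells in the order share price, NAV, premium/discount; `row_html` and the variable names are illustrative and not part of the original page or code.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question; in practice this would be the fetched page.
# The <table> wrapper is added here only to make the fragment self-contained.
row_html = (
    '<table><tr>'
    '<td>Current</td><td class="right-align">$19.14</td>'
    '<td class="right-align">$21.82</td>'
    '<td class="right-align">-12.28%</td>'
    '</tr></table>'
)

soup = BeautifulSoup(row_html, 'html.parser')

# Collect the text of every right-aligned data cell, in document order
cells = [td.get_text(strip=True) for td in soup.select('td.right-align')]

# Assumption: exactly three such cells, in this order
share_price, nav, premium_discount = cells
print(share_price, nav, premium_discount)
```

If the full page contains other `td.right-align` cells, the selector would likely need to be narrowed, for example to the row whose first cell reads "Current".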

answered Sep 28 '21 at 18:50

Md. Fazlul Hoque


Please create another post so that I can answer properly – Md. Fazlul Hoque Sep 28 '21 at 20:15
I'm trying but Stack won't let me because it says I've asked too many...is there a way to contact you privately? – nbafan249 Sep 28 '21 at 20:22
I edited the question to reflect what I'm asking for the 2nd piece, I reworked the code you posted and almost have it but not quite there yet. Please help, thank you! – nbafan249 Sep 28 '21 at 20:31

score 0 · Answer 2 · answered Sep 28 '21 at 18:45

If the date is the only part of the string that changes, you can split the string on the text that follows the date:

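# mystring holds the raw HTML text that contains the "As of <date>" label
result = mystring.split(
    '</span>\r\n\t\t\t\t\t\t\t</p>\r\n\r\n\t\t\t\t\t\t\t<div class="table-wrapper">')

# Take everything after "As of " rather than a fixed 10 characters,
# so shorter dates such as 1/5/2021 are handled as well
date = result[0].split('As of ')[-1].strip()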

This gives you the date as a plain string, but you can also split it to get an integer for each component of the date, like this:

month, day, year = [int(num) for num in date.split('/')]
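
As an alternative to splitting on '/', the standard library can parse the string into a date object directly. This is a small sketch, assuming the date is always in month/day/year order; the sample value and the name `as_of` are illustrative.

```python
from datetime import datetime

date = '9/24/2021'  # sample value in the same format as the page's "As of" label

# %m and %d accept values without a leading zero when parsing
as_of = datetime.strptime(date, '%m/%d/%Y').date()

print(as_of.isoformat())
print(as_of.month, as_of.day, as_of.year)
```

A date object makes later comparisons and formatting simpler than working with three separate integers, and `strptime` raises a `ValueError` if the text is not a valid date.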
