-2

I receive data every day through Python code, in this form:

id="ContentPlaceHolder1_cph_main_cph_main_SummaryGrid">\r\n\t\t<tr class="tr-header">\r\n\t\t\t<th scope="col">&nbsp;</th><th class="right-align" scope="col">Share<br>Price</th><th class="right-align" scope="col">NAV</th><th class="right-align" scope="col">Premium/<br>Discount</th>\r\n\t\t</tr><tr>\r\n\t\t\t<td>Current</td><td class="right-align">$19.14</td><td class="right-align">$21.82</td><td class="right-align">-12.28%</td>\r\n\t\t</tr>

I need to extract the two prices and the percentage value; in this example, "$19.14", "$21.82", and "-12.28%". I am having trouble figuring out how to parse the text and pull them out. Is there a way to do this by looping through and searching for the text before and after each value?

The text before and after is always the same, but the date changes. If this method is not possible, is there another way? Thank you very much!

nbafan249
  • Can you use an HTML parser such as `beautifulsoup`? – Andrej Kesely Sep 28 '21 at 18:31
  • What do you mean by "the date changes"? Depending on how easy it is to identify you could either use regex or string methods. – mapf Sep 28 '21 at 18:36
  • I think I can use beautifulsoup, but I still need to extract the actual piece of information. By "the date changes" I mean that literally the date will change every day. – nbafan249 Sep 28 '21 at 20:16
  • Does this answer your question? [Extracting data from HTML table](https://stackoverflow.com/questions/11790535/extracting-data-from-html-table) – outis Sep 29 '21 at 08:03

2 Answers

1

Here is one way to get the date with BeautifulSoup:

from bs4 import BeautifulSoup

markup = """
<div class="row-fluid">
 <div class="span6">
  <p class="as-of-date">
   <span id="ContentPlaceHolder1_cph_main_cph_main_AsOfLabel">
    As of 9/24/2021
   </span>
  </p>
  <div class="table-wrapper">
   <div>
    &lt;table class="cefconnect-table-1 table table-striped" cellspacing="0" cellpadding="5" 
Border="0
   </div>
  </div>
 </div>
</div>

"""

soup = BeautifulSoup(markup, 'html.parser')
#print(soup.prettify())

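# strip=True trims the whitespace and newlines around the label text
as_of = soup.select_one('#ContentPlaceHolder1_cph_main_cph_main_AsOfLabel').get_text(strip=True)
# Remove the "As of " prefix so that only the date remains
print(as_of.replace('As of ', ''))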

Output:

9/24/2021
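The same selector approach can probably be applied to the values the question actually asks for. The following is only a sketch: it assumes the fragment from the question belongs to a `<table>` element carrying the id `ContentPlaceHolder1_cph_main_cph_main_SummaryGrid` (the opening `<table` is reconstructed here, since the question's snippet starts mid-tag), and that the first three `td.right-align` cells are the share price, NAV, and premium/discount. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Assumption: the question's fragment sits inside a <table> with this id.
# The opening "<table" and closing "</table>" are added for illustration.
markup = (
    '<table id="ContentPlaceHolder1_cph_main_cph_main_SummaryGrid">\r\n\t\t'
    '<tr class="tr-header">\r\n\t\t\t<th scope="col">&nbsp;</th>'
    '<th class="right-align" scope="col">Share<br>Price</th>'
    '<th class="right-align" scope="col">NAV</th>'
    '<th class="right-align" scope="col">Premium/<br>Discount</th>\r\n\t\t</tr>'
    '<tr>\r\n\t\t\t<td>Current</td><td class="right-align">$19.14</td>'
    '<td class="right-align">$21.82</td>'
    '<td class="right-align">-12.28%</td>\r\n\t\t</tr>'
    '</table>'
)

soup = BeautifulSoup(markup, 'html.parser')

# Select only the data cells (td, not th) inside the summary table
cells = soup.select(
    '#ContentPlaceHolder1_cph_main_cph_main_SummaryGrid td.right-align'
)

# The first three cells belong to the "Current" row in this fragment
share_price, nav, premium_discount = [c.get_text(strip=True) for c in cells[:3]]
print(share_price, nav, premium_discount)
```

If the real table contains more rows than the fragment shows, taking the first three cells still picks the first data row, but it may be safer to locate the row whose first cell reads "Current".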
Md. Fazlul Hoque
0

If the date is the only part of the string that changes, you can split the string to get the date:

result = mystring.split(
'</span>\r\n\t\t\t\t\t\t\t</p>\r\n\r\n\t\t\t\t\t\t\t<div class="table-wrapper">')


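# Take the last whitespace-separated token rather than a fixed 10 characters,
# so that shorter dates such as 9/4/2021 are handled as well
date = result[0].split()[-1]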

This gives you the date as a plain string, but you can also split it to get an integer for each component of the date, like this:

month, day, year = [int(num) for num in date.split('/')]
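A similar string-based idea can be used for the prices and the percentage from the question, since the text around each value is fixed. The sketch below uses the standard `re` module on the fragment exactly as posted in the question; `mystring` stands in for the daily input, and the other names are illustrative. It assumes every value of interest sits in a `<td class="right-align">` cell and that the amounts contain no thousands separators.

```python
import re

# The fragment from the question, used here as sample input
mystring = (
    'id="ContentPlaceHolder1_cph_main_cph_main_SummaryGrid">\r\n\t\t'
    '<tr class="tr-header">\r\n\t\t\t<th scope="col">&nbsp;</th>'
    '<th class="right-align" scope="col">Share<br>Price</th>'
    '<th class="right-align" scope="col">NAV</th>'
    '<th class="right-align" scope="col">Premium/<br>Discount</th>\r\n\t\t</tr>'
    '<tr>\r\n\t\t\t<td>Current</td><td class="right-align">$19.14</td>'
    '<td class="right-align">$21.82</td>'
    '<td class="right-align">-12.28%</td>\r\n\t\t</tr>'
)

# Capture whatever text sits between the fixed opening and closing tags
pattern = re.compile(r'<td class="right-align">([^<]+)</td>')
values = pattern.findall(mystring)

# The first three matches are the share price, NAV, and premium/discount
share_price, nav, premium_discount = values[:3]

# Optional: convert the strings to numbers
# (assumes no thousands separators such as "$1,234.56")
share_price_value = float(share_price.lstrip('$'))
nav_value = float(nav.lstrip('$'))
premium_pct = float(premium_discount.rstrip('%'))

print(share_price, nav, premium_discount)
```

A regular expression like this depends on the markup staying byte-for-byte the same, so an HTML parser is usually the more robust choice if the page layout might change.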
ph140