
I just want to extract price data without the $. There are multiple prices in the file, and I only want the ones that come after class="price price-label ">, not the ones that come after class="strike">.

I pasted the full code. I am pulling the info from file.txt. My desired output is the name and price side by side. I have not used Beautiful Soup before.

data-default-alt="Ford Truck">       </h3>     </a>           </div>     <div class="tileInfo">                <div class="swatchesBox--empty"></div>                                                     <div class="promo-msg-text">           <span class="calloutMsg-promo-msg-text"></span>         </div>                              <div class="pricecontainer" data-pricetype="Stand Alone">               <p id="price_206019013" class="price price-label ">                  $1,000.00               </p>

My Code

import re

# Use a name other than `str` for the file handle so the builtin is not shadowed.
with open("targetbubbles.txt") as f:
    st = f.read()
    #print st

#brand=re.search('data-default-title=\"(.*?)" ',st)

#cost=re.search('\$(\d+,?\d*\.\d+)</p>',st)
turtle02

1 Answer


Beautiful Soup is a helpful module for this kind of thing:

>>> import bs4
>>> s = '''      <p id="price_206019013" class="price price-label ">                  $2.84               </p>                                              <p class="regularprice-label">      Reg.      <span class="screen-reader-only"> price</span>      <span class="strike">       $2.99      </span>     </p>                    <div class="eyeBrow sale-msg">      <span '''
>>> soup = bs4.BeautifulSoup(s, 'lxml')
>>> soup.find_all('p', class_='price price-label ')
[<p class="price price-label " id="price_206019013">                  $2.84               </p>]
>>> [result] = soup.find_all('p', class_='price price-label ')
>>> result.text.strip(' $')
u'2.84'
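If you would rather stay with `re` for now, here is a minimal sketch of how the regex from the question could be adjusted to get name and price side by side. It assumes the name lives in the `data-default-alt` attribute (as in the pasted snippet) and that every name is followed by exactly one sale price; the sample string and its $1,200.00 strike price are made up for illustration. The original pattern likely failed because it expected `</p>` immediately after the digits, while the file has whitespace there.

```python
import re

# Sample modelled on the snippet in the question (the strike price is invented
# for illustration); a real file will contain many such tiles.
html = '''
data-default-alt="Ford Truck"> </h3> </a> </div>
<div class="pricecontainer" data-pricetype="Stand Alone">
  <p id="price_206019013" class="price price-label ">
      $1,000.00
  </p>
  <p class="regularprice-label"> Reg. <span class="strike"> $1,200.00 </span> </p>
</div>
'''

# Names: assumed to be stored in the data-default-alt attribute.
names = re.findall(r'data-default-alt="(.*?)"', html)

# Prices: only tags whose class is "price price-label" (optional trailing space).
# \s* skips the whitespace before the amount; the $ stays outside the group.
prices = re.findall(r'class="price price-label\s*">\s*\$([\d,]+\.\d{2})', html)

# Pair them up; this relies on names and prices appearing in the same order.
pairs = list(zip(names, prices))
for name, price in pairs:
    print('{} {}'.format(name, price))
```

Prices inside `class="strike">` are skipped because the pattern is anchored on the `price price-label` class. Regexes on HTML are brittle, so treat this as a stopgap rather than a replacement for a real parser.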
wim
  • Thanks! I made some edits to give more info, as I am not familiar with Beautiful Soup. I will definitely give it a shot after this, though. – turtle02 Apr 01 '16 at 19:16