I am trying to convert the 'price' value in each record of my list to an actual float, rather than a string, in my output. Is this possible?

OUTPUT

{'name': 'ADA Hi-Lo Power Plinth Table', 'product_ID': '55984', 'price': '$2,849.00'}
{'name': 'Adjustable Headrest Couch - Chrome-Plated Steel Legs', 'product_ID': '31350', 'price': '$729.00'}
{'name': 'Adjustable Headrest Couch - Chrome-Plated Steel Legs (X-Large)', 'product_ID': '31351', 'price': '$769.00'}
{'name': 'Adjustable Headrest Couch - Hardwood Base (No Drawers)', 'product_ID': '65446', 'price': '$1,059.00'}
{'name': 'Adjustable Headrest Couch - Hardwood Base 2 Drawers', 'product_ID': '65448', 'price': '$1,195.00'}
{'name': 'Adjustable Headrest Couch - Hardwood Tapered Legs', 'product_ID': '31355', 'price': '$735.00'}
{'name': 'Adjustable Headrest Couch - Hardwood Tapered Legs (X-Large)', 'product_ID': '31356', 'price': '$775.00'}
{'name': 'Angeles Rest Standard Cot Sheets - ABC Print', 'product_ID': 'A31125', 'price': '$11.19'}

START OF PYTHON SCRIPT

import requests
from bs4 import BeautifulSoup
import sys

with open('recoveryCouches','r') as html_file:
    content= html_file.read()
    soup = BeautifulSoup(content,'lxml')
    allProductDivs = soup.find('div', class_='product-items product-items-4')
    nameDiv = soup.find_all('div',class_='name')
    prodID = soup.find_all('span', id='product_id')
    prodCost = soup.find_all('span', class_='regular-price')

    records=[]
     
    for i in range(len(nameDiv)):
        records.append({
            "name": nameDiv[i].find('a').text.strip(),
            "product_ID": prodID[i].text.strip(),
            "price": prodCost[i].text.strip()
            })

    for x in records:
        print(x)
Reinderien
ch11nV11n
  • `float(price[1:].replace(',', ''))` – Forest 1 Aug 04 '21 at 01:11
  • use regex `[\d\.]+` to capture float number only – nay Aug 04 '21 at 01:34
  • Probable duplicate of https://stackoverflow.com/questions/37580151/parse-currency-into-numbers-in-python – Reinderien Aug 04 '21 at 01:54
  • @Forest1 can you tell me where I need to add that section of code? – ch11nV11n Aug 04 '21 at 02:12
  • @deyizzle haven't I told you! Have you really checked my answer carefully and pulled out that accepted answer. – imxitiz Aug 04 '21 at 02:13
  • `"price": prodCost[i].text.strip()` instead : `"price": float(prodCost[i].text.strip()[1:].replace(',', ''))` – Forest 1 Aug 04 '21 at 02:17
  • @Forest1 Yeah! I had already told OP to do that and follow your answer in [my answer](https://stackoverflow.com/a/68644417/12446721) but I don't know what OP is actually thinking. :( – imxitiz Aug 04 '21 at 02:19
  • As you know, you can't convert $ to number. So you can ignore it when save it. Also about `,`, we can remove from them. – Forest 1 Aug 04 '21 at 02:37
  • when you use `float(prodCost[i].text.strip()[1:].replace(',', ''))`, you can get only value from price string. – Forest 1 Aug 04 '21 at 02:39
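
The regex suggestion in the comments can be sketched roughly as follows. One caveat: `[\d\.]+` on its own stops at the thousands separator, so for `'$2,849.00'` it would capture only `'2'`. This sketch therefore assumes the comma is added to the character class and removed afterwards. The function name `extract_price` is hypothetical and not part of the original thread.

```python
import re


def extract_price(text):
    """Return the first number-like run in `text` as a float.

    Assumes "," is a thousands separator and "." is the decimal point,
    as in the prices shown in the question.
    """
    match = re.search(r"[\d,.]+", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop the thousands separators before converting.
    return float(match.group().replace(",", ""))


print(extract_price("$2,849.00"))
print(extract_price("$11.19"))
```

Because the pattern only looks for digits, commas and dots, it ignores whatever currency prefix precedes the number, but it still hard-codes the separator convention.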

2 Answers

1

`float()` cannot parse a string that contains `$` or `,`, so remove both characters first and then convert.

You can use the `re` module to remove them in one step:

import re

for i in range(len(nameDiv)):
    records.append({
        "name": nameDiv[i].find('a').text.strip(),
        "product_ID": prodID[i].text.strip(),
        "price": float(re.sub(r"[$,]","",prodCost[i].text.strip()))
    })

Or, if every price string starts with `$`, you can follow @Forest's comment:

float(price[1:].replace(',', ''))

Like this:

float(prodCost[i].text.strip()[1:].replace(",",""))
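
As a minimal, self-contained sketch of both variants, the snippet below applies them to a few price strings copied from the question's output instead of the parsed HTML. The helper names `price_via_regex` and `price_via_slice` are hypothetical, and both assume US-style formatting with a single leading `$`.

```python
import re


def price_via_regex(price_text):
    # Remove every "$" and "," wherever they appear, then convert.
    return float(re.sub(r"[$,]", "", price_text.strip()))


def price_via_slice(price_text):
    # Assumes the first character after stripping is always "$".
    return float(price_text.strip()[1:].replace(",", ""))


# Sample strings taken from the output shown in the question.
sample_prices = ["$2,849.00", "$729.00", "$11.19"]

records = [{"price": price_via_regex(p)} for p in sample_prices]
for record in records:
    print(record)
```

The conversion happens while each record is built, so the stored value is already a float by the time it is printed.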
imxitiz
  • I believe you are trying to say `float` rather then integer cuz you can't convert `11.19` into integer. :) – imxitiz Aug 04 '21 at 01:35
  • Using string manipulation to pre-strip a currency symbol is not a great approach when `locale` exists. – Reinderien Aug 04 '21 at 01:49
  • @Xitiz I tried both `float(prodCost[i].text.strip()[1:].replace(",",""))` and `"price": float(re.sub(r"[$,]","",prodCost[i].text.strip()))` – ch11nV11n Aug 04 '21 at 02:25
  • Are you getting any error? If you had tried exactly that then, that should work for you, can you provide complete code so that I can check if you're doing right or wrong? – imxitiz Aug 04 '21 at 02:25
  • Not getting any errors, my output is just still showing the $ and , in that price field. – ch11nV11n Aug 04 '21 at 02:39
  • I am asking you to provide complete code! If that is not working then you can edit your question and add this part. It should work for all normal condition so, I have to check it my myself you aren't doing wrong anyway! – imxitiz Aug 04 '21 at 02:51
0

Naive removal of the currency symbol prefix makes your code fragile and incompatible with internationalization (i18n). The general solution is a little complicated, but if you assume that the currency symbol remains a prefix and that it is a Canadian dollar symbol, then:

from locale import setlocale, LC_ALL, localeconv, atof
from decimal import Decimal
import re

setlocale(LC_ALL, ('en_CA', 'UTF-8'))

# ...

price_str = re.sub(r'\s', '', prodCost[i].text)
stripped = price_str.removeprefix(localeconv()['currency_symbol'])
price = atof(stripped, Decimal)

Also note that Decimal is a better representation of a currency than a float for most purposes.
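
To illustrate the point about `Decimal`, here is a small sketch that is independent of the locale code above. It uses already-cleaned numeric strings, and the variable names are illustrative only.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.10 + 0.20
print(float_total == 0.30)                # False

# Decimal keeps the decimal digits exactly as written.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))   # True

# Arithmetic on a price keeps its two decimal places.
unit_price = Decimal("2849.00")
print(unit_price * 3)
```

Constructing the `Decimal` from a string matters here: `Decimal(0.10)` would inherit the float's rounding error.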

Reinderien
  • If we can do it in every easy way then, why should be do this? Have you downvoted my answer to answer this complicated things? – imxitiz Aug 04 '21 at 02:08
  • @Xitiz As in the answer: fragility. The easy way is correct until it isn't. – Reinderien Aug 04 '21 at 02:11
  • Are you sure, my answer is that bad to downvote? Just doing replace should work, but you are trying to over complicate the easy things. – imxitiz Aug 04 '21 at 02:14
  • @Xitiz Check out [this interesting map](https://en.wikipedia.org/wiki/Decimal_separator#/media/File:DecimalSeparator.svg) and tell me what it means to you. To me, it means a million-dollar mistake if you do international business and accidentally mis-assign a hard-coded decimal separator, as you're at risk of doing in your answer. – Reinderien Aug 04 '21 at 02:20
  • Sorry @Xitiz! Oddly enough, after modifying that line nothing changed in my output. I'm still seeing the $ and , – ch11nV11n Aug 04 '21 at 02:20
  • @deyizzle so you have to told that first. Without you told me, how can I know that my answer didn't work for you? – imxitiz Aug 04 '21 at 02:23
  • locale would be nice for the decimal point ".", which is a comma instead, in certain locales. But currency symbols are seldom standardized in "real world" data - there are even competing standards - US Dollars could be prefixed with either "$", "USD", "US$" and so on. So what works is whatever will fit the specific data in the input set. Filtering for whitespace before and after the "$" sign would be much less "fragile" than naively using the currency symbol given by the locale settings and expecting that to work. – jsbueno Aug 04 '21 at 02:25
  • @jsbueno I'll buy that dropping whitespace is a good idea (edited). Certainly what works is whatever will fit the specific data in the input set; but saying that there are competing standards and therefore nothing should be done is a false dichotomy, particularly when there is built-in support for localisation. – Reinderien Aug 04 '21 at 02:35