
Here is something I have been wondering about for quite some time. Please consider the following code in Python:

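from bs4 import BeautifulSoup

# website is the HTTP response for the eBay page
soup = BeautifulSoup(website.content, 'html.parser')
data = soup.find_all("div", class_="innerInfo")

# element is assumed to be one entry of data
for element in data:
    auction_price = element.select(".EUR")[0].text   # raw price text
    auction_price = auction_price.split("€")[1]      # drop the € sign
    auction_price = auction_price.replace(",", "")   # drop the commas
    auction_price = float(auction_price)             # convert to a number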

I am trying to add the prices of specific items on eBay to a database to create a time series. To get to the auction_price, I have to extract and reformat the price in several steps: selecting the correct element, removing the € sign, removing commas and finally converting the result into a float.

In the end I have to assign the same variable "auction_price" four times in a row. As far as I can tell, this is not considered "clean code". I have considered the following alternatives:

  1. Do all the reformatting in one line. However, this would be much worse for readability and would also not meet clean code standards.
  2. Find a "smarter" way to extract the data that removes the need for reformatting. However, it is not guaranteed that such a way always exists. So even if there were a much simpler operation for this specific example, I would still be interested in a best-practice solution.
  3. Use a different variable name for every step. However, creating a lot of temporary variables that are not really used anywhere seems a bit inefficient and would probably not meet clean code standards either.
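
For illustration, alternative 3 could look roughly like the sketch below. The sample string and the intermediate names (raw_text, amount_text, digits_only) are made up for this example; raw_text stands in for the text of the selected element.

```python
# Hypothetical sample value; stands in for element.select(".EUR")[0].text
raw_text = "EUR €1,234.50"

amount_text = raw_text.split("€")[1]        # part after the € sign
digits_only = amount_text.replace(",", "")  # commas removed
auction_price = float(digits_only)          # numeric value
```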

I assume that reformatting data in multiple operations happens very often and that there should be best practices for this in most languages. However, I was unable to find anything in Clean Code or PEP 8 that really answers this specific problem. I also tried Google, but it was hard to phrase the problem in one search query, so most hits were not even remotely related to what I was looking for.

Does anyone know what is considered the best practice in this matter?

MrTony

1 Answer

auction_price = element.select(".EUR")[0].text
auction_price = auction_price.split("€")[1]
auction_price = auction_price.replace(",", "")
auction_price = float(auction_price)

I do not know whether this conforms to PEP 8, but it could simply be rewritten like this:

auction_price = float(element.select(".EUR")[0].text.split("€")[1].replace(",", ""))

or, with method chaining (a style often seen in fluent interfaces and the builder pattern) spread over several lines:

auction_price = float(
    element.select(".EUR")[0].text
    .split("€")[1]
    .replace(",", "")
)
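
Another option, offered here as a sketch rather than an established convention, is to move the steps into a small named function so that the call site stays a single readable line. The function name parse_eur_price and the sample input are hypothetical.

```python
def parse_eur_price(text: str) -> float:
    """Convert a price string such as 'EUR €1,234.50' to a float."""
    amount = text.split("€")[1]            # keep the part after the € sign
    return float(amount.replace(",", ""))  # drop the commas, then convert


# At the call site (element would come from the BeautifulSoup result):
# auction_price = parse_eur_price(element.select(".EUR")[0].text)
sample_price = parse_eur_price("EUR €1,234.50")
```

Each intermediate value then lives only inside the function, and the parsing can be tested without fetching a page.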

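Note: str.replace replaces every occurrence of the substring unless the optional count argument is given, so a single call removes all commas. re.sub(pattern, replacement, string) is only needed when the text to remove has to be matched by a regular expression.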
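
A quick check of how str.replace behaves with and without its optional count argument, next to re.sub for the case where a pattern is needed. The sample values are made up for this sketch.

```python
import re

price_text = "1,234,567.89"  # made-up value with two commas

all_removed = price_text.replace(",", "")    # no count: every comma is removed
first_only = price_text.replace(",", "", 1)  # count=1: only the first comma is removed

# re.sub takes a regular expression; here it strips commas and whitespace
via_regex = re.sub(r"[,\s]", "", "1, 234,567.89")
```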

TheEagle
  • Do you know any resources where I can get further information on this notation? I have searched for "python object notation", but I never found anything which relates to my question. Most answers are about dot notation and how to use this notation for object methods and attributes. – MrTony Mar 12 '21 at 11:33
  • @MrTony The reason you could not find anything about it is that I provided the wrong name! I have fixed that now, and for more information you can see the answers to [this question](https://stackoverflow.com/questions/66602043/how-is-this-python-design-pattern-called). – TheEagle Mar 12 '21 at 15:00