-1

Hi, I'm pretty new to programming and Python, and this is my first post, so I apologize for any poor form.

I am scraping a website's download counts and get the following error when I try to convert the list of numeric strings to integers so I can sum them: `ValueError: invalid literal for int() with base 10: '1,015'`

I have tried `.replace()`, but it does not seem to do anything.

I also tried an if statement to take the commas out of any string that contains them, based on the question "Does Python have a string contains substring method?"

Here's my code:

    downloadCount = pageHTML.xpath('//li[@class="download"]/text()')
    downloadCount_clean = []

    for download in downloadCount:
        downloadCount_clean.append(str.strip(download))

    for item in downloadCount_clean:
        if "," in item:
            item.replace(",", "")
    print(downloadCount_clean)

    downloadCount_clean = map(int, downloadCount_clean)
    total = sum(downloadCount_clean)
Chris
  • `.replace()` _returns a new string_ with the unwanted portions removed; it does not modify the existing string. You'll have to reassign `item` to the _result_ of the function: `item = item.replace(",", "")` – John Gordon Sep 22 '16 at 16:42
  • Will assigning `item` work here? I think the loop copies it (it's a value type), so you would be changing something, but it wouldn't get written back to the list. I think you need to reference `downloadCount_clean[index]` to make the change. – BallpointBen Sep 22 '16 at 16:47

2 Answers

2

Strings are immutable in Python. When you call `item.replace(",", "")`, the method returns a new string with the commas removed, but that result is never stored anywhere, so `item` and the list both stay unchanged.
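To make that concrete, here is a minimal sketch. The value `"1,015"` comes from the error message in the question; the name `cleaned` is only an illustration.

```python
# Sample value taken from the error message; "cleaned" is a made-up name.
item = "1,015"

item.replace(",", "")            # result is computed, then discarded
print(item)                      # 1,015  (unchanged)

cleaned = item.replace(",", "")  # keep the string that replace() returns
print(cleaned)                   # 1015
print(int(cleaned))              # 1015, now a real integer
```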

EDIT :

I suggest this :

for i in range(len(downloadCount_clean)):
    if "," in downloadCount_clean[i]:
        downloadCount_clean[i] = downloadCount_clean[i].replace(",", "")

SECOND EDIT :

For a bit more simplicity and/or elegance :

for index,value in enumerate(downloadCount_clean):
    downloadCount_clean[index] = int(value.replace(",", ""))
Daneel
  • You'd be better off adding it into the original `downloadCount_clean.append`, which would keep the code cleaner later. – Andrew Gelnar Sep 22 '16 at 16:47
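A rough sketch of what that comment seems to suggest: do the cleanup once, inside the first loop. The list below is made-up data standing in for the result of `pageHTML.xpath(...)`, so treat the values as assumptions.

```python
# Stand-in for the scraped strings; these values are invented for the example.
downloadCount = [" 42 ", "1,015\n", "7"]

downloadCount_clean = []
for download in downloadCount:
    # Strip whitespace, drop thousands separators, and convert in one step.
    downloadCount_clean.append(int(download.strip().replace(",", "")))

total = sum(downloadCount_clean)
print(downloadCount_clean)  # [42, 1015, 7]
print(total)                # 1064
```

With this, the second loop and the `map(int, ...)` call from the question are no longer needed.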
0

For simplicity's sake:

>>> aList = ["abc", "42", "1,423", "def"]
>>> bList = []
>>> for i in aList:
...     bList.append(i.replace(',',''))
... 
>>> bList
['abc', '42', '1423', 'def']

or working just with a single list:

>>> aList = ["abc", "42", "1,423", "def"]
>>> for i, x in enumerate(aList):
...     aList[i]=(x.replace(',',''))
... 
>>> aList
['abc', '42', '1423', 'def']

Not sure if this one breaks any Python rules or not :)
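As far as I can tell it is fine: assigning to an index that already exists does not change the list's length, and it is adding or removing items during iteration that causes trouble. If you would rather sidestep the question, a list comprehension builds the cleaned list in one go. A small sketch with the same sample data:

```python
aList = ["abc", "42", "1,423", "def"]

# Build a new list; aList itself is left untouched.
bList = [s.replace(",", "") for s in aList]

print(bList)  # ['abc', '42', '1423', 'def']
```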

Rolf of Saxony