
I have a CSV with several columns of data; the 15th column holds a product URL for each row. I need to read each URL from that column, scrape the new price from the target webpage, and store it in the price column, replacing the old price.

The column numbering here doesn't match the real file, but this is roughly what the CSV looks like:

asin,title,product URL,price
KSKFUSH01,Product Title,http://....,56.00

Below is the sample code I wrote, but it merely prints URLs :(

import csv

with open('some.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    for line in csv_reader:
        print(line[15])

Any help or suggestions on how to accomplish this?

Thanks

Stew
  • Do you have a sample URL? I would recommend looking into BeautifulSoup – usernamenotfound Dec 05 '17 at 17:14
  • Hello, the tutorial looks great. This is a sample URL: https://www.amazon.it/dp/B006H9B6XI, but basically I don't know how to read each URL from the csv file – Stew Dec 05 '17 at 17:20
  • If the csv has properly named columns, you can split each line by commas to extract the URLs. –  Dec 05 '17 at 17:21
  • I get an error: AttributeError: 'list' object has no attribute 'split' – Stew Dec 05 '17 at 17:23
  • Your writing is difficult to parse. Your sample csv does not match your description. I am updating my answer. –  Dec 05 '17 at 17:38
  • Yes, I know, the piece of code I posted is useless – Stew Dec 05 '17 at 17:46

2 Answers


Here's a good guide on how to scrape websites using BeautifulSoup: https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe
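
For reference, a minimal sketch of that approach. The URL is the sample one from the comments, and the class name used to find the price element is only an assumption; inspect the real page to find the right selector.

import urllib.request
from bs4 import BeautifulSoup

# Sample URL from the comments; the class name below is a guess,
# inspect the actual page in your browser to find the element holding the price.
url = 'https://www.amazon.it/dp/B006H9B6XI'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')

# find() returns the first matching tag, or None if nothing matches
price_tag = soup.find('span', attrs={'class': 'a-color-price'})
if price_tag is not None:
    print(price_tag.text.strip())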

usernamenotfound

It looks like you want to use a csv writer. You can access the URL in each line. Here is how you can write the new price.

import csv
import urllib.request  # urllib2 is Python 2 only; urllib.request is the Python 3 equivalent
from bs4 import BeautifulSoup

with open('some.csv', 'r') as csv_file, \
        open('newPricedata.csv', 'w', newline='') as Newcsvfile:
    csv_reader = csv.reader(csv_file)
    Pricewriter = csv.writer(Newcsvfile, delimiter=' ',
                             quotechar='|', quoting=csv.QUOTE_MINIMAL)

    for line in csv_reader:
        # open the URL stored in column index 15 and parse the page
        page = urllib.request.urlopen(line[15])
        soup = BeautifulSoup(page, 'html.parser')
        price = soup.find('td', attrs={'class': 'a-size-mini a-color-price ebooks-price-savings a-text-normal'})
        # write the original columns followed by the freshly scraped price
        Pricewriter.writerow(line + [price.text])
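
If the goal is to overwrite the existing price column rather than append the price as an extra field, a sketch along these lines may be closer to what was described. The column names 'product URL' and 'price' are taken from the sample CSV in the question and are assumptions, as is the price selector; adjust them to match the real header row and page.

import csv
import urllib.request
from bs4 import BeautifulSoup

with open('some.csv', 'r', newline='') as csv_file, \
        open('newPricedata.csv', 'w', newline='') as new_csv_file:
    reader = csv.DictReader(csv_file)
    writer = csv.DictWriter(new_csv_file, fieldnames=reader.fieldnames)
    writer.writeheader()

    for row in reader:
        # 'product URL' is assumed to be the header of the URL column
        page = urllib.request.urlopen(row['product URL'])
        soup = BeautifulSoup(page, 'html.parser')
        price_tag = soup.find('td', attrs={'class': 'a-size-mini a-color-price ebooks-price-savings a-text-normal'})
        if price_tag is not None:
            # overwrite the old price with the freshly scraped one
            row['price'] = price_tag.text.strip()
        writer.writerow(row)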