
I have a CSV with several columns of data; the 15th column holds a product URL for each row. I need to read each URL from that column, scrape the new price from the target webpage, and store it in the price column, replacing the old price.

The column numbering here doesn't match the real file, but this is roughly what the CSV looks like:

asin,title,product URL,price
KSKFUSH01,Product Title,http://....,56.00

Below is the sample code I wrote, but it merely prints URLs :(

import csv

with open('some.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    for line in csv_reader:
        print(line[15])

Any help or suggestions on how to accomplish this?

Thanks

Stew
  • Do you have a sample URL? I would recommend looking into BeautifulSoup – usernamenotfound Dec 05 '17 at 17:14
  • Hello, the tutorial looks great. This is a sample URL: https://www.amazon.it/dp/B006H9B6XI, but basically I don't know how to read each URL from the csv file – Stew Dec 05 '17 at 17:20
  • If the csv has properly named columns, you can split each line by commas to extract the URLs. –  Dec 05 '17 at 17:21
  • I get an error: AttributeError: 'list' object has no attribute 'split' – Stew Dec 05 '17 at 17:23
  • Your writing is difficult to parse. Your sample csv does not match your description. I am updating my answer. –  Dec 05 '17 at 17:38
  • Yes, I know, the piece of code I posted is useless – Stew Dec 05 '17 at 17:46

2 Answers


Here's a good guide on how to scrape websites using BeautifulSoup: https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe
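
For reference, a minimal sketch of that approach. The URL is the sample one from the comments, and the class name used to find the price element is only an assumption; inspect the real page to find the right selector.

import urllib.request
from bs4 import BeautifulSoup

# Sample URL from the comments; the class name below is a guess,
# inspect the actual page in your browser to find the element holding the price.
url = 'https://www.amazon.it/dp/B006H9B6XI'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')

# find() returns the first matching tag, or None if nothing matches
price_tag = soup.find('span', attrs={'class': 'a-color-price'})
if price_tag is not None:
    print(price_tag.text.strip())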

usernamenotfound

It looks like you want to use a csv writer. You can access the URL in each line. Here is how you can write the new price.

import csv
import urllib.request  # urllib2 is Python 2 only; urllib.request is the Python 3 equivalent
from bs4 import BeautifulSoup

with open('some.csv', 'r') as csv_file, \
        open('newPricedata.csv', 'w', newline='') as Newcsvfile:
    csv_reader = csv.reader(csv_file)
    Pricewriter = csv.writer(Newcsvfile, delimiter=' ',
                             quotechar='|', quoting=csv.QUOTE_MINIMAL)

    for line in csv_reader:
        # open the URL stored in column index 15 and parse the page
        page = urllib.request.urlopen(line[15])
        soup = BeautifulSoup(page, 'html.parser')
        price = soup.find('td', attrs={'class': 'a-size-mini a-color-price ebooks-price-savings a-text-normal'})
        # write the original columns followed by the freshly scraped price
        Pricewriter.writerow(line + [price.text])
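
If the goal is to overwrite the existing price column rather than append the price as an extra field, a sketch along these lines may be closer to what was described. The column names 'product URL' and 'price' are taken from the sample CSV in the question and are assumptions, as is the price selector; adjust them to match the real header row and page.

import csv
import urllib.request
from bs4 import BeautifulSoup

with open('some.csv', 'r', newline='') as csv_file, \
        open('newPricedata.csv', 'w', newline='') as new_csv_file:
    reader = csv.DictReader(csv_file)
    writer = csv.DictWriter(new_csv_file, fieldnames=reader.fieldnames)
    writer.writeheader()

    for row in reader:
        # 'product URL' is assumed to be the header of the URL column
        page = urllib.request.urlopen(row['product URL'])
        soup = BeautifulSoup(page, 'html.parser')
        price_tag = soup.find('td', attrs={'class': 'a-size-mini a-color-price ebooks-price-savings a-text-normal'})
        if price_tag is not None:
            # overwrite the old price with the freshly scraped one
            row['price'] = price_tag.text.strip()
        writer.writerow(row)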