I am new to web scraping and, for practice, I am trying to scrape a website and write the results to a CSV file. When I write the results to the CSV file, the address does not end up in the Address column, which is where I want it to go. The code is as follows.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.allagents.co.uk/find-agent/london/'

uClient = uReq(my_url)

page_html = uClient.read()

uClient.close()

page_soup = soup(page_html, 'html.parser')

containers = page_soup.findAll('div', {'class':'itemlabel3'})

filename = "webscrape.csv" 

f = open(filename, "w")

headers = "Company Name, Address, Telephone Number\n"

f.write(headers)

for container in containers:
    comp_name   = container.find('div', {'class':'labelleft2 col-md-10'}).div.h4.a.text

    address     = container.find('div', {'class':'labelleft2 col-md-10'}).div.p.text

    tel         = container.find('div', {'class':'labelleft2 col-md-10'}).div.find('p', {'style':'clear: both; margin-bottom: 15px;'}).strong.text

    print("Company Name:", comp_name)
    print("Address:", address)
    print("Telephone", tel)

    f.write(comp_name.replace(",", "|") + "," + address.replace(",", "|") + "," + tel + "\n")

f.close()

Any help is appreciated. Thank you in advance.

H D
  • I'd try to avoid creating the CSV manually. Dump your data into a pandas dataframe (that can export to csv) or use the [csv module](https://docs.python.org/3/library/csv.html) or something. – RagingRoosevelt May 08 '18 at 17:17
  • O.K. but if I want to create a csv manually just for practice, I want to know where I have gone wrong. Also what's wrong with creating the csv manually? – H D May 08 '18 at 17:21
  • re: what's wrong with creating it manually? Nothing is *wrong* about it. More of a no-need-to-reinvent-the-wheel type deal. Practice is a great reason to try it manually. – RagingRoosevelt May 08 '18 at 17:27
  • Cool. Can you link any articles about making it into a pandas dataframe? Or just write an answer and show me how to do it. It seems interesting. Thanks. – H D May 08 '18 at 17:30
  • By the way, if you do see where I have gone wrong originally please let me know. You can always run this code yourself if you want to try. – H D May 08 '18 at 17:31
  • It might be helpful if you could attach a screenshot of the csv where you're seeing the address show up in multiple columns. Here's what I'd do: `records = []` then in your for loop `records.append({'company': comp_name, 'address': address, 'telephone': tel})` then once you've read all the records, you can open the output file with `f = open(filename, 'w', newline='')` and do `writer = csv.DictWriter(f, fieldnames=['company', 'address', 'telephone'])`, `writer.writeheader()`, `for r in records: writer.writerow(r)`. Note that `DictWriter` takes an open file object, not the filename. My computer can't connect to the website you linked right now so I'm having trouble running your code. I'll see if I spot the issue, tho. – RagingRoosevelt May 08 '18 at 17:40
  • In your print statements, I'd do something like `print("Address: '{}'".format(address))` in order to better see what you're reading in for those values. – RagingRoosevelt May 08 '18 at 17:46
  • I can't attach a screenshot for some reason, but there are three columns: Company Name, Address and Telephone Number. The company name goes into the Company Name column, but instead of the address going into the next column, a new row starts and the telephone number ends up in the Address column. – H D May 08 '18 at 17:55
  • Also, what is `records` meant to be? – H D May 08 '18 at 18:02
  • `records` is just a list of what will become each row. The csv writer wants a dictionary to be passed when you write the row to the file so `records` stores each row's dictionary as you encounter the data. This approach also sets you up in case you wanted to use dataframes at some point since you could [build a dataframe from that list of dictionaries](https://stackoverflow.com/questions/20638006/convert-list-of-dictionaries-to-dataframe). – RagingRoosevelt May 08 '18 at 22:32
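
The `records` plus `csv.DictWriter` approach described in the comments could look roughly like the sketch below. The sample rows and the `write_records` helper are hypothetical stand-ins for the scraped values, and an in-memory buffer is used so the example runs without touching the website or the disk. When writing to a real file, the `csv` documentation recommends opening it with `newline=''`.

```python
import csv
import io

# Hypothetical sample rows standing in for the scraped values.
# The first address contains a comma and a newline on purpose.
records = [
    {'company': 'Example Estates', 'address': '1 High Street,\nLondon', 'telephone': '020 0000 0000'},
    {'company': 'Sample & Co, Ltd', 'address': '2 Low Road, London', 'telephone': '020 1111 1111'},
]

def write_records(records, fileobj):
    """Write a list of row dictionaries to an already-open file object as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=['company', 'address', 'telephone'])
    writer.writeheader()
    for r in records:
        writer.writerow(r)

# Write to an in-memory buffer; for a real file use:
#     with open('webscrape.csv', 'w', newline='') as f: write_records(records, f)
buffer = io.StringIO()
write_records(records, buffer)
csv_text = buffer.getvalue()

# Read the CSV back to check that each value stayed in its own column.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    print(row)
```

Because the writer quotes any field containing a comma or a newline, there is no need to swap commas for `|`, and an address with a line break in it still comes back as a single field.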

1 Answer

It seems there is a newline character in your address data, so the row breaks in the middle when it is written to the file.

Try replacing the address line in your code with the line below and run it again.

address=(container.find('div', {'class':'labelleft2 col-md-10'}).div.p.text).replace('\n','')
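
To see why the newline matters, here is a small sketch that builds the row the same way the question's `f.write(...)` call does. The values are made up, and `build_row` is a hypothetical helper added only for illustration; the address imitates text scraped from a paragraph that contains a line break.

```python
# Hypothetical values standing in for what the scraper returns.
comp_name = "Example Estates"
address = "1 High Street,\nLondon"
tel = "020 0000 0000"

def build_row(comp_name, address, tel):
    """Build one CSV line by hand, mirroring the f.write(...) call in the question."""
    return comp_name.replace(",", "|") + "," + address.replace(",", "|") + "," + tel + "\n"

# With the newline left in, the "row" spans two lines of the file,
# so a spreadsheet shows the rest of the address and the phone number on a new row.
broken = build_row(comp_name, address, tel)

# With the newline removed first, everything stays on one line.
fixed = build_row(comp_name, address.replace("\n", ""), tel)

print(repr(broken))
print(repr(fixed))
```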
PythonUser
  • Thanks, your solution works, but it kind of distorts the CSV file. – H D May 08 '18 at 18:19
  • This is because the address fields contain runs of multiple spaces. If you trim the spaces, your CSV will be generated correctly :) – PythonUser May 08 '18 at 18:21
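
One way to trim the spaces, as a sketch: calling `str.split()` with no arguments splits on any run of whitespace (spaces, tabs, newlines), and joining the pieces with a single space collapses the address onto one tidy line. The `normalize_whitespace` helper and the sample string are hypothetical; the real scraped text may look different.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (including newlines) into one space and trim the ends."""
    return " ".join(text.split())

# Hypothetical raw address with a line break and stray indentation.
raw_address = "  1 High Street,\n        London   SW1 "
clean_address = normalize_whitespace(raw_address)

print(repr(clean_address))
```

In the loop from the question this would be applied as `address = normalize_whitespace(address)` before the row is written, which takes care of the newline and the extra spaces in one step.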