2

Background:

I am in the process of building a program that scrapes weather data from the internet, and displays it to the user as part of a GUI. The user will enter in their location details, specifically their PostCode or ZipNumber, City or Town, Latitude, and Longitude. The program will store these four pieces of information into a textfile, this is so that the details can be read each time the user wants to request weather data, instead of having to enter these details in on each request. The modules that this problem concerns are urllib, and BeautifulSoup.

import urllib.request
from bs4 import BeautifulSoup

The Problem:

I have successfully managed to store the user details into a text file, and also read from it. The code for inserting the data looks like this:

userPostcode = postcodeEntry.get()
userCity     = cityEntry.get()
userLat      = latitudeEntry.get()
userLong     = longitudeEntry.get()
file = open("LocationInfo.txt", 'w')
file.write(str(userPostcode) + "\n")
file.write(str(userCity) + "\n")
file.write(str(userLat) + "\n")
file.write(str(userLong)+ "\n")
file.close()

The structure of the data inside the textfile looks like this:

SK15 IJF
SOME TOWN
54.25
-122.312

The code for reading from the textfile looks like this:

f=open('LocationInfo.txt')
line=f.readlines()
Post = line[0]
Town = line[1]
Lat  = line[2]
Long = line[3]
f.close()

The way I have inserted the values of these variables into the URL is by using this method:

page_url = "https://www.metcheck.com/WEATHER/now_and_next.asp? 
zipcode=%s+%s&lat=%s&lon=%s" % (Post, Town, Lat, Long)
soup = BeautifulSoup(urllib.request.urlopen(page_url), "lxml")

*note the url is all on one line in the actual program.

The Error:

The error I am receiving is:

Exception in Tkinter callback
Traceback (most recent call last):
Python\Python36-32\lib\http\client.py", line 279, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: <html>

This error only occurs whenever I try and use the variable names assigned to the data in the text file, and try to insert them into the URL using the % method. When I enter the values directly into the URL string, instead of using the variable names, the expected result occurs. Therefore, I have reason to believe that the problem is to do with the variables themselves, and the values, not the actual data which is valid.

  • 1
    Possible duplicate of [How to urlencode a querystring in Python?](https://stackoverflow.com/questions/5607551/how-to-urlencode-a-querystring-in-python) – udiboy1209 Mar 24 '18 at 21:51

3 Answers3

1

You can requests library

import requests

page_url = "https://www.metcheck.com/WEATHER/now_and_next.asp? 
zipcode=%s+%s&lat=%s&lon=%s" % (Post, Town, Lat, Long)

r = requests.get(page_url)
jujuBee
  • 494
  • 5
  • 13
1

To solve your newline problem, consider storing the info also as a JSON file. This will make parsing much easier, and this is what it was designed for! It will also allow you to add features to your program if you wish to do so in the future.

This is less related to your question, OP. But scraping HTML data from a webpage isn't recommended. I don't know how you parse your data, but if the website's design changes, it might hurt your parser.

A better approach would be to look for an API. Which metcheck has. More info here . More accurately, this (look under JSON URL).

import json

json_data = ""
with open("test.json") as json_file:
    json_data = json.load(json_file)

print(json_data["zipcode"]) # prints the zip code.

Example for your site:

import requests
import json
json_data = requests.get("http://ws1.metcheck.com/ENGINE/v9_0/json.asp?lat=51.8&lon=-0.1&lid=60357&Fc=No").text
first_day = json.loads(json_data)["metcheckData"]["forecastLocation"]["forecast"][0]
print(first_day["weekday"]) # print the first day of the first forecast.
print(first_day["temperature"]) # print the temperature of the first day.
Alex Osheter
  • 589
  • 5
  • 22
0

Found a way of doing it:

Using .format, to insert the values into the URL string, and then pass that as an argument of urllib.request.urlopen(*args)

file = open("LocationInfo.txt", 'r')
line = file.readlines()
savedDetails = line[0]

listDetails = savedDetails.split(',')
url= "https://www.metcheck.com/WEATHER/now_and_next.asp?zipcode={}&lat= 
{}&lon={}"
page_url = url.format(listDetails[1], listDetails[2], listDetails[3])
print(page_url)
soup = BeautifulSoup(urllib.request.urlopen(page_url), "lxml")

I believe the reason why the error was occuring was because the values were written to the textfile with 'newline' being used at the end of each entry. This meant that when URL tried to format the data from the textfile, it read the newline entries as well, meaning that the URL was invalid. I solved this by simply changing the code so that the data was written on one line of the textfile, and seperated with comman. Then the .split function was used to separate each part of data so it was formed into a list, then simply passed each element from the list into the URL. Quite Hacky but it does the job.

file = open("LocationInfo.txt", 'w')
file.write(str(userPostcode + ","))
file.write(str(userCity + ","))
file.write(str(userLat + ","))
file.write(str(userLong+ ","))
file.close()