0

I am scraping a table from a website and writing it to a csv file. The name of the file is correct, however I need the name of the sheet within the workbook to be "Raw_Data", rather than the file name. Here is what I have so far:

import urllib.request
import json
import re
import datetime


html = urllib.request.urlopen("https://www.wunderground.com/personal-weather-station/dashboard?ID=KNYSENEC1#history/tdata/s20171104/e20171104/mdaily").read().decode('utf8')
json_data = re.findall(r'pws_bootstrap:(.*?)\s+,\s+country\:', html, re.S)
data = json.loads(json_data[0])

nnow = datetime.datetime.now().date()
Filenamee = "seneca_weather_" + str(nnow)
filename = ('%s.csv' % Filenamee)
f = open(filename, "w")

for days in data['history']['days']:
    for obs in days['observations']:
        f.write(str(obs['date']['iso8601']) + "," + str(obs['temperature']) + "," + str(obs['pressure']) + "," + str(obs['wind_dir']) + "," + str(obs['wind_speed']) + "," + str(obs['precip_today']) + "\n")

I'm very new to both python and webscraping so sorry for the super broad question. Thanks

Djaenike
  • 1,645
  • 5
  • 21
  • 32
  • So you want to change the name of the `csv` sheet name? Cause `csv`s don't have sheet names, `excel` just displays the file name in the workbook as the sheet name. – foxyblue Nov 04 '17 at 20:52
  • To do that look at this https://stackoverflow.com/a/17684679/3407256 – foxyblue Nov 04 '17 at 20:56

1 Answers1

0

Great to see you are trying Python. As soon as you work with tables I recommend the pandas library. Check out the docs here: http://pandas.pydata.org/pandas-docs/stable/.

Here you have an answer using pandas and json_normalize.

import urllib.request
import json
import re
import datetime
from pandas.io.json import json_normalize

html = urllib.request.urlopen("https://www.wunderground.com/personal-weather-station/dashboard?ID=KNYSENEC1#history/tdata/s20171104/e20171104/mdaily").read().decode('utf8')
json_data = re.findall(r'pws_bootstrap:(.*?)\s+,\s+country\:', html, re.S)
data = json.loads(json_data[0])

nnow = datetime.datetime.now().date()
filename = "seneca_weather_{}.xlsx".format(nnow)

df = json_normalize(data['history']['days'])

cols = ["summary.date.iso8601","summary.temperature", 
        "summary.pressure","summary.wind_dir",
        "summary.wind_speed","summary.precip_today"]

df[cols].to_excel(filename,index=False,sheet_name=filename)

Output:

excel screenshot

If you want it in *.csv simply do:

df[cols].to_csv("output.csv",index=False)
Anton vBR
  • 18,287
  • 5
  • 40
  • 46