0

I have URLs like this:

https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true

I'm able to open it in Excel(remove an 'l' in 'tinyurll'):

Sub Get_File()
    Dim oXMLHTTP As Object: Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    Dim strURL As String: strURL = "http://tinyurll.com/api-create.php?url=https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true"
        With oXMLHTTP: .Open "GET", strURL, False: .send: End With

        strURL = oXMLHTTP.responseText

        With Workbooks: .Open strURL, IgnoreReadOnlyRecommended:=True: End With
End Sub

But I want to download the contents to a text file instead of to excel file using Python?

Zuffa
  • 3
  • 1
  • 5
  • Maybe share a bit more info on what your goal is? Is the goal to open the file programmatically from Python? Is it to open the file specifically as a text file for some reason? Typically opening an excel file as a csv is a pretty easy approach from Python – stephenlechner May 07 '17 at 13:42
  • I want to save the contents of the downloaded file to a text file(readable), using Python. – Zuffa May 08 '17 at 12:57

4 Answers4

0

Even though the file will be downloaded as an excel file (.xlsx, probably), I think you can still open and read it as a CSV file with Python (maybe take a look at this question to get more details on how). If this excel file has multiple sheets, that could end up being problematic though. If that is the case, you may need to use an extra library (like pandas) to manage the opening and capturing data from the excel file.

Having opened and read from the file, you can then just write to a new text file with whatever content you want to keep. This other question has some good info on how to do that.

If you only have one sheet in the file, the csv approach would work, and would look something like this:

(edited, change rb to rt with CSV open)

import csv

my_read_path = '/directory/some_excel_file.xlsx'
text_file = open('/directory/my_output.txt', "w")
with open(my_read_path, 'rt') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in list(csv_reader):
        text_file.write(line)  # assumes you want to write any line

text_file.close()

Reading with something like pandas will probly be more complicated, but will likely be a valuable learning experience anyway.

Community
  • 1
  • 1
stephenlechner
  • 1,836
  • 11
  • 11
  • I get this error "for line in list(csv_reader): _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)", when trying your csv-code? – Zuffa May 09 '17 at 08:20
  • Excel file has only one sheet. – Zuffa May 09 '17 at 08:31
  • Ah, weird. Apparently you need to use `rt` if of `rb` with the CSV open. I've somehow never run into that as a thing until now, but this question clears things up. I've updated the answer and example. http://stackoverflow.com/questions/8515053/csv-error-iterator-should-return-strings-not-bytes – stephenlechner May 09 '17 at 14:27
0

I managed to download to an '.xlsx' file using this:

import requests
import time
import csv

url = 'https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true'
file_name = 'C:\\Users\\AR\\Documents\\DownloadFile.xlsx'

while True:
    try:
        resp = requests.get(url)
        with open(file_name, 'wb') as output:
            output.write(resp.content)
            break
    except Exception as e:
        print(str(e))
        time.sleep(3)

Using the extension '.txt' in the 'file_name', gives me a file that starts with:

PK    L©J               _rels/.rels­’ÁjÃ0†ï}
£{ã´ƒ1FÝ^Æ ·2ºÐl%1I,c«[öö3»l
l°£ôýH»Ã4ê•RölªË·ÖÀóùq}*‡2ûÕ²’;³*Œ
t"ñ^ël;1W)”NÃiD)ejuDÛcKz[×·:}gÀªŽÎ@:º
¨3¦–ÄÀ4è7Nýs_ni¼GúM*7·ôÀö2R+á³
Zuffa
  • 3
  • 1
  • 5
0

Found a solution using 'Openpyxl'(i.e. can read from the excel file without opening Excel(an excel workbook):

from openpyxl import load_workbook #https://openpyxl.readthedocs.io/en/latest/index.html
import requests
import time

url = 'https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true'
file_name = 'DownloadFile.xlsx'

while True:
    try:
        resp = requests.get(url)
        with open(file_name, 'wb') as output:
            output.write(resp.content)
            break
    except Exception as e:
        print(str(e))
        time.sleep(3)

wb = load_workbook(file_name)
ws = wb['data']
for row in ws.rows:
    for cell in row:
        print(cell.value)
Zuffa
  • 3
  • 1
  • 5
0

This is a workable solution that downloads file, saves to excel file, reads from excel file and saves as readable text, to text file.

from openpyxl import load_workbook #https://openpyxl.readthedocs.io/en/latest/index.html
import requests
import time

url = 'https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true'
file_name = 'DownloadFile.xlsx'
sdv_file_name = 'DownloadFile.sdv'

while True:
    try:
        resp = requests.get(url)
        with open(file_name, 'wb') as output:
            output.write(resp.content)
            break
    except Exception as e:
        print(str(e))
        time.sleep(3)

wb = load_workbook(filename=file_name, read_only=True)
ws = wb['data']
ws.max_row = ws.max_column = None
with open(sdv_file_name, 'a') as output:
    for row in ws.rows:
        line = str(row[0].value)+';'+str(row[1].value)+';'+str(row[2].value)+';'+str(row[3].value)+';'+str(row[4].value)+';'+str(row[5].value)+'\n'
        output.write(line)
Zuffa
  • 3
  • 1
  • 5