Read pptx file content from a url

Question

I found this solution for reading Word (.docx) file content from a URL:

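```python
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

from bs4 import BeautifulSoup

# `url` is assumed to already hold the address of a .docx file.
with urlopen(url) as response:
    data = BytesIO(response.read())  # in-memory file object for ZipFile

# A .docx is a zip archive; the main body lives in word/document.xml.
with ZipFile(data) as document:
    content = document.read('word/document.xml')

# Body text is stored in <w:t> elements.
word_obj = BeautifulSoup(content.decode('utf-8'), 'html.parser')
for t in word_obj.find_all('w:t'):
    print(t.text)
```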

Does anyone know a similar way to process .pptx files? I have seen several solutions, but they all read the file directly from disk, not from a URL.
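A minimal sketch of the same zip-and-XML idea applied to .pptx, using only the standard library (`xml.etree.ElementTree` instead of BeautifulSoup). It assumes the usual layout in which each slide is stored as `ppt/slides/slideN.xml` and its text runs are `<a:t>` elements in the DrawingML namespace. Slides are sorted by the number in the file name, which usually, but not necessarily, matches the order shown in PowerPoint (the authoritative order is recorded in `ppt/presentation.xml`). The function names `pptx_text_from_bytes` and `pptx_text_from_url` are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO
from urllib.request import urlopen

# DrawingML namespace used for text runs (<a:t>) inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# Matches slide parts such as ppt/slides/slide3.xml (not the _rels files).
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def pptx_text_from_bytes(data):
    """Return one list of text-run strings per slide."""
    with zipfile.ZipFile(BytesIO(data)) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        slides.sort()  # numeric sort: slide2 comes before slide10

        result = []
        for _, name in slides:
            root = ET.fromstring(archive.read(name))
            result.append([t.text or "" for t in root.iter(A_NS + "t")])
        return result


def pptx_text_from_url(url):
    """Download a .pptx into memory and extract its slide text."""
    with urlopen(url) as response:
        return pptx_text_from_bytes(response.read())
```

Usage: `for slide in pptx_text_from_url(url): print(" ".join(slide))`. Nothing is written to disk; the downloaded bytes are wrapped in `BytesIO`, exactly as in the Word example.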

If you can read the .pptx file from disk but not from a URL, download the file first and then use that solution: [Basic http file downloading and saving to disk in python?](http://stackoverflow.com/questions/19602931/basic-http-file-downloading-and-saving-to-disk-in-python) - billett, Apr 24 '17 at 13:59
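A minimal sketch of the download-then-read approach from the comment above, assuming the existing solution expects a file path. It streams the response into a named temporary file and returns the path; the helper name `download_to_temp` is hypothetical, and deleting the file afterwards is left to the caller.

```python
import shutil
import tempfile
from urllib.request import urlopen


def download_to_temp(url, suffix=".pptx"):
    """Save the resource at `url` to a temporary file and return its path.

    delete=False keeps the file after it is closed, so the caller must
    remove it (e.g. with os.remove) when finished.
    """
    with urlopen(url) as response, tempfile.NamedTemporaryFile(
        suffix=suffix, delete=False
    ) as tmp:
        # Copy in chunks instead of loading the whole file into memory.
        shutil.copyfileobj(response, tmp)
        return tmp.name
```

The returned path can then be passed to any function that only accepts a file on disk.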

Answer 1 (score 0)


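I don't know if it can help you, but with urllib you obtain the raw content of the .pptx (the variable `file`). Wrap that content in an in-memory file object and pass it to the function that normally reads a .pptx file path, so that it behaves like a real file. I suggested `cStringIO.StringIO(file)`, but `cStringIO` exists only in Python 2; in Python 3 use `io.BytesIO(file)`, as the question's code already does.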
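A small sketch of the "simulate a file" idea in Python 3, assuming the reading code accepts a file-like object as well as a path. `zipfile.ZipFile` accepts either, so the hypothetical `slide_count` helper below works with a path on disk or with downloaded bytes wrapped in `io.BytesIO`. If you use the third-party python-pptx library, its `Presentation()` constructor likewise accepts a file-like object, so `Presentation(BytesIO(data))` should work the same way.

```python
import zipfile
from io import BytesIO


def slide_count(pptx_file):
    """Count slide parts in a .pptx.

    `pptx_file` may be a path string or any binary file-like object,
    because zipfile.ZipFile accepts both.
    """
    with zipfile.ZipFile(pptx_file) as archive:
        return sum(
            1
            for name in archive.namelist()
            if name.startswith("ppt/slides/slide") and name.endswith(".xml")
        )


def slide_count_from_bytes(data):
    # Python 3 replacement for cStringIO.StringIO: wrap the bytes returned
    # by urlopen(url).read() so code expecting a file can read them.
    return slide_count(BytesIO(data))
```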

answered Apr 24 '17 at 14:06 by Virginie


