import urllib  # Python 2 only; for Python 3 see the edit below
link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)
EDIT (2018-06-25): In Python 3, the legacy urllib.urlopen() was replaced by urllib.request.urlopen() (see the notes at https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).
If you're using Python 3, see the answers by Martin Thoma or i.n.n.m on this question: https://stackoverflow.com/a/28040508/158111 (Python 2/3 compat) and https://stackoverflow.com/a/45886824/158111 (Python 3).
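For completeness, the snippet above can be sketched in Python 3 like this (a minimal sketch; the URL is the question's placeholder, and note that urlopen() now returns bytes, so decode explicitly if you need text):

```python
from urllib.request import urlopen

def fetch(url):
    """Download a URL and return the raw response bytes (Python 3)."""
    with urlopen(url) as f:   # the response object is a context manager
        return f.read()       # read() returns bytes, not str, in Python 3

# myfile = fetch("http://www.somesite.com/details.pl?urn=2344")
# print(myfile.decode("utf-8"))  # decode if you want text
```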
Or, just get this library here: http://docs.python-requests.org/en/latest/ and seriously use it :)
import requests
link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)
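For large files, requests can also stream the body straight to disk instead of buffering it all in memory via r.content. A hedged sketch (the function name save_url and the chunk size are my own choices, not from the answer):

```python
import requests

def save_url(url, path, chunk_size=8192):
    """Stream a download to disk chunk by chunk."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()              # fail loudly on 4xx/5xx responses
        with open(path, "wb") as out:
            for chunk in r.iter_content(chunk_size=chunk_size):
                out.write(chunk)          # write each chunk as it arrives

# save_url("http://www.somesite.com/details.pl?urn=2344", "details.html")
```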
Use the code below; hope it helps.
from pyspark.sql import SparkSession
import requests

link = "https://data.cityofnewyork.us/api/views/m6nq-qud6/rows.csv?accessType=DOWNLOAD&bom=true&format=true"
r = requests.get(link, allow_redirects=True)
with open(r'C:\Users\Saurabh\Desktop\file.csv', 'wb') as f:
    f.write(r.content)  # close the handle before Spark reads the file

spark = SparkSession.builder.master("local").appName("records").getOrCreate()
df = spark.read.format("csv").option("header", "true")\
    .option("inferSchema", "true").load(r"C:\Users\Saurabh\Desktop\file.csv")
Please use the updated code below, as requested in the comments. You can also generalize the path with the pathlib library (part of the standard library since Python 3.4, so no pip install is needed).
Note that I have used Path(__file__).parent to store the file in the directory where the .py script lives. Wherever the source (.py) file is saved, the CSV is downloaded there first, and that same file is then loaded into your dataframe (df). Hope this helps :)
from pyspark.sql import SparkSession
import requests
import pathlib

fn = pathlib.Path(__file__).parent / 'file.csv'
link = "https://data.cityofnewyork.us/api/views/m6nq-qud6/rows.csv?accessType=DOWNLOAD&bom=true&format=true"
r = requests.get(link, allow_redirects=True)
with open(fn, 'wb') as f:
    f.write(r.content)  # save next to the script, not the current working directory

spark = SparkSession.builder.master("local").appName("records").getOrCreate()
df = spark.read.format("csv").option("header", "true")\
    .option("inferSchema", "true").load(str(fn))