
I am using seaborn for data visualization, but it fails on the sample data from its own documentation:

import seaborn as sns
sns.set()
tips = sns.load_dataset("tips")

Traceback (most recent call last):
  File "databaseConnection.py", line 35, in <module>
    tips = sns.load_dataset("tips")
  File "C:\python3.7\lib\site-packages\seaborn\utils.py", line 428, in load_dataset
    urlretrieve(full_path, cache_path)
  File "C:\python3.7\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\python3.7\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\python3.7\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\python3.7\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "C:\python3.7\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\python3.7\lib\urllib\request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\python3.7\lib\urllib\request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

That's because I am behind a proxy, but how can I ask seaborn to use a proxy?

garg10may

2 Answers


You can download the file manually.

Use

import seaborn as sns
print(sns.utils.get_data_home())

to find out the folder for your seaborn data; on Windows it might come out as, e.g., C:\Users\username\seaborn-data.

Download the file https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv to that folder. Finally, load the dataset with caching enabled (the default):

sns.load_dataset("tips", cache=True)

Alternatively, download the file to any other folder and pass that folder's path as the data_home argument:

sns.load_dataset(name, cache=True, data_home="path/to/folder")
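The manual steps above can also be scripted. Below is a minimal sketch that downloads a dataset through a proxy into seaborn's data home using urllib's ProxyHandler; the proxy address "http://proxy.example.com:8080" is a placeholder you would replace with your actual proxy:

```python
import os
import urllib.request

import seaborn as sns

def fetch_dataset_via_proxy(name, proxy_url):
    """Download a seaborn example dataset through an HTTP(S) proxy into
    seaborn's data home, so that sns.load_dataset(name) finds it locally.
    proxy_url is a placeholder such as "http://proxy.example.com:8080"."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/{}.csv".format(name)
    target = os.path.join(sns.utils.get_data_home(), "{}.csv".format(name))
    with opener.open(url) as response, open(target, "wb") as f:
        f.write(response.read())

# Example usage (requires a working proxy):
# fetch_dataset_via_proxy("tips", "http://proxy.example.com:8080")
# tips = sns.load_dataset("tips")   # now served from the local cache
```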
ImportanceOfBeingErnest
  • yes that is there, maybe a proxy feature should be added to `seaborn` – garg10may Feb 04 '19 at 12:56
  • I guess providing the example data is not really a key feature of seaborn for which full-fledged support would be needed. It's rather a nice way to let people start using seaborn without external data, and to make examples reproducible. If you must, you can put the above steps into a custom function to use instead of `load_dataset`. – ImportanceOfBeingErnest Feb 04 '19 at 13:07
  • Currently (version 0.9.0) you need to omit the `.csv` extension in the last line, because it is added automatically and the file would otherwise be downloaded again (see https://github.com/mwaskom/seaborn/blob/v0.9/seaborn/utils.py#L417). You probably didn't see this because you were not behind a proxy during your test. Maybe you could add a `tips = pd.read_csv(sns.utils.get_data_home()+"/tips.csv")` line, since load_dataset returns a pandas.DataFrame anyway. – Liso Apr 26 '19 at 08:29
  • @Liso Thanks, I removed `.csv` from the answer. However, `tips = pd.read_csv(sns.utils.get_data_home()+"/tips.csv")` won't give the same result as `sns.load_dataset` because seaborn changes the datatypes of certain columns in the returned dataframe (as seen [here](https://github.com/mwaskom/seaborn/blob/1489186d8e85d2139a6094a94947611ee68cbed0/seaborn/utils.py#L433)). – ImportanceOfBeingErnest Apr 26 '19 at 11:36
  • @ImportanceOfBeingErnest you are right, the same type is not enough here. Thanks for checking it. – Liso Apr 28 '19 at 07:48
  • @ImportanceOfBeingErnest using get_data_home I am getting the data, but how do I change this path to my current path? – Sujit Dhamale Jun 12 '19 at 10:56
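On the proxy feature discussed in the comments above: load_dataset downloads with plain urllib, whose default opener picks up the standard proxy environment variables, so setting those before the first download may be all that is needed. A sketch, where "http://proxy.example.com:8080" is a placeholder for your actual proxy:

```python
import os

# urllib's default opener reads these at request time, so set them
# before the first download. Placeholder address -- use your own proxy.
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"

# With the variables set, the download goes through the proxy:
# import seaborn as sns
# tips = sns.load_dataset("tips")
```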

I understand the question is a bit old, but I was looking for a similar kind of solution and (somehow) couldn't get the one mentioned above working for me. So I created a similar/duplicate question at the link below:

Not able to resolve issue(HTTP error 404) with seaborn.load_dataset function

I then found my solution through debugging. The details are below:

load_dataset() lives in seaborn's 'utils.py' file, where the download path is hard-coded as:

path = ("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/{}.csv")

So, whatever file name we pass to load_dataset(), Python looks for it online at the path above; there is no option to supply our own online link for a dataset. The second parameter of load_dataset() is 'cache', with the default boolean value 'True'. With caching enabled, the function first looks for the file in the physical path below and only downloads it if the file is missing there:

<Your Drive>:\Users\<Your User name>\seaborn-data 
    e.g. C:\Users\user1\seaborn-data    

This path should contain our dataset if it cannot be downloaded, i.e. the code below works if the dataset is physically present:

df = sns.load_dataset('FiveYearData')

(Note: if the dataset does have to be downloaded, then, due to cache=True, it is copied to the path above as well.)

We can also provide a different physical path for the dataset through the third parameter (data_home), for example:

import os
df = sns.load_dataset('FiveYearData', data_home=os.path.dirname(os.path.abspath("FiveYearData")))

Here, I use my current project working directory for my dataset.
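To illustrate the data_home mechanism with something self-contained: in this hypothetical sketch, a made-up file mydata.csv (not one of seaborn's example datasets) is written to a temporary folder and then loaded through load_dataset, which finds it under data_home and never touches the network:

```python
import os
import tempfile

import pandas as pd
import seaborn as sns

# "mydata" is a made-up dataset name used only for this demonstration.
folder = tempfile.mkdtemp()
pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}).to_csv(
    os.path.join(folder, "mydata.csv"), index=False)

# Because mydata.csv already exists under data_home, load_dataset
# reads it locally instead of downloading anything.
df = sns.load_dataset("mydata", data_home=folder)
print(df.shape)  # → (3, 2)
```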

WpfBee