I am trying to utilize Seaborn to create a visualization.

Here is what I have thus far:

import os.path
directory = os.path.dirname(os.path.abspath(__file__)) 
import pandas as pd
import seaborn as sns
sns.set(style="whitegrid", color_codes=True)
tel = pd.read_csv('nyc.csv')
nyctel = sns.load_dataset(tel)
sns.stripplot(x="installation_id", y="mounting", hue="mounting", data=nyctel)

The official documentation for load_dataset is completely useless, so I found that someone had already asked a question about how it works here: https://stackoverflow.com/a/30337377/6110631

I followed the format listed in the answer and imported pandas so I could use a local file (saved in the same folder). When I run the program however, I get

IOError: File nyc.csv does not exist

If I use an absolute path I get

IOError: ('http protocol error', 0, 'got a bad status line', None) 

It seems the problem is with this line:

nyctel = sns.load_dataset(tel)

because if I omit this line and the line beneath it and add print tel beneath the pd.read_csv line then the program works and it prints out the contents of the file. Somehow load_dataset is not letting me use that file though!

I am using the exact same code as in the answer linked above. Why would this not work for this local file?

InterLinked
  • Is your file in the same location as the code? Maybe you can try using an absolute path in `pd.read_csv`. – niraj May 25 '18 at 01:11
  • OK, did you look at the link? From the link you posted above: **load_dataset looks for online csv files on https://github.com/mwaskom/seaborn-data.** If you go to that link, `nyc.csv` is not there. – niraj May 25 '18 at 01:34
  • @0p3n5ourcE Yes, but using pandas you can use local files – InterLinked May 25 '18 at 01:36
  • OK, maybe I am confused then. Why do you need nyctel at all; why not use tel? If you comment out the line `nyctel = sns.load_dataset(tel)` and only use `tel` as the data, wouldn't it work? – niraj May 25 '18 at 01:38
  • You can always try slicing to a small sample, something like `tel = pd.read_csv('nyc.csv')[:100]` for only the first 100 rows. – niraj May 25 '18 at 01:43
  • @0p3n5ourcE This runs instantly and without error but does not return any visualization. I guess this means my last line does not work: sns.stripplot(x=tel.installation_id, y=tel.number_of_phones, hue=tel.mounting, data=tel) – InterLinked May 25 '18 at 01:53

2 Answers

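load_dataset() is only needed to create a pandas DataFrame from one of seaborn's example datasets. In your case, you already created a DataFrame with pd.read_csv('nyc.csv'), so sns.load_dataset(tel) is unnecessary and does not work: it expects the name of an example dataset, not a DataFrame. Pass tel directly as the data argument instead.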

Here is a quote from https://seaborn.pydata.org/introduction.html

Most code in the docs will use the load_dataset() function to get quick access to an example dataset. There’s nothing special about these datasets: they are just pandas dataframes, and we could have loaded them with pandas.read_csv() or built them by hand. Most of the examples in the documentation will specify data using pandas dataframes, but seaborn is very flexible about the data structures that it accepts.
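
As a minimal sketch of the point above: the DataFrame returned by pd.read_csv() can go straight into the plotting call. The column names are taken from the question; the CSV contents below are made up as a stand-in for nyc.csv, and the in-memory io.StringIO source is only an assumption to keep the example self-contained.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for nyc.csv: column names come from the question,
# the values are invented for illustration only.
csv_text = """installation_id,mounting,number_of_phones
1,wall,2
2,pedestal,1
3,wall,3
4,booth,1
"""

# read_csv accepts a path or a file-like object and returns a DataFrame.
tel = pd.read_csv(io.StringIO(csv_text))

sns.set(style="whitegrid", color_codes=True)

# Pass the DataFrame directly as `data`; no load_dataset() call is involved.
ax = sns.stripplot(x="installation_id", y="mounting", hue="mounting", data=tel)
```

With a real file, replace the io.StringIO source with the path to nyc.csv. When running as a script with an interactive backend, a call to matplotlib.pyplot.show() after the plot is typically needed for the window to appear.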

MA Bisk

I'm posting this via mobile so it's not tested:

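import os.path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Build an absolute path to nyc.csv, located next to this script
directory = os.path.dirname(os.path.abspath(__file__))
filename = 'nyc.csv'
file_path = os.path.join(directory, filename)
tel = pd.read_csv(file_path)  # tel is already a DataFrame

sns.set(style="whitegrid", color_codes=True)

# Pass the DataFrame directly; sns.load_dataset() only fetches seaborn's example datasets
sns.stripplot(x="installation_id", y="mounting", hue="mounting", data=tel)
plt.show()  # display the figure when running as a script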
thclpr