
I am trying to plot a Seaborn jointplot in a Jupyter notebook. My dataset has 4,446,966 rows. I get the plot if I select around 5,000 rows, but if I select the complete dataset, the cell runs for a long time and never produces any output.

Python / Pandas / Seaborn / Matplotlib / Jupyter Notebook / Google Colab / EDA / Feature Engineering

  • 4.5 million rows x 29 columns is a pretty big dataframe. You can try increasing the default memory size as detailed [here](https://stackoverflow.com/questions/57948003/how-to-increase-jupyter-notebook-memory-limit). – Derek O Jan 17 '22 at 01:15
  • Another idea is to take a random subset of the data and plot that. You could start with 5000 random rows and slowly increase that number until you are convinced that the random subset is (or is not) a good representation of the complete data. See e.g. [this post](https://stackoverflow.com/questions/22258491/read-a-small-random-sample-from-a-big-csv-file-into-a-python-data-frame) for some approaches to taking a random subset while reading a CSV. – JohanC Jan 18 '22 at 09:59
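
The random-subset idea from the second comment could look roughly like the sketch below. It is only an illustration under assumptions: the real dataframe and its column names are not shown in the question, so `demo`, `feature_a` and `feature_b` are hypothetical stand-ins, and `sample_rows` and `plot_joint_sample` are helper names made up for this example. Seaborn is imported inside the plotting helper so that the sampling part runs on its own.

```python
import numpy as np
import pandas as pd


def sample_rows(df: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Return up to n randomly chosen rows of df (without replacement)."""
    n = min(n, len(df))  # never ask for more rows than exist
    return df.sample(n=n, random_state=seed)


def plot_joint_sample(df: pd.DataFrame, x: str, y: str, n: int = 5000, seed: int = 0):
    """Draw a jointplot from a random subset instead of the full frame."""
    import seaborn as sns  # imported here so sampling works without seaborn

    subset = sample_rows(df, n, seed)
    return sns.jointplot(data=subset, x=x, y=y)


# Synthetic stand-in data; the column names are hypothetical.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=100_000),
    "feature_b": rng.normal(size=100_000),
})

subset = sample_rows(demo, 5000)
print(subset.shape)  # (5000, 2)
```

In a notebook you would then call something like `plot_joint_sample(df, "feature_a", "feature_b", n=5000)` with your own dataframe and columns, and rerun it with a larger `n` (or a different `seed`) to check whether the picture stays stable.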
