
I am using pandas and seaborn to draw scatter matrices from a CSV file loaded into a DataFrame with 50 million rows, and the processing times are very long. As a workaround I called df.sample() to work on a subset of the data, which reduced the processing time. Given the potential of Apache Spark, is it possible to use its speed to process all 50 million rows and create scatter matrices, scatter plots, PairGrids, etc. in seaborn? From what I have read on the topic, this seems to be quite difficult.
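
For reference, here is a minimal sketch of the sampling workaround described above. It is not the actual code: the column names `x`, `y`, `z`, the synthetic data standing in for the CSV, the sample size, and the helper names `sample_for_plot` and `plot_pairs` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sample_for_plot(df: pd.DataFrame, n: int = 10_000, seed: int = 0) -> pd.DataFrame:
    """Return a random subset small enough for seaborn to draw quickly."""
    # Never ask for more rows than the frame actually holds.
    return df.sample(n=min(n, len(df)), random_state=seed)


def plot_pairs(sample: pd.DataFrame):
    """Draw a scatter matrix of the sampled rows (requires seaborn)."""
    import seaborn as sns  # imported lazily so the sampling step runs without it

    # Small, semi-transparent markers keep dense regions readable.
    return sns.pairplot(sample, plot_kws={"s": 5, "alpha": 0.3})


# Synthetic stand-in for the real CSV (hypothetical columns x, y, z).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200_000, 3)), columns=["x", "y", "z"])

sample = sample_for_plot(df, n=10_000)
# grid = plot_pairs(sample)  # uncomment to render the scatter matrix
```

seaborn draws from in-memory data, so the reduction has to happen before plotting. If the data lived in a Spark DataFrame instead, I assume the equivalent step would be PySpark's `DataFrame.sample(fraction=...)` followed by `toPandas()`, but I have not tried this.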

krishna Prasad
vins_26

0 Answers