Using pandas and seaborn on a DataFrame loaded from a CSV with 50 million rows to build a scatter matrix, I noticed that the processing times are really long. As a workaround I used df.sample() to work on a subset of the data, which reduced the processing time. Given the potential of Apache Spark, I wanted to ask whether it is possible to use its speed to process all 50 million rows and build a scatter matrix, scatter plot, PairGrid, etc. with seaborn, as in the sketch below. From what I have read on this topic, it seems quite difficult to do.
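For context, here is a minimal sketch of the pattern I have in mind (the file name, sample fraction, and app name are hypothetical, not from my actual setup): Spark reads and down-samples the full dataset in parallel, and seaborn then plots the small pandas DataFrame returned by toPandas(), since seaborn itself only works on in-memory pandas data.

```python
# Minimal sketch: down-sample with Spark, then hand a small pandas
# DataFrame to seaborn for plotting. File name and fraction are hypothetical.
import seaborn as sns
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sampling-for-seaborn").getOrCreate()

# Read the full 50M-row CSV with Spark (distributed read, schema inferred).
sdf = spark.read.csv("data.csv", header=True, inferSchema=True)

# Take a 1% random sample in Spark, then collect it into pandas.
# The sampled result must fit in driver memory for seaborn to plot it.
pdf = sdf.sample(fraction=0.01, seed=42).toPandas()

# Plot the sampled data with seaborn as usual.
sns.pairplot(pdf)
```

Is there a way to avoid this sampling step and have Spark drive the plotting over all 50 million rows, or is sampling/aggregating before plotting the only realistic option?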