
I have a large dataset comprising 10^5 data points, and I'm considering the following question about visualizing it:

Is there an efficient way to visualize a very large dataset? In my case I have a set of users, and each user has 10^3 items. There are 10^5 items in total. I want to show all the items for each user at once to enable quick comparison between users. Somebody suggested using a list, but I don't think a list is the only choice when dealing with a dataset this big.

Note

I want to show all the items for each user at a time.

This means I want to show all the data points when I click on a user, and when I click on two users, I want to compare the differences between their data points.

Community
SolessChong
  • What do you mean by "efficient"? – Lars Kotthoff Aug 15 '13 at 08:14
  • 1
    Time efficiency. When the data size becomes huge, rendering takes quite a while and dynamic layouts become impossible. – SolessChong Aug 15 '13 at 10:01
  • This is really vague. What exactly are you trying to do, what have you tried and why is it not working? – Anko - inactive in protest Aug 15 '13 at 10:02
  • You're saying that you want to show everything all the time. I don't think there's much scope for improving efficiency there since you would need to render everything. – Lars Kotthoff Aug 15 '13 at 10:05
  • @LarsKotthoff I think rendering a point is more efficient than rendering a rect. Building a static layout is more efficient than building a dynamic one. So I'm asking for suitable designs when dealing with a large dataset. Though there's a lower bound on the computational effort, we can still compare different designs and choose a suitable one. – SolessChong Aug 15 '13 at 12:37
  • We can't compare without specific code/data. Apart from high-level things like the ones you've just mentioned, any answer will be speculative. – Lars Kotthoff Aug 15 '13 at 12:54

2 Answers


Rendering them is not the problem. You could switch to canvas or WebGL for the rendering part. You can find some examples of using canvas and X3DOM with D3 data binding. But D3 data binding will be slow because of the number of DOM objects, so it's better to keep the data binding and the rendering separate, as in this parallel coordinates example. That example also features progressive rendering to load and render all the data elements.
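As a rough illustration of progressive rendering, here is a minimal sketch that draws a large array in fixed-size batches and yields between batches so the page stays responsive. It is not the code from the linked example: `renderProgressively`, `drawPoint`, `chunkSize`, `schedule` and `onDone` are hypothetical names chosen for this sketch, and the canvas usage in the trailing comment assumes a `<canvas id="plot">` element exists.

```javascript
// Progressive rendering sketch: draw `points` in batches of `chunkSize`,
// handing control back to the browser between batches.
// `drawPoint`, `schedule` and `onDone` are hypothetical hooks supplied by the caller.
function renderProgressively(points, drawPoint, options = {}) {
  const chunkSize = options.chunkSize || 1000;
  // Default scheduler: next animation frame in a browser, a timer elsewhere.
  const schedule =
    options.schedule ||
    (typeof requestAnimationFrame === "function"
      ? (fn) => requestAnimationFrame(fn)
      : (fn) => setTimeout(fn, 0));
  let index = 0;

  function renderChunk() {
    const end = Math.min(index + chunkSize, points.length);
    for (; index < end; index++) {
      drawPoint(points[index], index);
    }
    if (index < points.length) {
      schedule(renderChunk); // queue the next batch
    } else if (options.onDone) {
      options.onDone(points.length); // everything has been drawn
    }
  }

  renderChunk();
}

// Browser usage (assumes a <canvas id="plot"> element and points in pixel coordinates):
// const ctx = document.getElementById("plot").getContext("2d");
// renderProgressively(data, (p) => ctx.fillRect(p.x, p.y, 1, 1), { chunkSize: 5000 });
```

Because the points are drawn straight onto the canvas, no DOM node is created per data point; the batch size is a tuning knob that trades time to first paint against total render time.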

Keeping them in memory and manipulating them client-side is not a problem either. D3 is often used with Crossfilter for quick data manipulation of "million or more records".

10^5 data points are just slightly too many for interactive SVG rendering. But too many data points in a visualization is often a hint that you have the wrong level of abstraction or the wrong plotting strategy. A lot of points will probably overlap or visually fuse. So why not aggregate these shapes, for example using a heatmap (a color scale for the number of overlapping points), binning (hexbin, histogram), or a summary of the dataset?
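To make the binning idea concrete, here is a small sketch that counts points per cell of a regular rectangular grid; the counts could then drive a heatmap color scale. It uses square cells rather than hexagons for simplicity, and `binPoints`, its parameters, and the `{ x, y }` point shape are assumptions made for this example.

```javascript
// Grid binning sketch: count how many points fall into each cell of a
// cols x rows grid covering `extent`. Returns a flat, row-major array of counts.
// Assumes xMax > xMin and yMax > yMin; points outside the extent are skipped.
function binPoints(points, cols, rows, extent) {
  const { xMin, xMax, yMin, yMax } = extent;
  const counts = new Uint32Array(cols * rows);
  for (const p of points) {
    if (p.x < xMin || p.x > xMax || p.y < yMin || p.y > yMax) continue;
    // Clamp so that points lying exactly on the upper edge land in the last cell.
    const col = Math.min(cols - 1, Math.floor(((p.x - xMin) / (xMax - xMin)) * cols));
    const row = Math.min(rows - 1, Math.floor(((p.y - yMin) / (yMax - yMin)) * rows));
    counts[row * cols + col] += 1;
  }
  return counts;
}

// Example: 100,000 random points reduced to a 50 x 50 grid (2,500 cells to draw).
const samplePoints = Array.from({ length: 100000 }, () => ({ x: Math.random(), y: Math.random() }));
const grid = binPoints(samplePoints, 50, 50, { xMin: 0, xMax: 1, yMin: 0, yMax: 1 });
```

With this kind of aggregation the number of shapes to render depends on the grid resolution rather than on the number of data points.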

If what you want is an overview and a way to compare datasets, you probably need an abstraction, such as some statistics summarizing each dataset, with details available on demand (semantic zoom, focus+context, drill-down).
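One possible form of such a summary, sketched under the assumption that each user's items can be reduced to an array of numbers, is a handful of descriptive statistics per user. The helper name `summarize` and the choice of statistics are illustrative only; quartiles here use linear interpolation between neighboring sorted values.

```javascript
// Summary sketch: reduce one user's numeric values to a few statistics
// (count, min, quartiles, max, mean) that can be compared across users.
// Returns null for an empty input.
function summarize(values) {
  const sorted = Array.from(values).sort((a, b) => a - b);
  const n = sorted.length;
  if (n === 0) return null;

  // Quantile by linear interpolation between the two nearest sorted values.
  const quantile = (q) => {
    const pos = (n - 1) * q;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  };

  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;
  return {
    count: n,
    min: sorted[0],
    q1: quantile(0.25),
    median: quantile(0.5),
    q3: quantile(0.75),
    max: sorted[n - 1],
    mean,
  };
}

// Hypothetical usage: compare two users by their summaries instead of 10^3 raw points each.
// const a = summarize(itemsOfUserA);
// const b = summarize(itemsOfUserB);
```

Two users could then be compared with a pair of box plots or a small table, with the raw points shown only after drilling down.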

jagwar3
Biovisualize
  • 14
    This nailed it: But too many data points in a visualization is often a hint that you have the **wrong level of abstraction** or the **wrong plotting strategy**. – David R. Jun 28 '17 at 20:46
  • Should rendering on Canvas/WebGL fix this problem, since there will not be any DOM elements? – albanx Nov 29 '19 at 17:23
  • SVG < Canvas < WebGL in terms of the ability to handle larger datasets. – zero_cool Mar 27 '22 at 18:21

Hardware-accelerated graphics are a good tool for data visualization.

Visualizing a scatter chart of 100,000 items with LightningChart JS takes less than a second.

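// Assumes the lcjs IIFE bundle from the script tag below is loaded and exposes the global `lcjs`.
const { lightningChart } = lcjs

// Generate 100,000 random points in the unit square.
const data = new Array(100000).fill(0).map(_ => ({ x: Math.random(), y: Math.random() }))

// Measure the time from chart creation until the next animation frame.
const tStart = Date.now()
const chart = lightningChart().ChartXY({disableAnimations: true})
const scatterSeries = chart.addPointSeries()
  .setPointSize(1)
  .add(data)

requestAnimationFrame(() => {
  const tEnd = Date.now()
  chart.setTitle(`${data.length} points visualization ready in ${((tEnd-tStart)/1000).toFixed(3)}s`)
})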
<head>
<script src="http://unpkg.com/@arction/lcjs@3.1.0/dist/lcjs.iife.js"></script>
</head>
Niilo Keinänen