
I want to build a 3D bar chart using Mayavi in a Jupyter Notebook (in a Python virtualenv), but I'm getting a grey screen. My machine is an Asus laptop with an Intel Core™ i7-4510U CPU @ 2.00 GHz and 8 GB of RAM, running Windows 10.

Once the data was imported, I clicked New > Python 3 and wrote:

[Image: Mayavi build 3D bar chart]

I used pandas' fast CSV parser, pandas.read_csv(). When I ran line 4, memory usage rose to 88% of capacity (as shown by CleanMem Mini Monitor), and I got results in less than 1 minute.

Then, to build the bar chart:

df1=df[[0]]
df2=df[[1]]
df3=df[[2]]
mlab.barchart(df1,df2,df3)

Unfortunately, I got this MemoryError:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-6-9736b00b5abc> in <module>
      2 df2=df[[1]]
      3 df3=df[[2]]
----> 4 mlab.barchart(df1,df2,df3)

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\helper_functions.py in the_function(*args, **kwargs)
     35 
     36     def the_function(*args, **kwargs):
---> 37         return pipeline(*args, **kwargs)
     38 
     39     if hasattr(pipeline, 'doc'):

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\helper_functions.py in __call__(self, *args, **kwargs)
     80             scene.disable_render = True
     81         # Then call the real logic
---> 82         output = self.__call_internal__(*args, **kwargs)
     83         # And re-enable the rendering, if needed.
     84         if scene is not None:

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\helper_functions.py in __call_internal__(self, *args, **kwargs)
   1093         """ Override the call to be able to scale automatically the axis.
   1094         """
-> 1095         g = Pipeline.__call_internal__(self, *args, **kwargs)
   1096         gs = g.glyph.glyph_source
   1097         # Use a cube source for glyphs.

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\helper_functions.py in __call_internal__(self, *args, **kwargs)
     90         the last object created by the pipeline."""
     91         self.store_kwargs(kwargs)
---> 92         self.source = self._source_function(*args, **kwargs)
     93         # Copy the pipeline so as not to modify it for the next call
     94         self.pipeline = self._pipeline[:]

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\sources.py in vertical_vectors_source(*args, **kwargs)
   1356 
   1357     data_source = MVerticalGlyphSource()
-> 1358     data_source.reset(x=x, y=y, z=z, scalars=s)
   1359 
   1360     name = kwargs.pop('name', 'VerticalVectorsSource')

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\sources.py in reset(self, **traits)
    306                 traits['u'] = traits['v'] = np.ones_like(s),
    307                 traits['w'] = s
--> 308         super(MVerticalGlyphSource, self).reset(**traits)
    309 
    310     def _scalars_changed(self, s):

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\sources.py in reset(self, **traits)
    172 
    173         else:
--> 174             points = np.c_[x.ravel(), y.ravel(), z.ravel()].ravel()
    175             points.shape = (-1, 3)
    176             self.trait_set(points=points, trait_change_notify=False)

c:\infovis\virtualenvs\dev\lib\site-packages\numpy\lib\index_tricks.py in __getitem__(self, key)
    404                 objs[k] = objs[k].astype(final_dtype)
    405 
--> 406         res = self.concatenate(tuple(objs), axis=axis)
    407 
    408         if matrix:

<__array_function__ internals> in concatenate(*args, **kwargs)

MemoryError: Unable to allocate array with shape (153543233, 3) and data type int64
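
For scale, the array named in that message would need roughly 3.4 GiB in a single contiguous block: 153,543,233 rows × 3 columns × 8 bytes per int64 value. That is a large share of the 8 GB of RAM on this machine, most of which was already in use. The snippet below only checks that arithmetic; the variable names are illustrative and not taken from Mayavi or NumPy.

```python
# Shape and dtype size taken from the MemoryError message above.
rows, cols = 153_543_233, 3
bytes_per_int64 = 8

# Total size of the (rows, cols) int64 array NumPy tried to allocate.
total_bytes = rows * cols * bytes_per_int64
total_gib = total_bytes / 2**30

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
```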

And the result was this:

[Image: Result]

Amit Yadav
Tiago Martins Peres
  • Are you _sure_ that a bar-chart is what you want? It kinda looks like the x and y coordinates are all 0 or 1, and in any case 100MM items is way too many for a bar chart. If what you really want is a histogram, or to sum the `df[[2]]` values for each x,y pair, then I think you'll want to do some of the data processing yourself before calling the display function. – ShapeOfMatter Oct 08 '19 at 13:09
  • Yes, a bar chart is what I want, with that data and that many items. It might be a lot to ask for, but if it's not possible using Mayavi under my conditions, I hope to find another solution where it is possible. If worse comes to worst, I will have to consider something like sampling. – Tiago Martins Peres Oct 08 '19 at 13:27
  • Do I read correctly that `(df[[0]], df[[1]])` are your (x,y) coordinates, and `df[[2]]` is the height value? You've got a lot of duplicate (x,y)s; how are you hoping they'll be displayed? – ShapeOfMatter Oct 08 '19 at 14:11
  • Right. Here, what's in x,y,z didn't matter, as the only goal is to check whether Mayavi could handle creating a bar chart with that many records. (If the meaning were relevant, I could have averaged z and got (x,y) with (0,0), (0,1), (1,0), (1,1)). – Tiago Martins Peres Oct 08 '19 at 15:34
  • If you're going to combine values (using average or any other function), then that greatly affects how we approach the memory problem. If you're _not_ going to combine values, then we need a clearer explanation of what you want the output to be. It's unclear how you would physically render a bar chart with a hundred million bars, in 2D _or_ 3D. – ShapeOfMatter Oct 08 '19 at 15:51
  • Not gonna do the average (a simple GROUP BY), that would reduce the records to 4. About the output, it was explained already. I'll get a dataset as big where x,y,z makes more sense if you think that helps here. – Tiago Martins Peres Oct 08 '19 at 16:11
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/200568/discussion-between-shapeofmatter-and-tiago-martins-peres). – ShapeOfMatter Oct 08 '19 at 16:40

1 Answer

Because I kept running out of memory, I had to come up with a way to reduce the amount of data.

Inspired by Trifacta, I decided to go with sampling (creating a sample from the CSV file). The following are some of the possible samples I could produce:

[Image: Sampling]

For simplicity, I went with a random sample. Using Git Bash on Windows 10, I ran a command similar to the following (the number of rows may differ from the one I actually used):

shuf -n 10000 BIGFILE.csv > SAMPLEFILE.csv
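
If `shuf` is not available, a similar random sample can be drawn from Python without loading the whole file into memory. The following is a minimal sketch using reservoir sampling (Algorithm R) with only the standard library; `sample_lines` and its parameters are hypothetical names introduced for illustration, not part of pandas or Mayavi. Like `shuf`, it treats every line the same way, so it assumes the file has no header row and no quoted fields containing line breaks.

```python
import random


def sample_lines(src_path, dst_path, n, seed=None):
    """Write up to n randomly chosen lines of src_path to dst_path.

    Reservoir sampling keeps at most n lines in memory, so the source
    file can be far larger than available RAM.
    Returns the number of lines written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, "r", newline="") as src:
        for i, line in enumerate(src):
            # Make sure the final line also ends with a newline.
            if not line.endswith("\n"):
                line += "\n"
            if i < n:
                # Fill the reservoir with the first n lines.
                reservoir.append(line)
            else:
                # Keep line i with probability n / (i + 1).
                j = rng.randint(0, i)
                if j < n:
                    reservoir[j] = line
    # Shuffle so the output order is random as well.
    rng.shuffle(reservoir)
    with open(dst_path, "w", newline="") as dst:
        dst.writelines(reservoir)
    return len(reservoir)
```

Calling `sample_lines("BIGFILE.csv", "SAMPLEFILE.csv", 10000)` would play the same role as the `shuf` command above. The file is read once, and only the sampled lines are held in memory at any time.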

The procedure to create the visualization was then exactly the same, apart from the file name, and the result was the following:

[Images: Mayavi 3D Bar Chart (×3)]

Tiago Martins Peres