
I have the following problem: when I serialise a very big EMF model (more than 1 GB on the heap) to an XML file, it takes several hours. I have no idea whether I am doing something wrong that causes this delay or whether it is normal for it to take that long. The model contains a lot of lists. Apart from that, it consists mostly of a large number of objects that are graph nodes, each with a very long UUID and a few attributes, mostly integers plus some string values such as names.
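One cost pattern worth ruling out for a model made of long lists of referenced nodes is the following. It is an assumption here, not something the profile in this question confirms. If I recall EMF's default behaviour correctly, an object without an ID attribute is referenced through a positional URI fragment such as `//@nodes.42`, and that position is found by a linear `indexOf` over the containing list. Repeated for every reference, this becomes quadratic in the number of nodes. Declaring the UUID as the ID attribute in the Ecore model should let references be written by ID instead. The plain-JDK sketch below does not call EMF. It contrasts the strategies using a hypothetical `Node` class and hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a graph node (not an EMF class); equality is object identity.
final class Node {
    final String uuid;

    Node(String uuid) {
        this.uuid = uuid;
    }
}

// Hypothetical helpers contrasting ways of producing reference fragments like "//@nodes.42".
final class FragmentCost {
    // One linear indexOf scan per reference: O(references * nodes) in total.
    static List<String> viaIndexOf(List<Node> nodes, List<Node> referenced) {
        List<String> fragments = new ArrayList<>(referenced.size());
        for (Node target : referenced) {
            fragments.add("//@nodes." + nodes.indexOf(target));
        }
        return fragments;
    }

    // One pass to build a position map, then a constant-time lookup per reference.
    static List<String> viaPositionMap(List<Node> nodes, List<Node> referenced) {
        Map<Node, Integer> position = new IdentityHashMap<>();
        for (int i = 0; i < nodes.size(); i++) {
            position.put(nodes.get(i), i);
        }
        List<String> fragments = new ArrayList<>(referenced.size());
        for (Node target : referenced) {
            fragments.add("//@nodes." + position.get(target));
        }
        return fragments;
    }

    // ID-based reference: the node's own UUID is written, so no position lookup is needed.
    static List<String> viaId(List<Node> referenced) {
        List<String> fragments = new ArrayList<>(referenced.size());
        for (Node target : referenced) {
            fragments.add(target.uuid);
        }
        return fragments;
    }
}
```

Both positional variants produce identical fragments. Only their cost grows differently, which is why a CPU profile of the real save would show whether this pattern applies.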

Here is an excerpt of the routine that saves my EMF model:

// Register the XMI resource factory
Resource.Factory.Registry reg = Resource.Factory.Registry.INSTANCE;
reg.getExtensionToFactoryMap().put(uri.fileExtension(), new XMIResourceFactoryImpl());

// Obtain a new resource set
ResourceSet resSet = new ResourceSetImpl();

// create a resource
Resource resource = resSet.createResource(uri);

// get resource content
EList<EObject> resourceContent = resource.getContents();

resourceContent.add(objectsToAdd);

// save to file
resource.save(ResourceAdder.createOptions());
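To find out whether the hours go into garbage collection or into actual work, one option is to wrap the `resource.save(...)` call above in a small timer. The sketch below is plain JDK code (`SaveTimer`, `Task` and `Result` are hypothetical names). It uses the standard `GarbageCollectorMXBean` counters to report wall-clock time next to the GC time and collection count accumulated during the call.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Wraps a task (e.g. the resource.save(...) call) and reports wall-clock time
// together with the garbage-collection time and count accumulated while it ran.
final class SaveTimer {
    // A task that may throw checked exceptions such as IOException.
    interface Task {
        void run() throws Exception;
    }

    static final class Result {
        final long wallMillis;
        final long gcMillis;
        final long gcCount;

        Result(long wallMillis, long gcMillis, long gcCount) {
            this.wallMillis = wallMillis;
            this.gcMillis = gcMillis;
            this.gcCount = gcCount;
        }

        @Override
        public String toString() {
            return "wall=" + wallMillis + " ms, gc=" + gcMillis + " ms in " + gcCount + " collections";
        }
    }

    // Cumulative GC time over all collectors; collectors reporting -1 (unsupported) count as 0.
    private static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    // Cumulative number of collections over all collectors; -1 values count as 0.
    private static long gcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    // Runs the task once and returns the deltas measured around it.
    static Result measure(Task task) throws Exception {
        long gcMillisBefore = gcMillis();
        long gcCountBefore = gcCount();
        long start = System.nanoTime();
        task.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(wallMillis, gcMillis() - gcMillisBefore, gcCount() - gcCountBefore);
    }
}
```

In the routine above this would be used as `System.out.println(SaveTimer.measure(() -> resource.save(ResourceAdder.createOptions())));`. If GC time is a large share of the wall time, heap pressure is the likely suspect. Otherwise, a CPU profiler is the better next step.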

This is what my options look like:

public static Map<?, ?> createOptions() {
    HashMap<String, Object> options = new HashMap<String, Object>();
    options.put(XMLResource.OPTION_ENCODING, "UTF-8"); //$NON-NLS-1$
    options.put(XMLResource.OPTION_CONFIGURATION_CACHE, Boolean.TRUE);
    // Serialise into a memory buffer first; the file is written only if its content changed
    options.put(Resource.OPTION_SAVE_ONLY_IF_CHANGED, Resource.OPTION_SAVE_ONLY_IF_CHANGED_MEMORY_BUFFER);
    return options;
}
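`Resource.OPTION_SAVE_ONLY_IF_CHANGED_MEMORY_BUFFER` asks EMF to serialise the whole document into memory first, compare it with the file already on disk, and write only when the two differ. For 250 MB of output, that means holding an extra in-memory copy and reading and comparing the old file on every save. The plain-JDK sketch below only illustrates that idea. `SaveIfChanged` and `Serializer` are hypothetical names. It is not EMF's implementation, which compares streams rather than reading the whole file at once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class SaveIfChanged {
    // Hypothetical callback standing in for the real serialiser.
    interface Serializer {
        void writeTo(OutputStream out) throws IOException;
    }

    // Returns true if the file was (re)written, false if the new content equals the old one.
    static boolean save(Path target, Serializer serializer) throws IOException {
        // 1. Serialise everything into memory: a full extra copy of the output.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        serializer.writeTo(buffer);
        byte[] fresh = buffer.toByteArray(); // toByteArray() copies the buffer once more

        // 2. Read the existing file (if any) and compare it byte by byte.
        if (Files.exists(target) && Arrays.equals(Files.readAllBytes(target), fresh)) {
            return false;
        }

        // 3. Only now write the new content to disk.
        Files.write(target, fresh);
        return true;
    }
}
```

For a model that is always saved after changes, this extra buffering and comparison buys nothing, which is one reason to time the save without the option.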

So my question is: is it normal for serialising a large EMF model to take this long? What would you suggest to reduce the time it takes? I have already considered using Teneo to store the entire EMF model in a local Derby database, but I have not yet tested whether that would improve the runtime. Thanks for any pointers or suggestions you can provide.

I have added a heap analysis made with VisualVM for a very small graph, which still took several minutes to serialise. The resulting XML files total 250 MB.

[Image: HeapAnalysisOfSmallGraph]

tzwickl
  • Ideally, it should not take that much time. At least, I have never experienced such pathetic performance. Have you ever waited long enough for it to finish, say overnight? If so, what is the size of the generated XML file? – Karthik Rocky Apr 07 '15 at 18:24
  • I remember getting better performance by serializing huge models in the binary format instead of XML/XMI (it also reduces the size of the serialized file). – Vincent Aranega Apr 07 '15 at 20:57
  • 1 GB of heap should not be a problem for EMF serialization: we have very big models (more than 4 GB) and they take around 20 seconds to serialize to XMI. Have you already profiled it to see where the time is spent? Could the garbage collector be running too much? Have you also tried without the option Resource.OPTION_SAVE_ONLY_IF_CHANGED? – xsilmarx Apr 08 '15 at 10:35
  • @silmarx I have added an image of the profiling of a small graph saved without the option Resource.OPTION_SAVE_ONLY_IF_CHANGED, but it still took several minutes... – tzwickl Apr 08 '15 at 16:35
  • @KarthikRocky No, I have never waited for the largest graph to finish saving, but I have added the information for a smaller one, which I think is also very pathetic considering its small heap size... – tzwickl Apr 08 '15 at 16:36
  • @tom1991te From the memory profile it doesn't seem to be a garbage collector issue. I recommend profiling the CPU during serialization to see whether there is a hotspot where the CPU time is being spent. – xsilmarx Apr 10 '15 at 15:53
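The remark about binary serialization producing smaller files can be illustrated with a toy comparison. EMF has its own binary resource implementation (`BinaryResourceImpl`), whose actual format differs from this. The plain-JDK sketch below encodes one graph node with a UUID, a name and a few integers, once as an XML element and once with `DataOutputStream`. The class name, element name and attribute names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class EncodingSize {
    // XML-style text: tag, attribute names and quotes are repeated for every node,
    // and integers are written as decimal text.
    static int xmlBytes(String uuid, String name, int[] values) {
        StringBuilder sb = new StringBuilder("<nodes uuid=\"").append(uuid)
                .append("\" name=\"").append(name).append('"');
        for (int i = 0; i < values.length; i++) {
            sb.append(" p").append(i).append("=\"").append(values[i]).append('"');
        }
        sb.append("/>\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Binary: length-prefixed strings (writeUTF) and fixed 4-byte ints, no markup.
    static int binaryBytes(String uuid, String name, int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(uuid);
            out.writeUTF(name);
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.size();
    }
}
```

For a 36-character UUID, the name `node42` and the integers 1, 2 and 300, this gives 90 bytes of XML versus 58 bytes of binary. Besides being smaller, the binary form also skips XML escaping and number-to-text conversion.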
