
My goal is to retrieve all the documents from an Alfresco site that contains 100,000 documents, using the OpenCMIS libraries. With the code below I get a java.lang.OutOfMemoryError: Java heap space.

The total size of all documents on the site is 500 GB.

This is the code:

CmisObject cmisObject = session.getObjectByPath(path);
FolderImpl sitoFolder = (FolderImpl) cmisObject;
List<Tree<FileableCmisObject>> sitoFolderDescendants = sitoFolder.getDescendants(-1);

This is the stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.newNode(HashMap.java:1742)
at java.util.HashMap.putVal(HashMap.java:630)
at java.util.HashMap.put(HashMap.java:611)
at org.apache.chemistry.opencmis.commons.impl.XMLWalker.handleExtensionLevel(XMLWalker.java:128)
at org.apache.chemistry.opencmis.commons.impl.XMLWalker.handleExtensionLevel(XMLWalker.java:161)
at org.apache.chemistry.opencmis.commons.impl.XMLWalker.handleExtensionLevel(XMLWalker.java:161)
at org.apache.chemistry.opencmis.commons.impl.XMLWalker.handleExtension(XMLWalker.java:112)
at org.apache.chemistry.opencmis.commons.impl.XMLWalker.walk(XMLWalker.java:58)
at org.apache.chemistry.opencmis.commons.impl.XMLConverter$18.read(XMLConverter.java:2198)
at org.apache.chemistry.opencmis.commons.impl.XMLConverter$18.read(XMLConverter.java:2188)
at org.apache.chemistry.opencmis.commons.impl.XMLWalker.walk(XMLWalker.java:56)
at org.apache.chemistry.opencmis.commons.impl.XMLConverter.convertObject(XMLConverter.java:1102)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseElement(AtomPubParser.java:332)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseEntry(AtomPubParser.java:284)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseFeed(AtomPubParser.java:243)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseChildren(AtomPubParser.java:372)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseElement(AtomPubParser.java:339)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseEntry(AtomPubParser.java:284)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseFeed(AtomPubParser.java:243)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseChildren(AtomPubParser.java:372)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseElement(AtomPubParser.java:339)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseEntry(AtomPubParser.java:284)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseFeed(AtomPubParser.java:243)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseChildren(AtomPubParser.java:372)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseElement(AtomPubParser.java:339)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseEntry(AtomPubParser.java:284)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseFeed(AtomPubParser.java:243)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseChildren(AtomPubParser.java:372)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseElement(AtomPubParser.java:339)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseEntry(AtomPubParser.java:284)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parseFeed(AtomPubParser.java:243)
at org.apache.chemistry.opencmis.client.bindings.spi.atompub.AtomPubParser.parse(AtomPubParser.java:109)
G.Dileo

2 Answers


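Don't use getDescendants(-1)! A depth of -1 requests the entire folder tree in a single response, and the client has to parse all of it into memory at once. If you really, really need getDescendants(), use an operation context that selects only the properties you need and turns off allowable actions and ACLs. See http://chemistry.apache.org/docs/cmis-samples/samples/operation-context/index.html for examples.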
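A minimal sketch of such an operation context, assuming the OpenCMIS client library is on the classpath. The class name, the helper names, and the property filter are illustrative assumptions rather than anything from the question; the depth is passed in so that a bounded value can be used instead of -1.

```java
import java.util.List;

import org.apache.chemistry.opencmis.client.api.FileableCmisObject;
import org.apache.chemistry.opencmis.client.api.Folder;
import org.apache.chemistry.opencmis.client.api.OperationContext;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.api.Tree;
import org.apache.chemistry.opencmis.commons.enums.IncludeRelationships;

public class LeanDescendants {

    // Builds a context that asks the repository for as little data per object as possible.
    static OperationContext leanContext(Session session) {
        OperationContext oc = session.createOperationContext();
        // Assumed property set; list only the properties the caller really reads.
        oc.setFilterString("cmis:objectId,cmis:name,cmis:baseTypeId");
        oc.setIncludeAllowableActions(false);
        oc.setIncludeAcls(false);
        oc.setIncludeRelationships(IncludeRelationships.NONE);
        oc.setIncludePolicies(false);
        oc.setRenditionFilterString("cmis:none");
        // Do not keep the fetched objects in the session cache.
        oc.setCacheEnabled(false);
        return oc;
    }

    // Fetches descendants of the folder at 'path' down to a bounded depth.
    static List<Tree<FileableCmisObject>> descendants(Session session, String path, int depth) {
        Folder folder = (Folder) session.getObjectByPath(path);
        return folder.getDescendants(depth, leanContext(session));
    }
}
```

Even with a lean context, the whole subtree for the requested depth is still held in memory at once, so a small depth per call is the safer choice on a site of this size.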

Florian Müller

I don't think it is a good idea to fetch all the nodes at the same time.

CMIS has several ways to paginate a query. With pagination you can retrieve a predefined number of documents at a time and then free the memory.

See, for example, "Apache CMIS: Paging query result".
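The paging idea can be sketched without any CMIS dependency. In the sketch below, PageSource and inMemory are hypothetical stand-ins for a repository call that accepts a skip count and a maximum number of items; in OpenCMIS the corresponding building blocks would be ItemIterable.skipTo(...) and getPage(...). Each page is handled and then dropped, so only one page is referenced at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class PagedFetch {

    /** Hypothetical stand-in for a repository call that returns one page of results. */
    interface PageSource<T> {
        List<T> fetch(long skipCount, int maxItems);
    }

    /** Visits every item page by page and returns the number of items visited. */
    static <T> long forEachPaged(PageSource<T> source, int pageSize, Consumer<T> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long skip = 0;
        while (true) {
            // Only the current page is referenced; earlier pages can be garbage collected.
            List<T> page = source.fetch(skip, pageSize);
            for (T item : page) {
                handler.accept(item);
            }
            skip += page.size();
            // A short (or empty) page means the source is exhausted.
            if (page.size() < pageSize) {
                return skip;
            }
        }
    }

    /** In-memory source that simulates a repository holding the ids 0..total-1. */
    static PageSource<Integer> inMemory(int total) {
        return (skipCount, maxItems) -> {
            List<Integer> page = new ArrayList<>();
            long end = Math.min((long) total, skipCount + maxItems);
            for (long i = skipCount; i < end; i++) {
                page.add((int) i);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        long[] sum = {0};
        long visited = forEachPaged(inMemory(250), 100, id -> sum[0] += id);
        System.out.println(visited + " items visited, id sum " + sum[0]);
    }
}
```

The page size is the knob that trades round trips against memory: a larger page means fewer requests but more objects alive at once.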

Marco Altieri