I have a dictionary whose keys are fully qualified class names. Each key points to a list containing the number of jars the class appears in, followed by the names of those jars.
For example:
classToJars = {
'com.sun.xml.ws.policy.PolicyMapKey.class' : [ 1, 'policy-2.3.1.jar'],
'com.sun.xml.ws.policy.PolicyMerger.class' : [ 1, 'policy-2.3.1.jar'],
'com.sun.xml.ws.policy.PolicyAssertion.class' : [ 1, 'policy-2.3.1.jar' ],
'com.sun.xml.bind.AccessorFactory.class' : [1, 'jaxb-impl-2.2.6.jar'],
'com.sun.xml.bind.AccessorFactoryImpl.class' : [1, 'jaxb-impl-2.2.6.jar'],
'com.sun.xml.bind.AnyTypeAdapter.class' : [1, 'jaxb-impl-2.2.6.jar' ],
'org.apache.mina.integration.jmx.IoSessionManager.class' : [1, 'mina-integration-jmx-1.1.7.jar'],
'org.apache.mina.integration.jmx.IoServiceManager.class' : [1, 'mina-integration-jmx-1.1.7.jar'],
'org.apache.log4j.Appender.class' : [2, 'log4j-1.2.14.jar', 'log4j-1.2.15.jar'],
'org.apache.log4j.AppenderSkeleton.class' : [2, 'log4j-1.2.14.jar', 'log4j-1.2.15.jar'],
'com.sun.activation.registries.LineTokenizer.class' : [1, 'activation-1.1.jar'],
'com.sun.activation.registries.LogSupport.class' : [1, 'activation-1.1.jar'],
'com.sun.istack.Builder.class' : [2, 'jaxb-impl-2.2.6.jar istack-commons-runtime-2.4.jar'],
'com.sun.istack.ByteArrayDataSource.class' : [2, 'jaxb-impl-2.2.6.jar istack-commons-runtime-2.4.jar'],
'com.reuters.rfa.ansipage.Page.class' : [1, 'rfa-7.2.0.E2.jar'],
'com.reuters.rfa.ansipage.PageUpdate.class' : [1, 'rfa-7.2.0.E2.jar'],
'org.apache.http.impl.io.AbstractMessageWriter.class' : [1, 'rfa-7.2.0.E2.jar'],
'org.apache.http.impl.io.ChunkedOutputStream.class' : [1, 'rfa-7.2.0.E2.jar']
}
This is a large dict with thousands of keys and values, built by looping over a large set of jars. The idea is to be able to fold the dict: where several keys have the same value, fold them into a single key that is their largest common substring (in practice, the shared package prefix).
For example: when I run the folding function, the above dict should be reduced to 8 lines as follows:
'com.sun.xml.ws.policy' : [ 1, 'policy-2.3.1.jar'],
'com.sun.xml.bind' : [1, 'jaxb-impl-2.2.6.jar'],
'org.apache.mina.integration.jmx' : [1, 'mina-integration-jmx-1.1.7.jar'],
'org.apache.log4j' : [2, 'log4j-1.2.14.jar', 'log4j-1.2.15.jar'],
'com.sun.activation.registries' : [1, 'activation-1.1.jar'],
'com.sun.istack' : [2, 'jaxb-impl-2.2.6.jar istack-commons-runtime-2.4.jar'],
'com.reuters.rfa.ansipage' : [1, 'rfa-7.2.0.E2.jar'],
'org.apache.http.impl.io' : [1, 'rfa-7.2.0.E2.jar'],
and so on.
Since there is nothing in common between com.reuters.rfa and org.apache.http, taking the largest common substring of those keys comes back with an empty key, even though both point to rfa-7.2.0.E2.jar.
In a case like that, it should simply list com.reuters.rfa and org.apache.http separately.
Any ideas on how to achieve this?
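One possible direction, as a rough sketch rather than a definitive answer: instead of computing a common substring across all keys that share a value, strip the trailing `ClassName.class` from each key to get its package, group keys by package, and fold a package only when every class in it has the same value. The function name `fold_by_package` is hypothetical, and the sketch assumes every key ends in `<ClassName>.class` and that folding to the immediate package (not to a shorter shared prefix) is what is wanted.

```python
from collections import defaultdict


def fold_by_package(class_to_jars):
    """Fold class entries into one package entry when all classes
    in that package map to the same value (hypothetical helper)."""
    # Group the original keys by package name.
    by_package = defaultdict(list)
    for class_name in class_to_jars:
        # 'a.b.Foo.class' -> 'a.b': drop the last two dot-separated parts.
        # Assumption: every key ends in '<ClassName>.class'.
        package = class_name.rsplit('.', 2)[0]
        by_package[package].append(class_name)

    folded = {}
    for package, class_names in by_package.items():
        values = [class_to_jars[name] for name in class_names]
        if all(value == values[0] for value in values):
            # Every class in the package agrees, so keep one folded entry.
            folded[package] = values[0]
        else:
            # Values differ inside the package: keep the classes unfolded.
            for name in class_names:
                folded[name] = class_to_jars[name]
    return folded


sample = {
    'org.apache.log4j.Appender.class': [2, 'log4j-1.2.14.jar', 'log4j-1.2.15.jar'],
    'org.apache.log4j.AppenderSkeleton.class': [2, 'log4j-1.2.14.jar', 'log4j-1.2.15.jar'],
    'com.reuters.rfa.ansipage.Page.class': [1, 'rfa-7.2.0.E2.jar'],
    'org.apache.http.impl.io.ChunkedOutputStream.class': [1, 'rfa-7.2.0.E2.jar'],
}
folded_sample = fold_by_package(sample)
```

Because the grouping is by package rather than by value, com.reuters.rfa.ansipage and org.apache.http.impl.io stay as separate keys even though both point to the same jar, so the empty-key case does not come up. If folding further up the hierarchy is needed (for example merging sibling packages that share a value), this would have to be extended, perhaps by repeating the same step on the folded keys.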