I have large XML files (between 500 MB and 1 GB) and I'm trying to filter them so that only the nodes with certain attribute values are kept, in this case Prod_id. I have about 10k Prod_id values to keep, and the XML currently contains about 60k items.
Currently I'm using XSL with Node.js (https://github.com/fiduswriter/xslt-processor), but it's really slow: I have never seen a run finish within 30-40 minutes.
Is there a way to speed up this process? XSL is not a requirement; I can use anything.
XML example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<products>
<Product Quality="approved" Name="WL6A6" Title="BeBikes comfort WL6A6" Prod_id="BBKBECOMFORTWL6A6">
<CategoryFeatureGroup ID="10030">
<FeatureGroup>
<Name Value="Dettagli tecnici" langid="5"/>
</FeatureGroup>
</CategoryFeatureGroup>
<Gallery />
</Product>
...
<Product Quality="approved" Name="WL6A6" Title="BeBikes comfort WL6A6" Prod_id="LAL733">
<CategoryFeatureGroup ID="10030">
<FeatureGroup>
<Name Value="Dettagli tecnici" langid="5"/>
</FeatureGroup>
</CategoryFeatureGroup>
<Gallery />
</Product>
</products>
The XSL I'm using:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="
products/Product
[not(@Prod_id='CEESPPRIVAIPHONE4')]
...
[not(@Prod_id='LAL733')]"
/>
</xsl:stylesheet>
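To make the intended result concrete, here is a minimal sketch of the same filter written as a streaming pass in Python instead of XSL. It is only an illustration under assumptions, not something benchmarked on the real files: the names `filter_products`, `source`, `sink` and `keep_ids` are made up for this example, and it assumes every Product is a direct child of products, as in the sample XML.

```python
import xml.etree.ElementTree as ET


def filter_products(source, sink, keep_ids):
    """Copy only the <Product> elements whose Prod_id is in keep_ids.

    source and sink are binary file-like objects (hypothetical names).
    """
    keep = set(keep_ids)  # one hash lookup per product instead of one test per ID
    sink.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<products>\n')

    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of <products>
    depth = 0  # nesting level below the root element

    for event, elem in events:
        if event == "start":
            depth += 1
            continue
        # "end" event: the element and all of its children are complete
        if depth == 1 and elem.tag == "Product":
            if elem.get("Prod_id") in keep:
                elem.tail = None  # drop whitespace that followed the element
                sink.write(ET.tostring(elem, encoding="unicode").encode("utf-8"))
                sink.write(b"\n")
            root.clear()  # discard processed products so memory stays flat
        depth -= 1

    sink.write(b"</products>\n")
```

The function expects both files to be opened in binary mode. The idea is that the 10k IDs sit in a set and each Product is dropped from memory as soon as it has been checked, so the whole document is never held at once.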
Thanks