I have a XML file of size 35 GB. I have tried using SQL Server xquery and python lxml to parse this file. SQL have 2 GB limit on xml data type and python throws Memory Error.
So, I decide to use Vb.Net to parse the file using the code.
I used DataSet to read xml file to avoid complex xpath query. But, this is also throwing out of memory error.
Try
Dim xmlFile As XmlReader
xmlFile = XmlReader.Create("D:\wcproduction.xml", New XmlReaderSettings())
Dim ds As New DataSet
ds.ReadXml(xmlFile)
Dim i As Integer
For i = 0 To ds.Tables(0).Rows.Count - 1
MsgBox(ds.Tables(0).Rows(i).Item(0).ToString)
Next
Catch ex As Exception
MsgBox(ex.Message)
End Try
This is sample xml data from the actual file.
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsd:schema targetNamespace="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd"/>
<xsd:element name="wcproduction">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="api_st_cde" type="sqltypes:smallint" nillable="1"/>
<xsd:element name="api_cnty_cde" type="sqltypes:smallint" nillable="1"/>
<xsd:element name="api_well_idn" type="sqltypes:int" nillable="1"/>
<xsd:element name="pool_idn" type="sqltypes:int" nillable="1"/>
<xsd:element name="prodn_mth" type="sqltypes:smallint" nillable="1"/>
<xsd:element name="prodn_yr" type="sqltypes:int" nillable="1"/>
<xsd:element name="ogrid_cde" type="sqltypes:int" nillable="1"/>
<xsd:element name="prd_knd_cde" nillable="1">
<xsd:simpleType>
<xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="2"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="eff_dte" type="sqltypes:datetime" nillable="1"/>
<xsd:element name="amend_ind" nillable="1">
<xsd:simpleType>
<xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="1"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="c115_wc_stat_cde" nillable="1">
<xsd:simpleType>
<xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="1"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="prod_amt" type="sqltypes:int" nillable="1"/>
<xsd:element name="prodn_day_num" type="sqltypes:smallint" nillable="1"/>
<xsd:element name="mod_dte" type="sqltypes:datetime" nillable="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
<wcproduction xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
<api_st_cde>30</api_st_cde>
<api_cnty_cde>5</api_cnty_cde>
<api_well_idn>20178</api_well_idn>
<pool_idn>10540</pool_idn>
<prodn_mth>7</prodn_mth>
<prodn_yr>1973</prodn_yr>
<ogrid_cde>12437</ogrid_cde>
<prd_knd_cde>G </prd_knd_cde>
<eff_dte>1973-07-31T00:00:00</eff_dte>
<amend_ind>N</amend_ind>
<c115_wc_stat_cde>F</c115_wc_stat_cde>
<prod_amt>53612</prod_amt>
<prodn_day_num>99</prodn_day_num>
<mod_dte>2015-04-07T07:31:00.173</mod_dte>
</wcproduction>
</root>
I need a solution that can read data from XML file of size 35 GB or more and transfer data to SQL Server database.
Answer: Since dataset object uses memory, it will bottleneck. So, try this solution Reading large XML file using XMLReader in VB.net