I want to start by saying I have no experience working with XML files. I just started poking around this data in the past couple of days and have some code that somewhat works, it just needs a couple tweaks. The code I have is here:
import pandas as pd
import numpy as np
import xml.etree.ElementTree as ET
#parses the orders.xml file and stores in the "tree" variable
tree = ET.parse("orders (1).xml")
#gets the root of the parsed tree
root = tree.getroot()
#creating lists to store xml data in
OrderID = []
CustomerID = []
SKU = []
Price = []
Quantity = []
#goes through the xml file looking for the specified tag and stores the value into the list created above
for orderid in root.iter('OrderId'):
OrderID.append(orderid.text)
for customerid in root.iter('CustomerId'):
CustomerID.append(customerid.text)
for sku in root.iter('Sku'):
SKU.append(sku.text)
for price in root.iter('PriceInclTax'):
Price.append(price.text)
for quantity in root.iter('Quantity'):
Quantity.append(quantity.text)
# writes the data stored in the lists above to a pandas dataframe
df = pd.DataFrame(
list(zip(OrderID, CustomerID, SKU, Price, Quantity)), columns=['Order ID', 'Customer ID', 'SKU', 'Price', 'Quantity']
)
# prints the dataframe
print(df)
Here is the result:
Order ID Customer ID SKU Price Quantity
0 518 18321022 2BF207HT750-150 83.2300 8
1 517 18265518 4BF209HT750-175 107.5200 2
2 516 18344682 HUCP211-35KX 560.0000 2
3 515 18344503 6206-2RC4KX 58.8300 1
4 514 18352425 6206-ZZC4KX 47.8900 13
5 513 18265596 HUCF206-19KX 242.0000 8
6 512 18352395 4BF209HT750-175 352.7800 10
7 511 18266176 4BF209SHT750-175P 825.7200 10
8 510 18351938 SUCF208-24 28251299.4606 4
9 509 18265518 125855 567.4500 1
10 508 18351852 6003-2RC4KX 73.4686 1
11 506 18350111 4BF212HT750-250 152.4800 4
12 505 18327149 6200-2RSH 4.4600 6
13 504 18351172 6205-2RC4KX 68.0700 1
14 503 18266179 6203-2RC4KX 62.3800 1
15 502 18266179 4BF210HT750-200 162.2300 6
16 501 18265517 6204-ZZC4KX 34.8400 1
17 500 18265548 CG3-7000-NE 0.0000 1
18 499 18348139 CG3-7000-E 0.0000 1
19 498 18321909 CG3-7000-E 8920.0000 1
20 497 18337101 CG3-7000-NE 10770.0000 1
21 495 18348878 4BF207HT750-150 281.6700 24
22 494 18348911 UCF211-35 185.4930 6
23 493 18321909 HUC207-23KX 276.3900 2
24 492 18265594 4BF316SHT100-400P 1801.3900 20
25 491 18265533 2PB316SHT750-400P 1651.4400 1
26 490 18347438 SUCF205-16 134.8800 2
The issue I am having is some of these orders have multiple items on it, and this is not reflecting that. The item data for each order is stored in the tag and then . The code I have does not pick up that there can be multiple 's for each . Here is the source code of the XML document for Order ID 515
<Order>
<OrderId>515</OrderId>
<OrderGuid></OrderGuid>
<StoreId>1</StoreId>
<CustomerId></CustomerId>
<OrderStatusId></OrderStatusId>
<PaymentStatusId></PaymentStatusId>
<ShippingStatusId></ShippingStatusId>
<CustomerLanguageId></CustomerLanguageId>
<CustomerTaxDisplayTypeId></CustomerTaxDisplayTypeId>
<CustomerIp></CustomerIp>
<OrderSubtotalInclTax>681.4000</OrderSubtotalInclTax>
<OrderSubtotalExclTax>681.4000</OrderSubtotalExclTax>
<OrderSubTotalDiscountInclTax>0.0000</OrderSubTotalDiscountInclTax>
<OrderSubTotalDiscountExclTax>0.0000</OrderSubTotalDiscountExclTax>
<OrderShippingInclTax>0.0000</OrderShippingInclTax>
<OrderShippingExclTax>0.0000</OrderShippingExclTax>
<PaymentMethodAdditionalFeeInclTax>0.0000</PaymentMethodAdditionalFeeInclTax>
<PaymentMethodAdditionalFeeExclTax>0.0000</PaymentMethodAdditionalFeeExclTax>
<TaxRates>0.00:0.00; </TaxRates>
<OrderTax>0.0000</OrderTax>
<OrderTotal>681.4000</OrderTotal>
<RefundedAmount>0.0000</RefundedAmount>
<OrderDiscount>0.0000</OrderDiscount>
<CurrencyRate>1.0000</CurrencyRate>
<CustomerCurrencyCode>USD</CustomerCurrencyCode>
<AffiliateId>0</AffiliateId>
<AllowStoringCreditCardNumber>False</AllowStoringCreditCardNumber>
<CardType />
<CardName />
<CardNumber />
<MaskedCreditCardNumber />
<CardCvv2 />
<CardExpirationMonth />
<CardExpirationYear />
<PaymentMethodSystemName>Payments.PurchaseOrder</PaymentMethodSystemName>
<AuthorizationTransactionId />
<AuthorizationTransactionCode />
<AuthorizationTransactionResult />
<CaptureTransactionId />
<CaptureTransactionResult />
<SubscriptionTransactionId />
<PaidDateUtc>03/08/2023 21:29:31</PaidDateUtc>
<ShippingMethod>UPS Next Day Air®</ShippingMethod>
<ShippingRateComputationMethodSystemName>Shipping.Director</ShippingRateComputationMethodSystemName>
<CustomValuesXml></CustomValuesXml>
<VatNumber />
<Deleted>False</Deleted>
<CreatedOnUtc>3/8/2023 9:29:31 PM</CreatedOnUtc>
<OrderItems>
<OrderItem>
<Id>633</Id>
<OrderItemGuid>1cb035ca-73c8-4124-94d9-c14c54b669d8</OrderItemGuid>
<Name>30mm 6206 High Temperature 220 °C Sealed C4 Bearing</Name>
<Sku>6206-2RC4KX</Sku>
<PriceExclTax>58.8300</PriceExclTax>
<PriceInclTax>58.8300</PriceInclTax>
<Quantity>1</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>58.8300</TotalExclTax>
<TotalInclTax>58.8300</TotalInclTax>
</OrderItem>
<OrderItem>
<Id>634</Id>
<OrderItemGuid>1a4a005f-e669-4420-9379-611e5b3f637c</OrderItemGuid>
<Name>30mm 6206 High Temperature 250 °C Shielded ZZ C4 Bearing</Name>
<Sku>6206-ZZC4KX</Sku>
<PriceExclTax>47.8900</PriceExclTax>
<PriceInclTax>47.8900</PriceInclTax>
<Quantity>13</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>622.5700</TotalExclTax>
<TotalInclTax>622.5700</TotalInclTax>
</OrderItem>
</OrderItems>
<Shipments>
<Shipment>
<ShipmentId>391</ShipmentId>
<TrackingNumber>1Z4054470191053867</TrackingNumber>
<TotalWeight>6.1600</TotalWeight>
<ShippedDateUtc>3/8/2023 9:31:38 PM</ShippedDateUtc>
<DeliveryDateUtc />
<CreatedOnUtc>03/08/2023 21:31:38</CreatedOnUtc>
</Shipment>
</Shipments>
</Order>
What I need help with is having the orders that have multiple line items on it, and being able to write that into the pandas dataframe. OrderID 515 should be the SKU from index 3 and index 4.
I was expecting there to be multiple lines for orders with more than one item, instead items from order 515 went to the next line with the next OrderID 516.
Here is the XML for 2 orders:
<Order>
<OrderId>516</OrderId>
<OrderGuid></OrderGuid>
<StoreId></StoreId>
<CustomerId></CustomerId>
<OrderStatusId>30</OrderStatusId>
<PaymentStatusId>30</PaymentStatusId>
<ShippingStatusId>30</ShippingStatusId>
<CustomerLanguageId>1</CustomerLanguageId>
<CustomerTaxDisplayTypeId>10</CustomerTaxDisplayTypeId>
<CustomerIp></CustomerIp>
<OrderSubtotalInclTax>1120.0000</OrderSubtotalInclTax>
<OrderSubtotalExclTax>1120.0000</OrderSubtotalExclTax>
<OrderSubTotalDiscountInclTax>0.0000</OrderSubTotalDiscountInclTax>
<OrderSubTotalDiscountExclTax>0.0000</OrderSubTotalDiscountExclTax>
<OrderShippingInclTax>0.0000</OrderShippingInclTax>
<OrderShippingExclTax>0.0000</OrderShippingExclTax>
<PaymentMethodAdditionalFeeInclTax>0.0000</PaymentMethodAdditionalFeeInclTax>
<PaymentMethodAdditionalFeeExclTax>0.0000</PaymentMethodAdditionalFeeExclTax>
<TaxRates>0.00:0.00;</TaxRates>
<OrderTax>0.0000</OrderTax>
<OrderTotal>1120.0000</OrderTotal>
<RefundedAmount>0.0000</RefundedAmount>
<OrderDiscount>0.0000</OrderDiscount>
<CurrencyRate>1.0000</CurrencyRate>
<CustomerCurrencyCode>USD</CustomerCurrencyCode>
<AffiliateId>0</AffiliateId>
<AllowStoringCreditCardNumber>False</AllowStoringCreditCardNumber>
<CardType />
<CardName />
<CardNumber />
<MaskedCreditCardNumber></MaskedCreditCardNumber>
<CardCvv2 />
<CardExpirationMonth />
<CardExpirationYear />
<PaymentMethodSystemName></PaymentMethodSystemName>
<AuthorizationTransactionId></AuthorizationTransactionId>
<AuthorizationTransactionCode></AuthorizationTransactionCode>
<AuthorizationTransactionResult>Approved (1: This transaction has been approved.)</AuthorizationTransactionResult>
<CaptureTransactionId></CaptureTransactionId>
<CaptureTransactionResult>Approved (1: This transaction has been approved.)</CaptureTransactionResult>
<SubscriptionTransactionId />
<PaidDateUtc>03/09/2023 14:43:09</PaidDateUtc>
<ShippingMethod>Other</ShippingMethod>
<ShippingRateComputationMethodSystemName>Shipping.Director</ShippingRateComputationMethodSystemName>
<CustomValuesXml />
<VatNumber />
<Deleted>False</Deleted>
<CreatedOnUtc>3/9/2023 2:39:57 PM</CreatedOnUtc>
<OrderItems>
<OrderItem>
<Id>635</Id>
<OrderItemGuid>9dac1add-310b-41f7-a5af-36ff680b46de</OrderItemGuid>
<Name>2-3/16" High Temperature 500 °F Mounted Pillow Block Bearing</Name>
<Sku>HUCP211-35KX</Sku>
<PriceExclTax>560.0000</PriceExclTax>
<PriceInclTax>560.0000</PriceInclTax>
<Quantity>2</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>1120.0000</TotalExclTax>
<TotalInclTax>1120.0000</TotalInclTax>
</OrderItem>
</OrderItems>
<Shipments>
<Shipment>
<ShipmentId>392</ShipmentId>
<TrackingNumber>1ZB3B6620328442937</TrackingNumber>
<TotalWeight>17.0000</TotalWeight>
<ShippedDateUtc>3/9/2023 2:40:56 PM</ShippedDateUtc>
<DeliveryDateUtc />
<CreatedOnUtc>03/09/2023 14:40:55</CreatedOnUtc>
</Shipment>
</Shipments>
</Order>
<Order>
<OrderId>515</OrderId>
<OrderGuid></OrderGuid>
<StoreId>1</StoreId>
<CustomerId></CustomerId>
<OrderStatusId>30</OrderStatusId>
<PaymentStatusId>10</PaymentStatusId>
<ShippingStatusId>30</ShippingStatusId>
<CustomerLanguageId>1</CustomerLanguageId>
<CustomerTaxDisplayTypeId>10</CustomerTaxDisplayTypeId>
<CustomerIp></CustomerIp>
<OrderSubtotalInclTax>681.4000</OrderSubtotalInclTax>
<OrderSubtotalExclTax>681.4000</OrderSubtotalExclTax>
<OrderSubTotalDiscountInclTax>0.0000</OrderSubTotalDiscountInclTax>
<OrderSubTotalDiscountExclTax>0.0000</OrderSubTotalDiscountExclTax>
<OrderShippingInclTax>0.0000</OrderShippingInclTax>
<OrderShippingExclTax>0.0000</OrderShippingExclTax>
<PaymentMethodAdditionalFeeInclTax>0.0000</PaymentMethodAdditionalFeeInclTax>
<PaymentMethodAdditionalFeeExclTax>0.0000</PaymentMethodAdditionalFeeExclTax>
<TaxRates>0.00:0.00; </TaxRates>
<OrderTax>0.0000</OrderTax>
<OrderTotal>681.4000</OrderTotal>
<RefundedAmount>0.0000</RefundedAmount>
<OrderDiscount>0.0000</OrderDiscount>
<CurrencyRate>1.0000</CurrencyRate>
<CustomerCurrencyCode>USD</CustomerCurrencyCode>
<AffiliateId>0</AffiliateId>
<AllowStoringCreditCardNumber>False</AllowStoringCreditCardNumber>
<CardType />
<CardName />
<CardNumber />
<MaskedCreditCardNumber />
<CardCvv2 />
<CardExpirationMonth />
<CardExpirationYear />
<PaymentMethodSystemName>Payments.PurchaseOrder</PaymentMethodSystemName>
<AuthorizationTransactionId />
<AuthorizationTransactionCode />
<AuthorizationTransactionResult />
<CaptureTransactionId />
<CaptureTransactionResult />
<SubscriptionTransactionId />
<PaidDateUtc>03/08/2023 21:29:31</PaidDateUtc>
<ShippingMethod>UPS Next Day Air®</ShippingMethod>
<ShippingRateComputationMethodSystemName>Shipping.Director</ShippingRateComputationMethodSystemName>
<CustomValuesXml></CustomValuesXml>
<VatNumber />
<Deleted>False</Deleted>
<CreatedOnUtc>3/8/2023 9:29:31 PM</CreatedOnUtc>
<OrderItems>
<OrderItem>
<Id>633</Id>
<OrderItemGuid>1cb035ca-73c8-4124-94d9-c14c54b669d8</OrderItemGuid>
<Name>30mm 6206 High Temperature 220 °C Sealed C4 Bearing</Name>
<Sku>6206-2RC4KX</Sku>
<PriceExclTax>58.8300</PriceExclTax>
<PriceInclTax>58.8300</PriceInclTax>
<Quantity>1</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>58.8300</TotalExclTax>
<TotalInclTax>58.8300</TotalInclTax>
</OrderItem>
<OrderItem>
<Id>634</Id>
<OrderItemGuid>1a4a005f-e669-4420-9379-611e5b3f637c</OrderItemGuid>
<Name>30mm 6206 High Temperature 250 °C Shielded ZZ C4 Bearing</Name>
<Sku>6206-ZZC4KX</Sku>
<PriceExclTax>47.8900</PriceExclTax>
<PriceInclTax>47.8900</PriceInclTax>
<Quantity>13</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>622.5700</TotalExclTax>
<TotalInclTax>622.5700</TotalInclTax>
</OrderItem>
</OrderItems>
<Shipments>
<Shipment>
<ShipmentId>391</ShipmentId>
<TrackingNumber>1Z4054470191053867</TrackingNumber>
<TotalWeight>6.1600</TotalWeight>
<ShippedDateUtc>3/8/2023 9:31:38 PM</ShippedDateUtc>
<DeliveryDateUtc />
<CreatedOnUtc>03/08/2023 21:31:38</CreatedOnUtc>
</Shipment>
</Shipments>
</Order>