0

I'm using this tool to make a CSV from an ebay XML file.

I've removed all of the shipping destinations using regex in notepad++ since they were unneccessary. Then I removed (<sellingStatus>|</sellingStatus>) becasue I thought maybe the tree was too deep (I'm an XML noob). Still, the price doesn't appear on the CSV when it's done converting.

Sample of the XML below. Could someone explain why the price isn't getting picked up by the converter, and ideally suggest a regex operation to fix it please?

<?xml version='1.0' encoding='UTF-8'?>
<findItemsIneBayStoresResponse xmlns="http://www.ebay.com/marketplace/search/v1/services">
    <ack>Success</ack>
    <version>1.13.0</version>
    <timestamp>2016-08-03T06:19:01.316Z</timestamp>
    <searchResult count="97">
        <item>
            <itemId>152174882749</itemId>
            <title>Yolande' Butterfly Blue #' print of original Watercolour , A4 size.</title>
            <globalId>EBAY-GB</globalId>
            <primaryCategory>
                <categoryId>60437</categoryId>
                <categoryName>Paintings</categoryName>
            </primaryCategory>
            <galleryURL>http://thumbs2.ebaystatic.com/m/m_VMlLhEdNRYkHi-6hUvbLg/140.jpg</galleryURL>
            <viewItemURL>http://www.ebay.co.uk/itm/Yolande-Butterfly-Blue-print-original-Watercolour-A4-size-/152174882749</viewItemURL>
            <paymentMethod>PayPal</paymentMethod>
            <autoPay>false</autoPay>
            <postalCode>TQ110DD</postalCode>
            <location>Buckfastleigh,United Kingdom</location>
            <country>GB</country>
                <currentPrice currencyId="GBP">12.5</currentPrice>
                <convertedCurrentPrice currencyId="GBP">12.5</convertedCurrentPrice>
                <sellingState>Active</sellingState>
                <timeLeft>P16DT11H53M5S</timeLeft>
            <listingInfo>
                <bestOfferEnabled>true</bestOfferEnabled>
                <buyItNowAvailable>false</buyItNowAvailable>
                <startTime>2016-07-20T18:12:06.000Z</startTime>
                <endTime>2016-08-19T18:12:06.000Z</endTime>
                <listingType>StoreInventory</listingType>
                <gift>false</gift>
            </listingInfo>
            <isMultiVariationListing>false</isMultiVariationListing>
            <topRatedListing>false</topRatedListing>
        </item>
        <item>
            <itemId>152181312690</itemId>
            <title>'Peacock' print of original Watercolour by Yolande, A4 size.</title>
            <globalId>EBAY-GB</globalId>
            <primaryCategory>
                <categoryId>60437</categoryId>
                <categoryName>Paintings</categoryName>
            </primaryCategory>
            <galleryURL>http://thumbs3.ebaystatic.com/m/m42A7QDsUeNNpkQylOnmMmg/140.jpg</galleryURL>
            <viewItemURL>http://www.ebay.co.uk/itm/Peacock-print-original-Watercolour-Yolande-A4-size-/152181312690</viewItemURL>
            <paymentMethod>PayPal</paymentMethod>
            <autoPay>false</autoPay>
            <postalCode>TQ110DD</postalCode>
            <location>Buckfastleigh,United Kingdom</location>
            <country>GB</country>
                <currentPrice currencyId="GBP">12.5</currentPrice>
                <convertedCurrentPrice currencyId="GBP">12.5</convertedCurrentPrice>
                <sellingState>Active</sellingState>
                <timeLeft>P21DT16H47M12S</timeLeft>
            <listingInfo>
                <bestOfferEnabled>true</bestOfferEnabled>
                <buyItNowAvailable>false</buyItNowAvailable>
                <startTime>2016-07-25T23:06:13.000Z</startTime>
                <endTime>2016-08-24T23:06:13.000Z</endTime>
                <listingType>StoreInventory</listingType>
                <gift>false</gift>
            </listingInfo>
            <isMultiVariationListing>false</isMultiVariationListing>
            <topRatedListing>false</topRatedListing>
        </item>
Robin Andrews
  • 3,514
  • 11
  • 43
  • 111

2 Answers2

1

No, please please please - Don't use regex to mangle your XML. It's very bad news.. It creates brittle code, which can break horribly one day.

There are other solutions. IMO using a scripting language (like perl) to do the conversion in a way you specifically want, would be optimal.

Give us an idea of what your sample output would be (And some valid XML - yours is missing close tags), and I can give you an example.

But it would be something like this:

#!/usr/bin/env perl
use strict;
use warnings 'all'; 

use XML::Twig;

my $twig = XML::Twig -> new -> parsefile('your_file.xml'); 

my @columns_xpath = qw ( itemId title postalCode currentPrice listingType ); 

print join( ",", @columns_xpath ), "\n";

foreach my $item ( $twig -> get_xpath('.//item') ) {
    print join ",", (map { $item -> get_xpath(".//$_", 0 ) -> text } @columns_xpath), "\n";
}

It's really that simple, and gives you a lot more flexibility of structure of output. (Other languages can do much the same - I've just offered up perl because I like it, and this it a solid choice).

Note - I'm using xpath to search. In xpath - which is like regex, but specifically for XML - .// means "search beneath this branch".

So .//listingType will find ./listingInfo/listingType - this only works if there's only one though (because that 0 in he get_xpath says 'get the first instance only).

Given your data, that's going to work. But you could instead omit the // and do 'fully qualified' paths.

The output generated from the above is: (Although I had to add the 'close' tags at the end of the XML).

itemId,title,postalCode,currentPrice,listingType
152174882749,Yolande' Butterfly Blue #' print of original Watercolour , A4 size.,TQ110DD,12.5,StoreInventory,
152181312690,'Peacock' print of original Watercolour by Yolande, A4 size.,TQ110DD,12.5,StoreInventory,

So here's the flaw with a simple "stick commas in it" approach to the problem. For the output, we need to do 'proper' CSV - using another trivially available module:

#!/usr/bin/env perl
use strict;
use warnings 'all'; 

use XML::Twig;
use Text::CSV; 

my $twig = XML::Twig -> new -> parsefile ( 'input.xml');
open ( my $output, '>', "results.csv" ) or die $!; 

my @columns_xpath = qw ( itemId title postalCode currentPrice listingType ); 
my $csv = Text::CSV -> new ( {sep_char => ',', eol => "\n" }); 

$csv -> print ( $output, \@columns_xpath );

foreach my $item ( $twig -> get_xpath('.//item') ) {
    my @row = map { $item -> get_xpath(".//$_", 0 ) -> text } @columns_xpath;
    $csv -> print ( $output, \@row );
}
close ( $output );

This now gives us properly quoted CSV:

itemId,title,postalCode,currentPrice,listingType
152174882749,"Yolande' Butterfly Blue #' print of original Watercolour , A4 size.",TQ110DD,12.5,StoreInventory
152181312690,"'Peacock' print of original Watercolour by Yolande, A4 size.",TQ110DD,12.5,StoreInventory

But to answer your original question - is there any chance it's because it's sellingState not sellingStatus?

Community
  • 1
  • 1
Sobrique
  • 52,974
  • 7
  • 60
  • 101
0

Try following code uisng C#.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;


namespace ConsoleApplication6
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        const String CSV = @"c:\temp\test.csv";
        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            StreamWriter writer = new StreamWriter(CSV);

            Boolean firstRow = true;
            List<string> columnsNames = new List<string>();

            foreach(XElement item in doc.Descendants().Where(x => x.Name.LocalName == "item"))
            {
                List<string> columns = new List<string>();
                foreach (XElement element in item.Elements())
                {
                    if (element.Name.LocalName == "listingInfo")
                    {
                        foreach (XElement listing in element.Elements())
                        {
                            columns.Add(((string)listing).Trim());
                            if (firstRow)
                            {
                                columnsNames.Add(listing.Name.LocalName);
                            }
                        }
                    }
                    else
                    {
                        columns.Add(((string)element).Trim());
                        if (firstRow)
                        {
                            columnsNames.Add(element.Name.LocalName);
                        }
                    }
                }
                if (firstRow)
                {
                    writer.WriteLine(string.Join(",", columnsNames.ToArray()));
                    firstRow = false;
                }
                writer.WriteLine(string.Join(",", columns.ToArray()));
            }
            writer.Flush();
            writer.Close();

        }
    }
}
jdweng
  • 33,250
  • 2
  • 15
  • 20