3

My company uses an internal management software for storing products. They want to transpose all the products in a MySql database so they can do available their products on the company website.

Notice: they will continue to use their own internal software. This software can exports all the products in various file format (including XML).

The syncronization not have to be in real time, they are satisfied to syncronize the MySql database once a day (late night).

Also, each product in their software has one or more images, then I have to do available also the images on the website.

Here is an example of an XML export:

<?xml version="1.0" encoding="UTF-8"?>
<export_management userid="78643">
    <product id="1234">
        <version>100</version>
        <insert_date>2013-12-12 00:00:00</insert_date>
        <warrenty>true</warrenty>
        <price>139,00</price>
        <model>
            <code>324234345</code>
            <model>Notredame</model>
            <color>red</color>
            <size>XL</size>
        </model>
        <internal>
            <color>green</color>
            <size>S</size>
        </internal>
        <options>
            <s_option>aaa</s_option>
            <s_option>bbb</s_option>
            <s_option>ccc</s_option>
            <s_option>ddd</s_option>
            <s_option>eee</s_option>
            <s_option>fff</s_option>
      ...
            <extra_option>ggg</extra_option>
            <extra_option>hhh</extra_option>
            <extra_option>jjj</extra_option>
            <extra_option>kkk</extra_option>
      ...
        </options>
        <images>
            <image>
                <small>1234_0.jpg</small>
            </image>
            <image>
                <small>1234_1.jpg</small>
            </image>
        </images>
    </product>
    <product id="5321">
    ...
    </product>
    <product id="2621">
    ...
    </product>
  ...
</export_management>

Some ideas for how can I do it?

Please let me know if my question is not clear. Thanks


EDIT: I used a SQL like this for each table to fill them with the XML datas:

LOAD XML LOCAL INFILE '/products.xml' INTO TABLE table_name ROWS IDENTIFIED BY '<tag_name>';

Then, checking the tables content I can see that the field "id" (primary key) automatically has mantained itself the same for each respective product row in each tables. That's correct and suprisingly awesome!

The problem now is for the parameter <options> because it contains sub-parameters with same name (<s_option> and <extra_option>). The values of these tags are always different (that is, there is no a specific list of values, they are inserted manually by an employee) and also I don't know how many are for each product. I read that storing them as an array is not so good but if it's the only simple solution I can get it.

Yair Nevet
  • 12,725
  • 14
  • 66
  • 108
Fred K
  • 13,249
  • 14
  • 78
  • 103
  • Well, my answer helped you? did you follow my steps? – Yair Nevet Jun 17 '14 at 10:51
  • 1
    Hey Fred, of course I don't know about your personal situation but especially if someone like @yair-nevet has gone out of their way trying to provide you with a high quality answer, it is good courtesy to bother logging in and awarding the full bonus. – flup Jun 18 '14 at 09:24

3 Answers3

7

The way that I would approach the problem in your case is:

  1. Create a respective set of corresponding tables in the database which in turn will represent the company's Product model by extracting the modelling from your given XML.

  2. Create and use a scheduled daily synchronization job, that probably will executes few SQL commands in order to refresh the data or introduce a new one by parsing the products XMLs into the created tables.

To be more practical about it all:

  • As for the database's tables, I can easily identify three tables to be created based on your XML, look at the yellow marked elements:

    1. Products
    2. ProductsOptions
    3. ProductsImages

Exported Products Diagram

(This diagram created based on an XSD that was generated from your XML)


All rest can be considered as regular columns in the Products table since they're constitutes a 1-1 relationship only.

Next, create the required tables in your database (you can use an XSD2DB Schema converter tool to create the DDL script, I did it manually):

companydb.products

CREATE TABLE companydb.products (
  Id INT(11) NOT NULL,
  Version INT(11) DEFAULT NULL,
  InsertDate DATETIME DEFAULT NULL,
  Warrenty TINYINT(1) DEFAULT NULL,
  Price DECIMAL(19, 2) DEFAULT NULL,
  ModelCode INT(11) DEFAULT NULL,
  ModelColor VARCHAR(10) DEFAULT NULL,
  Model VARCHAR(255) DEFAULT NULL,
  ModelSize VARCHAR(10) DEFAULT NULL,
  InternalColor VARCHAR(10) DEFAULT NULL,
  InternalSize VARCHAR(10) DEFAULT NULL,
  PRIMARY KEY (Id)
)
ENGINE = INNODB
CHARACTER SET utf8
COLLATE utf8_general_ci
COMMENT = 'Company''s Products';

companydb.productsimages

CREATE TABLE companydb.productimages (
  Id INT(11) NOT NULL AUTO_INCREMENT,
  ProductId INT(11) DEFAULT NULL,
  Size VARCHAR(10) DEFAULT NULL,
  FileName VARCHAR(255) DEFAULT NULL,
  PRIMARY KEY (Id),
  CONSTRAINT FK_productsimages_products_Id FOREIGN KEY (ProductId)
    REFERENCES companydb.products(Id) ON DELETE RESTRICT ON UPDATE RESTRICT
)
ENGINE = INNODB
AUTO_INCREMENT = 1
CHARACTER SET utf8
COLLATE utf8_general_ci
COMMENT = 'Products'' Images';

companydb.productsoptions

CREATE TABLE companydb.productoptions (
  Id INT(11) NOT NULL AUTO_INCREMENT,
  ProductId INT(11) DEFAULT NULL,
  Type VARCHAR(255) DEFAULT NULL,
  `Option` VARCHAR(255) DEFAULT NULL,
  PRIMARY KEY (Id),
  CONSTRAINT FK_producstsoptions_products_Id FOREIGN KEY (ProductId)
    REFERENCES companydb.products(Id) ON DELETE RESTRICT ON UPDATE RESTRICT
)
ENGINE = INNODB
AUTO_INCREMENT = 1
CHARACTER SET utf8
COLLATE utf8_general_ci;

  • As for the synchronisation job process to take place, you can easily create an MySql event and use the Event Scheduler to control it, I created the required event which is calling a stored-procedure that you'll find below (SyncProductsDataFromXML), look:

CREATE DEFINER = 'root'@'localhost' EVENT companydb.ProductsDataSyncEvent ON SCHEDULE EVERY '1' DAY STARTS '2014-06-13 01:27:38' COMMENT 'Synchronize Products table with Products XMLs' DO BEGIN SET @productsXml = LOAD_FILE('C:/MySqlXmlSync/products.xml'); CALL SyncProductsDataFromXML(@productsXml); END; ALTER EVENT companydb.ProductsDataSyncEvent ENABLE

Now the interesting part is taking place, here is the synchronization stored-procedure (note how the event above is calling it):

CREATE DEFINER = 'root'@'localhost'
PROCEDURE companydb.SyncProductsDataFromXML(IN productsXml MEDIUMTEXT)
BEGIN

  DECLARE totalProducts INT;
  DECLARE productIndex INT;

  SET totalProducts = ExtractValue(productsXml, 'count(//export_management/product)');

  SET productIndex = 1;
  WHILE productIndex <= totalProducts DO
        SET @productId = CAST(ExtractValue(productsXml, 'export_management/product[$productIndex]/@id') AS UNSIGNED);

        INSERT INTO products(`Id`, `Version`, InsertDate, Warrenty, Price, ModelCode, Model, ModelColor, ModelSize, InternalColor, InternalSize)
        VALUES(
                @productId,
                ExtractValue(productsXml, 'export_management/product[$productIndex]/version'),
                ExtractValue(productsXml, 'export_management/product[$productIndex]/insert_date'),
                CASE WHEN (ExtractValue(productsXml, 'export_management/product[$productIndex]/warrenty')) <> 'false' THEN 1 ELSE 0 END,
                CAST(ExtractValue(productsXml, 'export_management/product[$productIndex]/price') as DECIMAL),
                ExtractValue(productsXml, 'export_management/product[$productIndex]/model/code'),
                ExtractValue(productsXml, 'export_management/product[$productIndex]/model/model'),
                ExtractValue(productsXml, 'export_management/product[$productIndex]/model/color'),
                ExtractValue(productsXml, 'export_management/product[$productIndex]/model/size'),
                ExtractValue(productsXml, 'export_management/product[$productIndex]/internal/color'),
                ExtractValue(productsXml, 'export_management/product[$productIndex]/internal/size')
              );

        SET @totalImages = ExtractValue(productsXml, 'count(//export_management/product[$productIndex]/images/image)');
        SET @imageIndex = 1;
        WHILE (@imageIndex <= @totalImages) DO
          INSERT INTO productimages(ProductId, Size, FileName) VALUES(@productId, 'small', EXTRACTVALUE(productsXml, 'export_management/product[$productIndex]/images/image[$@imageIndex]/small'));
          SET @imageIndex = @imageIndex + 1;        
        END WHILE;

        SET @totalStandardOptions = ExtractValue(productsXml, 'count(//export_management/product[$productIndex]/options/s_option)');
        SET @standardOptionIndex = 1;
        WHILE (@standardOptionIndex <= @totalStandardOptions) DO
          INSERT INTO productoptions(ProductId, `Type`, `Option`) VALUES(@productId, 'Standard Option', EXTRACTVALUE(productsXml, 'export_management/product[$productIndex]/options/s_option[$@standardOptionIndex]'));
          SET @standardOptionIndex = @standardOptionIndex + 1;
        END WHILE;

        SET @totalExtraOptions = ExtractValue(productsXml, 'count(//export_management/product[$productIndex]/options/extra_option)');
        SET @extraOptionIndex = 1;
        WHILE (@extraOptionIndex <= @totalExtraOptions) DO
          INSERT INTO productoptions(ProductId, `Type`, `Option`) VALUES(@productId, 'Extra Option', EXTRACTVALUE(productsXml, 'export_management/product[$productIndex]/options/extra_option[$@extraOptionIndex]'));
          SET @extraOptionIndex = @extraOptionIndex + 1;        
        END WHILE;

        SET productIndex = productIndex + 1;

  END WHILE;
END

And you're done, this is the final expected results from this process: enter image description here


enter image description here


enter image description here

NOTE: I've commit the entire code to one of my GitHub's repositories: XmlSyncToMySql

UPDATE:

Because your XML data might be larger then the maximum allowed for a TEXT field, I've changed the productsXml parameter to a MEDIUMTEXT. Look at this answer which outlines the various text datatypes max allowed size: Maximum length for MYSQL type text

Community
  • 1
  • 1
Yair Nevet
  • 12,725
  • 14
  • 66
  • 108
  • Hi @Yair Nevet and thanks for answer. I fixed the wrong XML markup (you can delete your part of the answer that focus on that). Well, I created the tables as you wrote but there's something that's not clear to me: **1)** LOAD XML doesn't fill the "flatten fields" like `model_color`, `model_size`, `internal_color`, `internal_size`. **2)** Also, LOAD XML doesn't fill the ProductsOptions and ProductsImages tables! Could you give more details on that? thanks – Fred K Jun 10 '14 at 07:48
  • 1
    @FredK Are you using some server languages, like java, c#, php, python etc.? – Yair Nevet Jun 10 '14 at 13:53
  • Yair, the above issues are decently clear? Have you understand what I meant? Yes, I will probably develope the frontend with PHP. – Fred K Jun 10 '14 at 14:41
  • I would rather parse the XML files into a series of domain events and do just the same as you would do by CQRS... http://www.gridshore.nl/wp-content/uploads/cqrs_architecture.jpg It is much easier to develop and test it this way, than call SQLs directly from the parser... – inf3rno Jun 10 '14 at 17:05
  • @YairNevet: I get error with this line `LOAD XML LOCAL INFILE '/products.xml' INTO TABLE Products(Id, Version, InsertDate, Warrenty, Price) ROWS IDENTIFIED BY '';`. Here the error: `#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ROWS IDENTIFIED BY ''' at line 1`. Mysql 5.5.33. It only works if I remove the field (Id, Version, ...) – Fred K Jun 10 '14 at 17:27
  • @YairNevet: LOAD XML LOCAL INFILE doesn't expect the fields on INTO TABLE. Check the official docs: http://dev.mysql.com/doc/refman/5.5/en/load-xml.html and this answer http://stackoverflow.com/a/9057422/1252920 Have you tried your code? Is it works on your system? – Fred K Jun 10 '14 at 17:50
  • @inf3rno Hi, could you post an answer more detailed? I really don't know how to do what you advise. Thanks – Fred K Jun 10 '14 at 17:51
  • @FredK I've also tried to load the data using the `LOAD` command and it is indeed seems to be rather problematic. The other option is to write a small program using some server-side language, like Java to do it, from my expirience, it should be very easy. – Yair Nevet Jun 11 '14 at 06:44
  • @YairNevet: Could you post an answer about how to that with server-side language? I will use PHP... let me know – Fred K Jun 11 '14 at 13:27
  • PHP is not what you're looking for, but rather a language like java, bacause you need an executable to be run and not a web site.. – Yair Nevet Jun 11 '14 at 14:33
  • @FredK I meant I would do this with a server side language, but I see now you want to solve it with MySQL only... – inf3rno Jun 12 '14 at 08:27
  • @FredK Okay, now it's really done! Take **ALL** of my code again and you'll have a complete sync process. Just take on account that I'm only handling **new** products in my code, let me know if you need help with existing products updating. – Yair Nevet Jun 12 '14 at 22:40
  • @YairNevet: Oh my God! I'm only back now for work issues and... WOW! I'm looking at the entire code on github and it seems perfect... Tomorrow I will try all and let you know. For now a big thanks! – Fred K Jun 18 '14 at 16:10
  • I get error on line `DECLARE totalProducts INT;` when trying to create the stored procedure. I solved wrapping the procedure between `delimiter` code as: `delimiter // ...the_procedure... END ; // delimiter ;` – Fred K Jun 22 '14 at 14:33
  • @YairNevet: I'm near to the end. I tried to calling the procedure for load a small xml (10-15 products) and it works great. Then I tried to load a larger xml (~50 products, file size: 120 KB) and the procedure fails: it doesn't show any error and his behaviour is the same as with the smaller file but it doesn't insert any row. Later I tried to split the file into two XMLs and load them separately, and it works. But obviously I can't manually split everyday the final XML (that will be bigger than 120 KB). Any ideas? – Fred K Jun 22 '14 at 17:04
  • 1.make sure that you work with a valid xml. 2.You should make sure that you don't insert existing products and if you do then update them instead of insert them. – Yair Nevet Jun 22 '14 at 17:10
  • Solved by @YairNevet: use MEDIUMTEXT datatype in place of TEXT. – Fred K Jun 22 '14 at 21:56
1

As this smells like integration work, I would suggest a multi-pass, multi-step procedure with an interim format that is not only easy to import into mysql but which also helps you to wrap your mind around the problems this integration ships with and test a solution in small steps.

This procedure works well if you can flatten the tree structure that can or could be expressed within the XML export into a list of products with fixed named attributes.

  • query all product elements with an xpath query from the XML, iterate the result of products
  • query all product attributes relative to the context node of the product from the previous query. Use one xpath per each attribute again.
  • store the result of all attributes per each product as one row into a CSV file.
  • store the filenames in the CSV as well (the basenames), but the files into a folder of it's own
  • create the DDL of the mysql table in form of an .sql file
  • run that .sql file against mysql commandline.
  • import the CSV file into that table via mysql commandline.

You should get quick results within hours. If it turns out that products can not be mapped on a single row because of attributes having multiple values (what you call an array in your question), consider to turn these into JSON strings if you can not prevent to drop them at all (just hope you don't need to display complex data in the beginning). Doing so would be violating to target a normal form, however as you describe the Mysql table is only intermediate here as well, I would aim for simpleness of the data-structure in the database as otherwise queries for a simple and fast display on the website will create the next burden.

So my suggestion here basically is: Turn the tree structure into a (more) flat list for both simplification of transition and easier templating for display.

Having an intermediate format here also allows you to replay in case things are going wrong.

It also allows you to mock the whole templating more easily.

Alterantively is is also possible to store the XML of each project inside the database (keep the chunks in a second table so you can keep varchar (variable length) fileds out of the first table) and keep some other columns as (flat) reference columns to query against. If it's for templating needs, turning the XML into a SimpleXMLElement is often very nice to have it being a structured, non-primitive data-type as view object you can traverse and loop over options. Would work similar with JSON however keeping the XML would not break a format boundary and XML can also express more structure than JSON.

hakre
  • 193,403
  • 52
  • 435
  • 836
  • Hi @hakre If I could have a flatten XML, all this work would be more easier but I can't have a flatten XML. Is there a way to flatten an XML? What do you think about the YairNevet solution? It seems definitive. – Fred K Jun 18 '14 at 16:15
  • Xslt is one way you could do this. – flup Jun 27 '14 at 07:36
0

You're taking a very technology-centered approach to this. I think it's wise to start by looking at the functional specifications.

It helps to have a simple UML class diagram of the business class Product. Show its attributes as the business sees them. So:

  • How is Model related to Product? Can there be multiple Models for one Product or the other way around?
  • What kind of data is stored in the Internal element? In particular: How can Internal's color and size be different from Model's color and size?

And specifically about the web application:

  • Is the Web application the only application that will be interested in this export?
  • Should the web application care about versioning or simply display the last available version?
  • Which specific options are interesting to the web application. Like a discount property or vendor name property or others?
  • What should the Product details page look like, what data needs to be displayed where?
  • Will there be other pages on the website displaying product information, and what product information will they list?

Then you'll know which attributes need to be readily available to the web application (as columns in the Product table) and (perhaps) which ones may be simply stored in one big XML blob in the database.

flup
  • 26,937
  • 7
  • 52
  • 74
  • Hi @flup: The posted XML represents all the cases, so: 1) There is only one Model for each Product. 2) Internal element contains standard text data (varchar) and there are no relations between Internal's color/size and Model's color/size. – Fred K Jun 18 '14 at 16:30
  • About web application: 1) Yes. 2) Simply display the last available version. 3) All the options are interesting to the web application because my company wants to filter products (on the web front-end) based on any options 4) The Product details page will displays all the data. 5) All the product information have to be displayed in Product specific page and most of them in a sort page as filters. – Fred K Jun 18 '14 at 16:30
  • Have you seen the YairNevet solution? What do you think about it? I'm new to these works but it seems good. Let me know your opinion – Fred K Jun 18 '14 at 16:32
  • I still don't understand how one single product can have two different colors and sizes. Also: how will the filtering/searching work? Are all options going to be treated equal or do certain option strings have special meanings? – flup Jun 19 '14 at 07:11
  • Yes, I saw @yair-nevet 's solution and it is a nicely detailed specific answer to your question. I think you may be skipping a step ahead by not first coming up with some canonical model that reflects the business's view of the product rather than the product editor's internal technical model. This is why I thought I'd add this answer. – flup Jun 19 '14 at 07:15
  • Hi @flup: wait, the posted XML is a sample. Don't give attention to why a product have two different colors and sizes. The "real" XML is different. On the web application I will develope a form that reflects all the data in the XML. For example: I will have a checkbox for each option, multiple select input to choose color, size, price range, ... Basing on what the user select I will call an AJAX script that search and shows all the products that match the filters. – Fred K Jun 19 '14 at 07:42
  • Obviously there are data that I will not shows in the frontend (version, code) but they're few. I asked an opinion about yair-nevet answer because it is all by using MySql scripts. Is it efficient? Is maybe better working with a server-side language rather than database scripts? – Fred K Jun 19 '14 at 07:46
  • I do realize that the data is mocked up but the question remains: why are there two different administrations for color and size and how should filtering on color and size work? As for the checkboxes: Do the users want checkboxes on the search form for every single option and extra option ever typed into the product administration? – flup Jun 19 '14 at 10:21
  • The approach where MySql reads and processes the entire exported document is nice and simple and I'd simply give it a try to see how it performs. If the size of the export file is gigantic, you may find that reading it in its entirety is too memory intensive, and then you'd need a streaming solution. But simple is good, so try it out! – flup Jun 19 '14 at 10:26