
I have a big XML file and I want to save each id, source and target in a string list. After a successful import into the string lists, I want to build a MySQL query from them.

Here's a snippet of my XML:

<xliff version="1.1">
 <file original="Xliff Demo" source-language="EN" target-language="DE" datatype="html">
 <header>
 <skl>
 <external-file uid="017dbcf0-c82c-11e2-ba2b-005056c00008" href="skl\simple.htm.skl"/>
 </skl>
 </header>
 <body>
 <trans-unit id="00ffmnpB5wBV5KFqBxuHLi4fwJvvuB">
 <source xml:lang="EN">1lnRUfBBeHtbS96uULSht42VNMN7XE4qt9JrOcWhtoTuhnbAQ9</source>
 <target xml:lang="DE">zZvOLJfLCy9oP5GQYfEqw5LAeC2ESAxRmVe1JyQdmJ1eG2jz1N</target>
 <note/></trans-unit>
 <trans-unit id="00kjUwy1rJ54bEGYp7XZvtBiY32pmj">
 <source xml:lang="EN">HXOQLUWkfJg206vRw8lyWhCWChOacVxbMukfQ0HUdNHSI18GG4</source>
 <target xml:lang="DE">8dsX38mezeZ0w0w37LI66CDRuI8gBD23zT5KR4iqYNv3IGUgH0</target>
 <note/></trans-unit>
 <trans-unit id="00kk3Af8SFpHyelAaYrgK58b9GbIDj">
 <source xml:lang="EN">wQFxZiCiRsSNWs20G4WXAmDBRdRL6fcrrJnCgtbiXGSfHzpYrT</source>
 <target xml:lang="DE">oFVTUdPkExOhISYofIImLsnVKd3NSZg32tyeP5iRxRZdmuYQDy</target>
 <note/></trans-unit>
 <trans-unit id="00Ky2dmDU9wGTWBnJxeL9b9gkts5UQ">
 <source xml:lang="EN">nHQcjAW02lWe0SyOhqGtyqUhpwQ8qgWX3rUynMRf4BDHfVdHOC</source>
 <target xml:lang="DE">0CURp1dcZydB1V2rEZ1lnOhmYufOYbrLbh84e1ZnALlzZPVq4F</target>
 <note/></trans-unit>
 <trans-unit id="00pMSFlBfA3bJ8Xy9I78wz6XisPYcV">
 <source xml:lang="EN">IuhtaVnZtF67nxKz5dbmuy8BEMTs2X1120FzDtIplKF2Me5AsQ</source>
 <target xml:lang="DE">1BGSJQDZBm4UW974pucnX3XHuYOQYpC7nTcIH01rbKlOkVi9bo</target>
 <note/></trans-unit>
 <trans-unit id="012w2kb2d1Lo6NbJLE0BawThzsSuCJ">
 <source xml:lang="EN">0RoniOGZ7V7WTF1YQg59B8jBhRxnLVXscC1LOGPzKPYRs76oIz</source>
 <target xml:lang="DE">gyw15fkHTni2aUGWI5qiPHEz8vsJJJsW4OOqKwGYL1qzfUVfLO</target>
 <note/></trans-unit>
...
..
..

So I am trying to save each trans-unit id, each source (xml:lang="EN") and each target (xml:lang="DE") in a separate string list, but only the values.

That's my code:

{ -----------  Import Procedure ------------ }
procedure TForm2.Button2Click(Sender: TObject);
var
  xmlFile, idList, sourceList, targetList: TStringList; // xmlFile is the string list the XML file is loaded into
  i: Integer;
  id, source, target: String;
  idTmp, idTmp2, sourceTmp, sourceTmp2, targetTmp, targetTmp2: Integer;
begin
  try
    xmlFile := TStringList.Create;
    idList := TStringList.Create;
    sourceList := TStringList.Create;
    targetList := TStringList.Create;

    if OpenDialog1.Execute then
      xmlFile.LoadFromFile(OpenDialog1.FileName);

      {Debug}
        //ShowMessage(IntToStr(XmlFile.Count));   // Shows the number of lines
        //ShowMessage(XmlFile[8]);                // Shows line 8
      {/Debug}

      for i := 0 to xmlFile.Count-1 do // Iterate over all lines of the string list and do the following:
        begin // Code per line

          {id}
          idTmp  := Pos('<trans-unit id="', xmlFile.Strings[i])+16;  // Searches for trans-unit id (16 is the length of the search string '<trans-unit id="')
          if idTmp > 5 then // Check whether something was found (not equal to 0)
          begin
            idTmp2 := Pos('"', xmlFile.Strings[i], idTmp); // Finds the position of the end of the value (the closing ")
            idList.Add(Copy(xmlFile.Strings[i], idTmp, idTmp2-idTmp));
          end;

          {source}
          sourceTmp  := Pos('<source xml:lang="EN">', xmlFile.Strings[i])+22;
          if sourceTmp > 5 then // Check whether something was found (not equal to 0)
          begin
            sourceTmp2 := Pos('<', xmlFile.Strings[i], sourceTmp); // Finds the position of the end of the value (the next <)
            sourceList.Add(Copy(xmlFile.Strings[i], sourceTmp, sourceTmp2-sourceTmp));
          end;

          {target}
          targetTmp  := Pos('<target xml:lang="DE">', xmlFile.Strings[i])+22;
          if targetTmp > 5 then // Check whether something was found (not equal to 0)
          begin
            targetTmp2 := Pos('<', xmlFile.Strings[i], targetTmp); // Finds the position of the end of the value (the next <)
            targetList.Add(Copy(xmlFile.Strings[i], targetTmp, targetTmp2-targetTmp));
          end;
        end;

      StartPerformance;
      UniConnection1.Open;
  finally
    ListBox1.items.assign(idList);
    ListBox2.items.assign(sourceList);
    ListBox3.items.assign(targetList);
    ShowMessage('Import in StringListen fertiggestellt.');
    xmlFile.Free;
    idList.Free;
    sourceList.Free;;
    targetList.Free;
  end;
end;

But it's not working the way I want. My problem is that it also saves empty lines and other garbage in the string lists. I can't find my error, and it's the first time I'm using the Copy/Pos functions.

Here's a screenshot: [image not available]

What should I change to fix my problem and save only the correct strings in my three string lists?

J...
Hidden
  • Already the first `if idTmp > 5` and the similar checks will always be true – bummi Jun 24 '13 at 14:41
  • No, that's not how to parse XML. – David Heffernan Jun 24 '13 at 15:52
  • Never ever try to parse XML yourself. See here why: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Jeroen Wiert Pluimers Jun 24 '13 at 20:01
  • @JeroenWiertPluimers - I think that's more specifically a jab at trying to do the job with regex... while perhaps somewhat less nebulously perilous than using regex, I agree that OP's solution remains a shiver-inducing ad-hoc approach that is sanely best avoided. – J... Jun 24 '13 at 23:34
  • I read that parsing answer much broader: if even a powerful ad-hoc approach cannot parse HTML or XML, what ad-hoc approach can? Answer for XML: none. You need an XML parser (like a DOM) to do it properly. – Jeroen Wiert Pluimers Jun 25 '13 at 04:29
  • @J... The approach in the code is no better than regex. Looking at the code, I cannot see a tokenizer, for a start. The sentiment of bobince's seminal answer applies equally here. – David Heffernan Jun 25 '13 at 07:26
  • @JeroenWiertPluimers - I agree, of course. I suppose the best analogy is that of toying with petrol vs playing with semtex. While both ill advised, the lesser of the two ad-hoc approaches is generally, by its own unwieldiness, at least more self-limiting in the degree of monster it can become before its author realizes the futility of the approach. I fully admit, however, that I may be woefully underestimating the resolve with which a determined madman might grow the above into a million line fiend of a program. – J... Jun 25 '13 at 10:29

2 Answers


Maybe you should think about using the IXMLDocument interface to load the XML file into a data structure and fill your string lists afterwards.

An example has been posted here: https://stackoverflow.com/a/8651934/2207071
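As a rough sketch of that idea (not the code from the linked answer), the console program below loads the file with `LoadXMLDocument` and walks `xliff` → `file` → `body` → `trans-unit`. It rests on several assumptions: the default Windows DOM vendor (MSXML, hence the `CoInitialize` call), no XML namespace on the root element, plain-text content in `source` and `target`, and the file name passed as the first command-line parameter. `ChildText` and `LoadTransUnits` are hypothetical helper names; on XE2 and later the units may need to be written as `Xml.XMLIntf`, `Xml.XMLDoc` and `Winapi.ActiveX`.

```pascal
program XliffDomSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Variants, ActiveX, XMLIntf, XMLDoc;

// Hypothetical helper: text of the named child element, or '' if it is absent.
function ChildText(const Parent: IXMLNode; const Name: string): string;
var
  Child: IXMLNode;
begin
  Child := Parent.ChildNodes.FindNode(Name);
  if Child <> nil then
    Result := Child.Text // assumes the element contains plain text only
  else
    Result := '';
end;

// Hypothetical helper: fills the three lists from xliff/file/body/trans-unit.
procedure LoadTransUnits(const FileName: string; Ids, Sources, Targets: TStrings);
var
  Doc: IXMLDocument;
  FileNode, Body, Node: IXMLNode;
  i: Integer;
begin
  Doc := LoadXMLDocument(FileName);
  FileNode := Doc.DocumentElement.ChildNodes.FindNode('file');
  if FileNode = nil then Exit;
  Body := FileNode.ChildNodes.FindNode('body');
  if Body = nil then Exit;
  for i := 0 to Body.ChildNodes.Count - 1 do
  begin
    Node := Body.ChildNodes[i];
    if Node.NodeName <> 'trans-unit' then Continue; // skip anything else
    Ids.Add(VarToStr(Node.Attributes['id']));       // '' if the attribute is missing
    Sources.Add(ChildText(Node, 'source'));
    Targets.Add(ChildText(Node, 'target'));
  end;
end;

var
  Ids, Sources, Targets: TStringList;
begin
  CoInitialize(nil); // MSXML is COM based; a VCL application has already done this
  try
    Ids := nil; Sources := nil; Targets := nil;
    try
      Ids := TStringList.Create;
      Sources := TStringList.Create;
      Targets := TStringList.Create;
      LoadTransUnits(ParamStr(1), Ids, Sources, Targets);
      Writeln(Ids.Count, ' trans-unit entries read');
    finally
      Targets.Free;
      Sources.Free;
      Ids.Free;
    end;
  finally
    CoUninitialize;
  end;
end.
```

Because every `trans-unit` adds exactly one entry to each list, the three lists stay index-aligned even when a `source` or `target` element is missing.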

Community
  • I already know this component exists. But I am supposed to use my own function to read and import these. – Hidden Jun 24 '13 at 14:12
  • @Polymorphin - Any reason why? – J... Jun 24 '13 at 14:13
  • I want to compare the import speed with the component against the speed with my own function. – Hidden Jun 24 '13 at 14:14
  • @Polymorphin - this may be of interest, then: http://stackoverflow.com/a/9495243/327083 – J... Jun 24 '13 at 14:15
  • I think the line idTmp2 := Pos('"', xmlFile.Strings[i], idTmp); and similar ones are incorrect. The offset should be (idTmp + 1) or otherwise you will find the first '"' again. – sausagequeen Jun 24 '13 at 14:32
  • @Polymorphin - yes, the zip is offline but there is a good collection of performance data (and a number of good libraries to try) if you are interested in comparing the speed of XML libraries. Unless you really need maximum performance from a custom algorithm, generally I would think that using a good library is always a better alternative. It will generally be a more stable and robust solution than something custom made. – J... Jun 24 '13 at 14:51

Here:

idTmp  := Pos('<trans-unit id="', xmlFile.Strings[i])+16;  
if idTmp > 5 then 
  ...

idTmp will always be greater than 5: you add 16 to the result of Pos no matter what, and Pos never returns a negative value (it returns zero if there is no match), so idTmp is always at least 16.

The simplest change here would be:

 idTmp  := Pos('<trans-unit id="', xmlFile.Strings[i]);  
 if idTmp > 0 then begin //Pos returns 0 if no match found
   idTmp := idTmp + 16;
   idTmp2 := PosEx('"', xmlFile.Strings[i], idTmp); 
   idList.Add(Copy(xmlFile.Strings[i], idTmp, idTmp2-idTmp));
 end;

The change for the other two blocks would follow in a similar way.
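One way to avoid repeating the guard three times is to factor it into a single function. The following is only a sketch: `TryExtract` is a hypothetical helper name, not something from the question, and the two test lines are made-up stand-ins for lines of the XLIFF file. It uses `StrUtils.PosEx` as in the snippet above; on XE2 and later the units may need to be written as `System.SysUtils` and `System.StrUtils`.

```pascal
program ExtractSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Hypothetical helper: copies the text between Prefix and the next Terminator.
// Returns False if either of them is not found in Line.
function TryExtract(const Line, Prefix, Terminator: string;
  out Value: string): Boolean;
var
  StartPos, EndPos: Integer;
begin
  Result := False;
  Value := '';
  StartPos := Pos(Prefix, Line);
  if StartPos = 0 then
    Exit;                              // Pos returns 0 when there is no match
  Inc(StartPos, Length(Prefix));       // move past the prefix only after a match
  EndPos := PosEx(Terminator, Line, StartPos);
  if EndPos = 0 then
    Exit;                              // no terminator on this line
  Value := Copy(Line, StartPos, EndPos - StartPos);
  Result := True;
end;

var
  S: string;
begin
  if TryExtract(' <trans-unit id="abc">', '<trans-unit id="', '"', S) then
    Writeln('id: ', S);
  if TryExtract(' <source xml:lang="EN">Hello</source>', '<source xml:lang="EN">', '<', S) then
    Writeln('source: ', S);
  if not TryExtract(' <note/></trans-unit>', '<target xml:lang="DE">', '<', S) then
    Writeln('no target on this line');
end.
```

Inside the loop from the question, each block then reduces to a call such as `if TryExtract(xmlFile[i], '<source xml:lang="EN">', '<', source) then sourceList.Add(source);`. Using `Length(Prefix)` instead of the hard-coded 16 and 22 also removes the need to count characters by hand.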

You'll notice that I used StrUtils.PosEx here for idTmp2 - I don't know how your code compiled using Pos for the second function...

Edit

OK, it looks like Pos was changed in XE3 to include offset overloads. If performance is your objective here (as it seems from the comments), you should probably read this:

http://qc.embarcadero.com/wc/qcmain.aspx?d=111103

Additionally, and I think this is quite important, this really is a terrible way to parse XML. I highly suggest you read through some source code from projects that do this already to get a better understanding of how you should approach the problem. Some examples might be:

J...
  • Note that [QualityCentral has now been shut down](https://community.embarcadero.com/blogs/entry/quality-keeps-moving-forward), so you can't access `qc.embarcadero.com` links anymore. If you need access to old QC data, look at [QCScraper](http://www.uweraabe.de/Blog/2017/06/09/how-to-save-qualitycentral/). – Remy Lebeau Jun 09 '17 at 18:00