I have an XML file (a log that my program creates) with this content:

<?xml version="1.0" encoding="utf-8"?>
<PsnRecords>
  <PsnRecord>
    <Names></Names>
    <PsnUrl>http://gs2.ww.prod.dl.playstation.net/gs2/ppkgo/prod/CUSA05330_00/108/f_acb1a312a982305e284718898b3dade6afb395e6718d836b1d7b1e1aa1873800/f/EP0953-CUSA05330_00-BRAWLHALLAEUROPE-A0403-V0100-DP.pkg</PsnUrl>
    <LocalUrl>C:\Users\Betrisa\Desktop\Shared\EP0953-CUSA05330_00-BRAWLHALLAEUROPE-A0403-V0100-DP.pkg</LocalUrl>
    <isLixian>false</isLixian>
    <LixianUrl></LixianUrl>
  </PsnRecord>
  <PsnRecord>
    <Names></Names>
    <PsnUrl>http://gs2.ww.prod.dl.playstation.net/gs2/ppkgo/prod/CUSA05330_00/108/f_acb1a312a982305e284718898b3dade6afb395e6718d836b1d7b1e1aa1873800/f/EP0953-CUSA05330_00-BRAWLHALLAEUROPE-A0403-V0100.pkg?downloadId=0000015b&amp;du=000000000000015b00e26bd28904ee7f&amp;product=0187&amp;serverIpAddr=192.168.137.1&amp;r=00000000</PsnUrl>
    <LocalUrl></LocalUrl>
    <isLixian>false</isLixian>
    <LixianUrl></LixianUrl>
  </PsnRecord>
  <PsnRecord>
    <Names></Names>
    <PsnUrl>http://ic.97f46e00.060798.gs2.sonycoment.loris-e.llnwd.net/gs2/ppkgo/prod/CUSA05330_00/108/f_acb1a312a982305e284718898b3dade6afb395e6718d836b1d7b1e1aa1873800/f/EP0953-CUSA05330_00-BRAWLHALLAEUROPE-A0403-V0100.pkg?downloadId=0000015b&amp;du=000000000000015b00e26bd28904ee7f&amp;product=0187&amp;serverIpAddr=192.168.137.1&amp;r=00000001</PsnUrl>
    <LocalUrl></LocalUrl>
    <isLixian>false</isLixian>
    <LixianUrl></LixianUrl>
  </PsnRecord>
</PsnRecords>

I want to extract all the URLs and save them to a .txt file. I tried two approaches, but neither worked:

Way 1: using Split (result: Url)

        private void button1_Click(object sender, EventArgs e)
        {
            string paths = Application.StartupPath + @"\DataFiles\DataHistory.xml";
            string resPaths = Application.StartupPath + @"\DataFiles\Links.txt";
            StreamWriter urlsWrite = File.CreateText(resPaths);


            var text = System.IO.File.ReadAllText(paths);
            var links = text.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("<PsnUrl>http://") || s.StartsWith("<PsnUrl>https://"));

            foreach (string s in links)
            {
            urlsWrite.WriteLine(s);     
            }
            
        }

Way 2: using Regex (result: nothing!)

        private void button1_Click(object sender, EventArgs e)
        {
            string paths = Application.StartupPath + @"\DataFiles\DataHistory.xml";
            string resPaths = Application.StartupPath + @"\DataFiles\Links.txt";
            StreamWriter urlsWrite = File.CreateText(resPaths);


            var text = System.IO.File.ReadAllText(paths);
            var regex = new Regex(@"\b(?:http?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
            MatchCollection mactches = regex.Matches(text);
            
            foreach (string matc in links)
            {
            text = text.Replace(matc.Value, "<PsnUrl>"+matc.Value+"</PsnUrl>");
            urlsWrite.WriteLine(mats);     
            }
        }

I want a .txt file containing only the clean URLs, one per line, like:

https://xxxxxxxxxxxxxx
http://xxxxxxxxxxxxxx
https://xxxxxxxxxxxxxx
https://xxxxxxxxxxxxxx
https://xxxxxxxxxxxxxx
https://xxxxxxxxxxxxxx

What am I doing wrong?

  • Use some proper means of parsing XML. Have a look [here](https://stackoverflow.com/questions/220867/how-to-deal-with-xml-in-c-sharp) to get started. – sticky bit Aug 02 '20 at 05:34
  • When you asked this earlier today I suggested looking into *XPath*. As everyone else has suggested, treat XML as XML. It is designed to be easily parsed by XML parsers. – Flydog57 Aug 02 '20 at 06:29
  • @Flydog57 I am new on this site, and the admins closed my post because of the rules! So thank you and the others for the help; you are right, parsing the XML is the easiest way. – Reval Revaaliyan Aug 02 '20 at 07:59

1 Answer

Way 0: parse XML properly

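// needs: using System.Xml;
var doc = new XmlDocument();
doc.LoadXml(text);
// XmlNodeList is non-generic, so type the loop variable as XmlNode and write its Value
// (the text itself, with &amp; decoded back to &) instead of the node object
foreach (XmlNode n in doc.SelectNodes("//PsnUrl/text()"))
    urlsWrite.WriteLine(n.Value);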
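For completeness, here is a minimal end-to-end sketch of Way 0, assuming the log has the PsnRecords/PsnRecord/PsnUrl layout shown in the question. The class and method names (PsnUrlExtractor, ExtractUrls, WriteUrls) are hypothetical; the XmlDocument, SelectNodes and File.WriteAllLines calls are standard .NET APIs.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Sketch of "Way 0": parse the log with XmlDocument and write one URL per line.
// PsnUrlExtractor, ExtractUrls and WriteUrls are hypothetical names for this example.
public static class PsnUrlExtractor
{
    // Returns the decoded text of every non-empty <PsnUrl> element, in document order.
    public static List<string> ExtractUrls(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var urls = new List<string>();
        // XmlNodeList is non-generic, so the loop variable is declared as XmlNode rather than var.
        foreach (XmlNode node in doc.SelectNodes("//PsnUrl"))
        {
            // InnerText decodes entities, so &amp; in the file comes back as &.
            string url = node.InnerText.Trim();
            if (url.Length > 0)
                urls.Add(url);
        }
        return urls;
    }

    // Reads the XML log at xmlPath and writes the URLs to txtPath.
    // File.WriteAllLines opens, writes and closes the file in one call, so nothing is left unflushed.
    public static void WriteUrls(string xmlPath, string txtPath)
    {
        File.WriteAllLines(txtPath, ExtractUrls(File.ReadAllText(xmlPath)));
    }
}
```

In the form, button1_Click could then just call PsnUrlExtractor.WriteUrls(paths, resPaths) with the two paths built in the question.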

Your sample XML seems to have been copied from a tree view. Here is what the raw content should look like. Note that each & is encoded as &amp;; if your source does not do so, you could replace them first, e.g. text.Replace("&", "&amp;").

<?xml version="1.0" encoding="UTF-8"?>
<PsnRecords>
    <PsnRecord>
        <Names/>
        <PsnUrl>http://gs2.ww.prod.dl.playstation.net/gs2/acpkgo/prod/CUSA00803_00/9/f_72955662ebee69bf3f1bbec8b1f1dfef1ed000acb6f96046b394d69fc8551fe4/f/UP0002-CUSA00803_00-CODAWDIGITALPACK.pkg?downloadId=000000ab&amp;serverIpAddr=87.248.195.254&amp;country=us&amp;downloadType=ob&amp;q=1817303785a54ecb464ab93233801c33225a5dae976d075973acb9669874c74b</PsnUrl>
        <LocalUrl></LocalUrl>
    </PsnRecord>
    <PsnRecord>
        <Names/>
        <PsnUrl>http://gs2.ww.prod.dl.playstation.net/gs2/appkgo/prod/CUSA00803_00/3/f_6ee0d43dc4ea9a53a9f3d83fe26c7afcfadca8d17795762ab81cb2ddc6086776/f/UP0002-CUSA00803_00-CODAW00000000000_0.pkg?downloadId=000000ac&amp;serverIpAddr=87.248.195.254&amp;country=us&amp;downloadType=ob&amp;q=1817303785a54ecb464ab93233801c33225a5dae976d075973acb9669874c74b</PsnUrl>
        <LocalUrl></LocalUrl>
    </PsnRecord>
</PsnRecords>

Unless the XML is malformed, avoid manipulating the strings yourself.
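If you prefer LINQ to XML (the System.Xml.Linq API) over XmlDocument and XPath, the same extraction can be sketched as follows. This is a swapped-in alternative, not what the answer uses; LinqUrlExtractor is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// LINQ to XML variant of the PsnUrl extraction; LinqUrlExtractor is a hypothetical name.
public static class LinqUrlExtractor
{
    public static List<string> ExtractUrls(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("PsnUrl")          // every <PsnUrl> element, at any depth
            .Select(e => e.Value.Trim())    // Value returns the text with &amp; decoded to &
            .Where(url => url.Length > 0)   // skip records whose <PsnUrl> is empty
            .ToList();
    }
}
```

The parser treats <Names/> and <Names></Names> identically, which is one reason string splitting is more fragile than parsing.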

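Way 1: you need to strip away <PsnUrl> and </PsnUrl>, and decode &amp; back to &. Also note that neither of your attempts closes the StreamWriter, so buffered output may never reach Links.txt; wrap it in a using block or call Close() when you are done.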

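foreach (string s in links)
    // Trim() drops a trailing \r left by \r\n line endings; WebUtility.HtmlDecode (System.Net) turns &amp; back into &
    urlsWrite.WriteLine(WebUtility.HtmlDecode(s.Replace("<PsnUrl>", string.Empty).Replace("</PsnUrl>", string.Empty).Trim()));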
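As a self-contained sketch of the string-splitting approach, assuming each <PsnUrl>...</PsnUrl> element sits on one whitespace-delimited token as in the sample (SplitUrlExtractor is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Way 1 (string splitting) adjusted to produce clean URLs; SplitUrlExtractor is a hypothetical name.
public static class SplitUrlExtractor
{
    public static List<string> ExtractUrls(string text)
    {
        return text
            // Including '\r' keeps a trailing carriage return off each token when the file has \r\n line endings.
            .Split(new[] { '\t', '\r', '\n', ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.StartsWith("<PsnUrl>http://", StringComparison.Ordinal)
                     || s.StartsWith("<PsnUrl>https://", StringComparison.Ordinal))
            // Strip the opening and closing tags, then turn &amp; back into &.
            .Select(s => WebUtility.HtmlDecode(
                s.Replace("<PsnUrl>", string.Empty).Replace("</PsnUrl>", string.Empty)))
            .ToList();
    }
}
```

This still breaks as soon as the layout changes (for example, a line break inside the element), which is why parsing the XML is the safer choice.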

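Way 2: mactches, links, mats? Please post actual code that compiles. Your Replace call wraps each URL in the tag, which is the opposite of what you want to achieve. Also, in http? only the p is optional, so https:// URLs are never matched, and \S+\b runs on into the closing tag, so each match also includes most of </PsnUrl>.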

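// https? (not http?) matches both schemes; [^<\s]+ stops before the closing </PsnUrl> tag
var regex = new Regex(@"https?://[^<\s]+", RegexOptions.IgnoreCase);
foreach (Match matc in regex.Matches(text))
    urlsWrite.WriteLine(System.Net.WebUtility.HtmlDecode(matc.Value)); // &amp; -> &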
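A self-contained sketch of the regex approach, assuming URLs never contain whitespace or a literal < (true for the sample). RegexUrlExtractor is a hypothetical name. Unlike XPath, this pattern would also pick up an http(s) URL stored in any other element, such as LixianUrl.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Regex-based extraction; RegexUrlExtractor is a hypothetical name.
public static class RegexUrlExtractor
{
    // https? makes only the "s" optional, so both http:// and https:// match.
    // [^<\s]+ consumes URL characters and stops at the "<" of the closing </PsnUrl> tag.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^<\s]+", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<string> ExtractUrls(string text)
    {
        return UrlPattern.Matches(text)
            .Cast<Match>()
            .Select(m => WebUtility.HtmlDecode(m.Value)) // the file stores & as &amp;
            .ToList();
    }
}
```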
TIM Wong
  • Sorry, that was my mistake! The XML file had been altered by my duplicate remover. Anyway, I have edited the XML example in the question. – Reval Revaaliyan Aug 02 '20 at 07:52
  • Parsing the XML this way is not working: ``` var paths = Application.StartupPath + @"\DataFiles\DataHistory.xml"; string resPaths = Application.StartupPath + @"\DataFiles\Links.txt"; StreamWriter urlsWrite = File.CreateText(resPaths); var doc = new XmlDocument(); doc.Load(paths); XmlNodeList nodeList; nodeList = doc.SelectNodes("PsnRecords/PsnRecords/PsnUrl"); foreach (var n in nodeList) urlsWrite.WriteLine(n); ``` – Reval Revaaliyan Aug 02 '20 at 08:07
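The snippet in the last comment appears to have three problems: the XPath PsnRecords/PsnRecords/PsnUrl names the root twice, whereas the child elements are PsnRecord (singular), so it matches nothing; WriteLine(n) formats the node object via ToString() instead of writing its text, because foreach over the non-generic XmlNodeList with var gives an object; and the StreamWriter is never disposed. A sketch with those fixed, keeping the comment's paths/resPaths/urlsWrite names (CommentVersionFixed, ReadUrls and Export are hypothetical):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// The comment's approach with the XPath, node-text and disposal issues fixed.
// CommentVersionFixed, ReadUrls and Export are hypothetical names.
public static class CommentVersionFixed
{
    // Collects the text of each PsnUrl element under PsnRecords -> PsnRecord.
    public static List<string> ReadUrls(XmlDocument doc)
    {
        var urls = new List<string>();
        // Root is PsnRecords and each child is PsnRecord (singular).
        foreach (XmlNode n in doc.SelectNodes("PsnRecords/PsnRecord/PsnUrl"))
            urls.Add(n.InnerText); // InnerText is the URL string, with &amp; decoded to &
        return urls;
    }

    public static void Export(string paths, string resPaths)
    {
        var doc = new XmlDocument();
        doc.Load(paths);
        // The using block disposes (and therefore flushes) the writer when it ends.
        using (StreamWriter urlsWrite = File.CreateText(resPaths))
        {
            foreach (string url in ReadUrls(doc))
                urlsWrite.WriteLine(url);
        }
    }
}
```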