Can you help me list the browsers from this file http://techpatterns.com/downloads/firefox/useragentswitcher.xml in a txt file, with columns separated by a %tab% delimiter?

3 or 4 columns should be there:

1) folder description from example data: <folder description="Browsers - Windows">

2) browser type from example data: <folder description="Legacy Browsers">

3) user agent from example data: <useragent description="Avant Browser 1.2" useragent="Avant Browser/1.2.789rel1 (http://www.avantbrowser.com)" app

Here I see the first problem: some browsers aren't in a folder such as <folder description="Legacy Browsers"> but sit under a <separator/> instead.

So the first column should define the system, the second the type, and the third the browser.

The next problem is that the Devices folder contains one more folder.

@echo off 
Setlocal EnableDelayedExpansion
SET file=useragentswitcher.xml
SET delim="

FOR /F "tokens=* skip=1" %%F IN (!file!) DO (
  REM echo %%F
  call :parse "%%F" > temp.txt
  FOR /F "tokens=1,2,3,4,5,6,7 skip=1 delims=" %%A IN (temp.txt) DO (
    IF "%%A"=="folder" (
      SET /A level=!level!+1
      echo Level:!level!
      ) ELSE IF "%%A"=="/folder" (
          SET /A level=!level!-1
          echo Level:!level!
        )

   echo A:%%A
  )
  pause
)

exit /b

:parse
Setlocal EnableDelayedExpansion
  SET A=%*
  REM Remove the surrounding double quotes and <>
  SET A=!A:~2,-2!
  REM Replace double quotes
  SET A=!A:"=µ!
  FOR /F "tokens=1,2 delims=µ=" %%A IN ("!A!") DO (
    SET first=%%A
    SET second=%%B
    echo !first!
    FOR /F "tokens=1,2 delims= " %%A IN ("!first!") DO (
      echo %%A
      echo %%B
    )
    echo !second!
  )
endlocal
exit /b

This parses one tag of the line and I am going to work with it now.

John Boe
  • Where does your code fail? I can't see any programming question. – jeb Jun 13 '12 at 12:48
  • I will paste code once I have some. Now I am stuck here: `FOR /F "tokens=1,2 delims=^"" %%B IN ("%%A") DO` How should I use double quotes as delimiter? – John Boe Jun 13 '12 at 13:28
  • Code updated, this looks better to start to work with. – John Boe Jun 13 '12 at 15:30
  • Is it possible to add **linefeed as delim?** What I am trying to do is parse the file's lines into variables `%%A %%B %%C %%D %%E %%F` – John Boe Jun 13 '12 at 16:03

3 Answers

It seems you ought to be able to find a much better tool than batch to parse XML...

But I believe the code below is what you are looking for.

Because the number of folders varies, I swapped the order of the columns in the output. I put the browser description first, followed by the folders, one per column. This allows the definition of each column to be fixed.

I used the info in jeb's answer to include " as a FOR delimiter.

EDIT - I simplified the code

Note - This first attempt was written to work with a copy of the XML that was retrieved using Internet Explorer. I've since discovered that IE altered the format of the file. This code is highly dependent on the exact format of the file, so it will not work on the original XML. It also serves as an example of why batch is a poor choice for parsing XML.

@echo off
setlocal enableDelayedExpansion

::Define the files to use - change as needed
set input="test.xml"
set output="result.txt"

::The assignment below should have exactly one TAB character between = and "
set "TAB=   "

set cnt=0
set "folder0="
>%output% (
  for /f usebackq^ tokens^=1^,2^ delims^=^=^" %%A in (%input%) do (
    for %%N in (!cnt!) do (
      if "%%A"=="- <folder description" (
        set /a cnt+=1
        for %%M in (!cnt!) do set "folder%%M=!folder%%N!%TAB%%%B"
      )
      if "%%A"=="  </folder>" (
        set /a cnt-=1
      )
      if "%%A"=="  <useragent description" (
        echo %%B!folder%%N!
      )
    )
  )
)

The code will fail if ! appears in any of the descriptions because delayed expansion will corrupt expansion of any FOR variable that contains !. I checked, and your file does not contain ! in any description.

The code could be modified to handle ! in the description, but it would get more complicated. It requires toggling of delayed expansion on and off, and preservation of variable values across the ENDLOCAL barrier.

The above code is highly dependent on the format of the XML. It will fail if the non-standard dashes are removed, or if the white space arrangement changes.

The following variation is a bit more robust, but it still requires that each line contains exactly one XML tag.

@echo off
setlocal enableDelayedExpansion

::Define the files to use - change as needed
set input="test.xml"
set output="result.txt"

::The assignment below should have exactly one TAB character between = and "
set "TAB=   "

set cnt=0
set "folder0="
>%output% (
  for /f usebackq^ tokens^=1^,2^ delims^=^=^" %%A in (%input%) do (
    for %%N in (!cnt!) do (
      set "test=%%A"
      if "!test:<folder description=!" neq "!test!" (
        set /a cnt+=1
        for %%M in (!cnt!) do set "folder%%M=!folder%%N!%TAB%%%B"
      )
      if "!test:</folder>=!" neq "!test!" (
        set /a cnt-=1
      )
      if "!test:<useragent description=!" neq "!test!" (
        echo %%B!folder%%N!
      )
    )
  )
)

EDIT - One last version

Here is a version that can handle ! in the data. I've added an additional column to the output. The first column is still the browser description. The 2nd column is the useragent string. The remaining columns are the folders. The solution uses the delayed expansion toggling technique. It also uses an additional FOR /F to preserve a variable value across the ENDLOCAL barrier.

@echo off
setlocal disableDelayedExpansion

::Define the files to use - change as needed
set input="test.xml"
set output="result.txt"

::The assignment below should have exactly one TAB character between = and "
set "TAB=   "

set cnt=0
set folder0=""
>%output% (
  for /f usebackq^ tokens^=1-4^ delims^=^=^" %%A in (%input%) do (
    set "test=%%A"
    set "desc=%%B"
    set "agent=%%D"
    setlocal enableDelayedExpansion
    for %%N in (!cnt!) do (
      if "!test:<folder description=!" neq "!test!" (
        set /a cnt+=1
        for %%M in (!cnt!) do for /f "delims=" %%E in ("!folder%%N!") do (
          endlocal
          set "folder%%M=%%~E%TAB%%%B"
          set "cnt=%%M"
        )
      ) else if "!test:</folder>=!" neq "!test!" (
        endlocal
        set /a cnt-=1
      ) else if "!test:<useragent description=!" neq "!test!" (
        echo !desc!%TAB%!agent!!folder%%N!
        endlocal
      ) else endlocal
    )
  )
)
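
For comparison, the remark at the top of this answer (that a better tool than batch ought to exist for parsing XML) can be illustrated with a minimal Python sketch. It is not part of the original solution. It assumes the file is well-formed XML in which `folder` elements nest and `useragent` elements carry `description` and `useragent` attributes, as in the fragments quoted in the question. The function name `rows_from_xml`, the root element name, and the "Example Browser!" entry are hypothetical. The column order matches the last batch version: description, useragent string, then one column per enclosing folder.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the fragments quoted in the question.
# The real useragentswitcher.xml is assumed to follow the same layout.
SAMPLE = """<useragentswitcher>
  <folder description="Browsers - Windows">
    <useragent description="Example Browser!" useragent="Example/1.0"/>
    <separator/>
    <folder description="Legacy Browsers">
      <useragent description="Avant Browser 1.2" useragent="Avant Browser/1.2.789rel1 (http://www.avantbrowser.com)"/>
    </folder>
  </folder>
</useragentswitcher>"""


def rows_from_xml(root):
    """Return tab-separated rows: description, useragent, enclosing folders."""
    rows = []

    def walk(node, folders):
        # Visit children in document order, tracking the folder path so far.
        for child in node:
            if child.tag == "folder":
                walk(child, folders + [child.get("description", "")])
            elif child.tag == "useragent":
                fields = [child.get("description", ""),
                          child.get("useragent", "")]
                rows.append("\t".join(fields + folders))
            # Other tags, such as <separator/>, are skipped.

    walk(root, [])
    return rows


if __name__ == "__main__":
    for row in rows_from_xml(ET.fromstring(SAMPLE)):
        print(row)
```

To run it against a downloaded copy instead of the inline sample, `ET.parse("useragentswitcher.xml").getroot()` can be passed to `rows_from_xml`. Because a real XML parser does the work, the line layout of the file and any `!` or `=` characters in the attribute values no longer matter.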
dbenham
  • Thanks for your code. Is it not possible to use string substitution with delayed variables when looking for "!"? Yahoo! contains that character. But we could probably replace the exclamation marks with a different character before the parsing starts. I also tried some code; as you can see, I updated the code in the question. I have a problem there because temp.txt, in which I saved the parsed values, has every value on a separate line. Is it possible to read every line from temp.txt into variables %%A %%B %%C etc.? – John Boe Jun 13 '12 at 16:18
  • What is this: `set "folder0=" >%output% ( for`? I am seeing it for the first time, so I don't understand what it does. – John Boe Jun 13 '12 at 16:47
  • @user1141649 - With the data you provided, you only need to worry about `!` if you care about the useragent string. I assumed you only needed the browser description. It is not possible to use FOR to read multiple lines into %%A %%B etc. as you would like. But you should not require the temporary file in the first place. – dbenham Jun 13 '12 at 17:06
  • @user1141649 - I need to guarantee the initial value of folder0 in order for the algorithm to work. The parentheses with redirection around the big FOR statement are used to capture the results in a file. I assumed that is what you would ultimately want. Doing the redirection once outside the loop is much more efficient than using append mode within the loop. – dbenham Jun 13 '12 at 17:10
  • @user1141649 - I've added a final solution that preserves `!` values. – dbenham Jun 13 '12 at 17:20
  • Thank you very much for your efforts. If I use the file useragentswitcher.xml, your script #1 results in an empty file and script #2 results in lines like: `` Browsers - Windows!folder!cnt!! ... probably because the file has the Yahoo! strings. Your script #3 works perfectly, but I would like to see the OS version in the first column - I mean the xml column #2 starting with `` ... the column also identifies whether it is a mobile device or a bot. So could you please add this column? – John Boe Jun 13 '12 at 18:24
  • @user1141649 - Script 1 has been tested, and it certainly does not create an empty file. There was a bug dealing with detecting folder end, but it has been fixed, and it works with your data. Script 2 works as well, as does Script 3. I don't understand your new requirement, but either way I think I am going to let you figure out the rest. – dbenham Jun 13 '12 at 20:09
  • @dbenham it is not a new requirement. See, it is point 1 in my question: folder description from example data: `` . So there should be some basic information that describes what is in the folder... – John Boe Jun 14 '12 at 08:10
  • In the xml file, there are these items that should be in the 1st column: Browsers - Windows;Browsers - Mac;Browsers - Linux;Browsers - Unix;Mobile Devices;Spiders – Search;UA List :: About – John Boe Jun 14 '12 at 08:16
  • @user1141649 - You also stated in your original question that you did not know how to deal with the varying number of folders. That is why I changed the order of the columns. The information you want in column 1 I placed in column 2 or 3, depending on which version of the code you use. I also figured out why the 1st code doesn't work for you - when I initially downloaded the XML I used IE, and IE introduced leading spaces and sometimes a dash on each line. So my code looks for those extra characters. When I download with Chrome the leading characters are not there, so of course code 1 fails. – dbenham Jun 14 '12 at 09:36
  • @dbenham Aaaah. Sorry for that. I didn't see what is in the last column. So this is OK. It can be the last column. I use FF. So in my output, when I press ctrl+u, I see that the layout is ordered into columns as if there were tabs. In this case the code is finished. Thanks once more! – John Boe Jun 14 '12 at 10:57
  • @dbenham, I've written some simple code to move the columns that you added at the end of the line to the beginning. It creates a new file, but it is better organized now: http://codepaste.net/4jjwc7 – John Boe Jun 14 '12 at 12:48

Check xpath.bat, a script that can get values from an XML file using a given XPath expression:

call xpath.bat "useragentswitcher.xml" "//folder/@description"
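
As a rough cross-check of what that XPath expression selects, here is a minimal Python sketch. It is an illustration only, not how xpath.bat works internally. The standard-library ElementTree module supports a limited XPath subset that selects elements rather than attributes, so the `@description` step becomes a `.get("description")` call. The function name `folder_descriptions`, the root element name, and the inline sample are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the fragments quoted in the question.
SAMPLE = """<useragentswitcher>
  <folder description="Browsers - Windows">
    <folder description="Legacy Browsers"/>
  </folder>
</useragentswitcher>"""


def folder_descriptions(root):
    """Return the description of every <folder> below root, in document order."""
    return [folder.get("description") for folder in root.findall(".//folder")]


if __name__ == "__main__":
    for description in folder_descriptions(ET.fromstring(SAMPLE)):
        print(description)
```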
npocmaka

Answer to your comment "How should I use double quotes as delimiter?"

Simply use the form

FOR /F tokens^=1^,2^ delims^=^" %%B IN ("%%A") DO

How does this work?
Normally you can't use a quote character as a delim character.
This is the only known workaround; the important thing is that the normal quotes around the FOR /F options are missing.
But it is necessary that the options are parsed as a single token, so you need to escape all standard batch-parser delimiters (space tab =,;).
The quote isn't a batch delimiter, but it needs to be escaped too; otherwise the rest of the line would be treated as quoted and the parser would fail.
You could also replace the ^" with "", as the second quote will be ignored.

FOR /F tokens^=1^,2^ delims^="" %%B IN ("%%A") DO ...
jeb
  • Can you explain how it works? It looks like a chaos of characters to me. Why are the tokens enclosed by ^^ while the delims aren't? Really confused. – John Boe Jun 13 '12 at 16:22