2

I have a text file (myurls.txt) whose contents are a list of URLs, as follows:

Slides_1:   http://linux.koolsolutions.com/svn/ProjectA/tags/REL-1.0
Exercise_1: http://linux.koolsolutions.com/svn/Linux/tags/REL-1.0

Slides_2:   http://linux.koolsolutions.com/svn/oldproject/ProjectB/tags/REL-2.0
Exercise_2: http://linux.koolsolutions.com/svn/ProjectB/tags/REL-1.0

Exercise_3: http://linux.koolsolutions.com/svn/BlueBook/ProjectA/tags/REL-1.0

Now I want to parse this text file in a for loop so that, on each iteration (e.g. for the first URL in the file above), I have the following information in separate variables:

%i% = REL-1.0
%j% = http://linux.koolsolutions.com/svn/ProjectA
%k% = http://linux.koolsolutions.com/svn/ProjectA/tags/REL-1.0

After some experimenting I have the following code, but it only works (kind of) if the URLs all have the same number of slashes:

@echo off
set FILE=myurls.txt
FOR /F "tokens=2-9 delims=/ " %%i in (%FILE%) do (
@REM <do something with variables i, j and k.>
)

Obviously, I need to make it more flexible so that it can handle URLs of arbitrary length. I am fine with other solutions, e.g. Windows Script Host/VBScript, as long as they run on a default Windows XP/7 installation. In other words, I know I could use awk, grep, sed, Python, etc. for Windows and get the job done, but I don't want users to have to install anything beyond a standard Windows installation.

modest

3 Answers

4

I think this might be what you are looking for, though I am not absolutely sure what your rules are for identifying the project.

It uses the FOR ~pnx modifiers to parse parts of a path. Use HELP FOR from the command line for more info. It uses \..\.. to get to the grand-parent "directory", and \ is prepended to make the "path" absolute.

The result converts / and // into \, so variable search and replace is used to restore the proper slash delimiters, and a substring operation is used to strip off the leading slash. Use HELP SET from the command line for more information on search and replace and substring operations.

Delayed expansion is used because it needs to expand a variable that was set within the same block of code.

@echo off
setlocal enableDelayedExpansion
set "file=myurls.txt"
for /f "tokens=1*" %%A in (%file%) do (
  for /f "delims=" %%C in ("\%%B\..\..") do (
    set "project=%%~pnxC"
    set "project=!project:~1!"
    set "project=!project:\=/!"
    set "project=!project:http:/=http://!"
    echo header  = %%A
    echo url     = %%B
    echo project = !project!
    echo release = %%~nxB
    echo(
  )
)

Here are the results for your sample data:

header  = Slides_1:
url     = http://linux.koolsolutions.com/svn/ProjectA/tags/REL-1.0
project = http://linux.koolsolutions.com/svn/ProjectA
release = REL-1.0

header  = Exercise_1:
url     = http://linux.koolsolutions.com/svn/Linux/tags/REL-1.0
project = http://linux.koolsolutions.com/svn/Linux
release = REL-1.0

header  = Slides_2:
url     = http://linux.koolsolutions.com/svn/oldproject/ProjectB/tags/REL-2.0
project = http://linux.koolsolutions.com/svn/oldproject/ProjectB
release = REL-2.0

header  = Exercise_2:
url     = http://linux.koolsolutions.com/svn/ProjectB/tags/REL-1.0
project = http://linux.koolsolutions.com/svn/ProjectB
release = REL-1.0

header  = Exercise_3:
url     = http://linux.koolsolutions.com/svn/BlueBook/ProjectA/tags/REL-1.0
project = http://linux.koolsolutions.com/svn/BlueBook/ProjectA
release = REL-1.0
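
The substring and search-and-replace steps described in this answer can be tried in isolation. The following is only a sketch: the starting value is an assumption, written in the form the answer says the `~pnx` expansion produces (leading backslash, every slash turned into a backslash). Because nothing here runs inside a parenthesized block, ordinary `%...%` expansion is enough.

```batch
@echo off
setlocal
rem Assumed starting value, in the form the answer describes after the ~pnx expansion
set "project=\http:\linux.koolsolutions.com\svn\ProjectA"
rem Substring: drop the first character, which is the leading backslash
set "project=%project:~1%"
rem Search and replace: turn every backslash into a forward slash
set "project=%project:\=/%"
rem Search and replace: restore the double slash after the scheme
set "project=%project:http:/=http://%"
echo project = %project%
```

This prints `project = http://linux.koolsolutions.com/svn/ProjectA`, matching the first entry in the results listed above.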
dbenham
  • Thanks for your reply! Actually there is no rule for identifying the project. In fact the URLs might not even have the name Project in them. For example, a URL can simply be: http://linux.koolsolutions.com/svn/LinuxMaterial/tags/REL-1.0. The only fixed pattern is that each URL has a header (like Slides_N or Exercise_N) and ends with /tags/REL-X.Y. – modest Sep 08 '12 at 00:09
  • @modest - Well that is a rule :-) Thankfully it is also compatible with the rule that I implemented. I strip off the last 2 path components to get the project, which would correspond with "/tags/REL-X.Y". It looks to me like your question has been answered. – dbenham Sep 08 '12 at 01:05
1
@echo off

:: First separate into Label, URI type, and internet path
for /f "tokens=1-3 delims=:" %%x in (URLs.txt) do (
  echo.

  :: Take the Label
  for /f %%a in ("%%x") do set LabelNam=%%a

  :: Assemble Release URL
  set ReleaseURL=http:%%z

  :: Delayed variable expansion is required just for 'z'
  setlocal enabledelayedexpansion

    :: Take Release URL Path
    set z=%%z

    :: Extract the Release
    for /f "tokens=2" %%b in ("!z:/tags/= !") do set Release=%%b

    :: Split the Internet Path at the '/''s and call ':getURL'
    call :getURL %%y !z:/= !

    :: Output the information 
    echo       Label = !LabelNam!
    echo     Release = !Release!
    echo         URL = !URL!
    echo Release URL = !ReleaseURL!
  :: End variable expansion
  endlocal
)
goto :eof


:getURL
  :: Get URL type
  set URL=%1:/
  :: shift all arguments one to the left
  shift

  :URLloop
    :: Assemble URL
    set URL=%URL%/%1
    shift
  :: If we haven't found 'tags' yet, loop
  if "%1" neq "tags" goto :URLloop

goto :eof
James K
  • Sorry for taking so long to post, but I got very distracted by an extremely odd error. If you add another colon-delimited comment above `:: Extract the Release`, I get a `The system cannot find the drive specified.` error. Just a pair of colons will do it. But replace the `::`'s with `REM`, and it acts just fine. I played with it and got different errors. It's just so **BIZARRE**. And I still haven't figured out what's going on. – James K Sep 09 '12 at 01:42
  • The **BIZARRE** effect comes from primary/secondary label lines inside parentheses [SO: windows batch file with goto command not working](http://stackoverflow.com/a/4006006/463115) – jeb Sep 10 '12 at 11:50
  • @jeb - Ach, I should have known! I remember `::` breaking my code before by causing the code to act like it was outside the parentheses. Thanks for reminding me. ^_^ – James K Sep 10 '12 at 12:16
  • @jeb - Say, I also had trouble doing this: `for /f "tokens=* delims=/" %%x in ("http://somesite.com/path/whatever") do echo %%x` because it would always just echo the entire URL, and not split it up. But it works if I specify the tokens. Do you know if that's just a bug in `for`? – James K Sep 10 '12 at 12:23
  • It's not a bug, `tokens=*` is interpreted as _ONE_ token with the complete content, you have to declare each token at the `tokens=` option to be able to access them later, but you are allowed to skip some tokens, like `tokens=1,4,7,*` – jeb Sep 10 '12 at 12:56
  • @jeb - I just double checked and you're right. I was thinking that I'd successfully used that method to split up a line and send all the tokens to a :label that just used `%1` and `shift` to read in an unknown number of tokens. Well, it can't help that I haven't gotten any sleep in the past 48 hours. :( I must have done an `...in ("%var:/= %") do ...` equivalent instead. – James K Sep 10 '12 at 13:50
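
The `tokens=` behaviour discussed in these comments can be compared side by side. This is a minimal sketch using the example URL from the comment above; the variable name `url` and the output labels are only illustrative.

```batch
@echo off
setlocal
set "url=http://somesite.com/path/whatever"

rem tokens=* asks for ONE token holding the whole line, so nothing is split
for /f "tokens=* delims=/" %%x in ("%url%") do echo all  = %%x

rem Declaring tokens explicitly splits on the delimiter; the trailing * collects the rest
for /f "tokens=1-3* delims=/" %%a in ("%url%") do (
  echo t1   = %%a
  echo t2   = %%b
  echo t3   = %%c
  echo rest = %%d
)
```

Consecutive delimiters are treated as one, so the `//` after `http:` does not produce an empty token: the second loop should yield `http:`, `somesite.com`, `path` and `whatever`.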
1

OK, my shortest and most understandable, though least commented solution:

@echo off
setlocal enabledelayedexpansion
for /f "tokens=1-3 delims=: " %%x in (URLs.txt) do (
  set LabelNam=%%x
  set ReleaseURL=%%y:%%z
  for /f "tokens=1-31 delims=/" %%a in ("%%y:%%z") do call :getURL %%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o %%p %%q %%r %%s %%t %%u %%v
  echo.
  rem Read values set inside this block with delayed expansion
  echo       Label = !LabelNam!
  echo     Release = !Release!
  echo         URL = !URL!
  echo Release URL = !ReleaseURL!
)
goto :eof

:getURL
  set URL=%1/
  shift
  :URLloop
    set URL=%URL%/%1
    shift
  if "%1" neq "tags" goto :URLloop
  Set Release=%2
goto :eof
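
Scripts like the ones in this thread set variables inside a parenthesized `for` block and read them back within the same block. A minimal sketch of how the two expansion forms differ in that situation follows; the variable name `label` and the loop values are only illustrative.

```batch
@echo off
setlocal enableDelayedExpansion
set "label=unset"
for %%L in (Slides_1 Exercise_1) do (
  set "label=%%L"
  echo percent form: %label%
  echo delayed form: !label!
)
```

The percent form is substituted once, when the whole block is parsed, so it prints `unset` on both iterations; the delayed form is evaluated each time the line runs and prints `Slides_1`, then `Exercise_1`.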
James K