
I need to take as input a list of links to pages that share the same format and differ only in their content and one tag.

EDIT

input.txt
/category/apples-and-oranges.html
/category/pineapples.html
/category/asparagus.html
/category/brussel-sprouts.html
/category/passion-fruit.html

Assume that the fruit pages contain <h1>Fruit!</h1> and the non-fruit pages don't, even though they are all under the same category. The program should fetch each of those paths from http://www.mysite.com and then create a new list containing only the fruit pages:

output.txt
/category/apples-and-oranges.html
/category/pineapples.html
/category/passion-fruit.html

Here's what I've got so far:

for /f %%A in (input.txt) DO (
    for "tokens=1,2 delims=:" %%b in ('FINDSTR [/R] [/I] [/S] [/C:"<H1>.*Fruit!.*</H1>"] [[http://]www.mysite.com/%%A[*.html]]') DO (
    echo ^<%%A> > <output.txt>
)

)

asked by James Roseman, edited by Rob W
  • Can you show us an actual example of the input and the actual result you want to achieve? – PA. Oct 20 '11 at 15:27
  • What have you tried so far, and what problems did you have? To get you started, here is a hint: take a look at the `FOR` and `FINDSTR` commands and try this: `FOR /f "tokens=1,2 delims=:" %%a in ('FINDSTR /R /I /S "<H1>.*</H1>" *.htm') do ECHO ^%%b^` – PA. Oct 20 '11 at 15:44
  • Thank you @PA, but I'm still a little confused by what you mean. I'll try it out and get back to you if I have any more questions. – James Roseman Oct 20 '11 at 19:29
  • I edited it to show where I'm at now... I'm still not getting it to run though... – James Roseman Oct 20 '11 at 20:09
  • Are you missing a `/f` on your second `for` ? – David R Tribble Oct 20 '11 at 21:06
  • @PA. and James: [here](http://stackoverflow.com/a/15419314/2167103) is a fantastic piece of code for reading HTML pages in batch files without any external apps or installs. Just wanted to mention this technique. – peet Nov 25 '13 at 22:39

1 Answer


There are several problems with your approach. First of all, FINDSTR cannot search remote URLs; it only reads local files, so you need to download each page first.

The following code, which uses cURL for downloading, should get you started.

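@echo off
rem Download each page listed in input.txt and write the paths whose
rem H1 heading mentions Fruit to output.txt.
> output.txt (
  FOR /F "delims=" %%A in (input.txt) DO (
    rem Remove the previous page so a failed download cannot cause a false match.
    if exist temp.html del temp.html
    curl -s --output temp.html "http://www.mysite.com%%A"
    FINDSTR /I /R /C:"<H1>.*Fruit.*</H1>" temp.html >nul 2>nul && ECHO %%A
  )
)
if exist temp.html del temp.html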

Edit:

cURL is not a Windows command; it's an external utility (http://en.wikipedia.org/wiki/CURL), so you'll need to install it. Another well-known tool for web downloads is GNU Wget (http://en.wikipedia.org/wiki/Wget). For more options, see this question on Superuser.com: https://superuser.com/questions/299754/wget-curl-alternative-native-to-windows
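If you go the Wget route instead, only the download command in the loop should need to change. This is a minimal sketch, assuming `wget.exe` is installed and on the `PATH`: `-q` turns off Wget's progress output and `-O temp.html` writes the page to a scratch file (the name `temp.html` is just an illustrative choice), which is then filtered with the same FINDSTR search as the cURL version.

```bat
@echo off
rem Sketch: the cURL loop with GNU Wget as the downloader.
rem Assumes wget.exe is on the PATH; temp.html is an arbitrary scratch file.
> output.txt (
  FOR /F "delims=" %%A in (input.txt) DO (
    if exist temp.html del temp.html
    wget -q -O temp.html "http://www.mysite.com%%A"
    FINDSTR /I /R /C:"<H1>.*Fruit.*</H1>" temp.html >nul 2>nul && ECHO %%A
  )
)
if exist temp.html del temp.html
```

Run it from the folder that contains input.txt; output.txt is overwritten on each run.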

answered by PA., edited by Community