I'm trying to read a list of all ZIP files on a web page and store them in a text file to download later. I can't use any third-party tools since this also needs to run on an ARM system as well as Windows 7, so built-in commands only. I'm using a batch script since it's basically universal on Windows.
I've started by getting the HTML of the website, which I got help with here: How can I find the source code for a website using only cmd?
That gives me the raw HTML, which I then filter with FINDSTR:
FINDSTR /I /C:.ZIP %~DP0FULLHTML.TXT>%~DP0ZIPLINES.TXT
The next step was to parse that file for the actual filenames, but I'm having difficulty because the web page uses a table to list the files, which results in several lines that are over 19k characters long. When I try to parse it with a FOR loop, it simply ignores those lines. I can't figure out how to shorten those lines or split them into shorter ones on some delimiter. I've even tried making the PS1 file below, but I know basically nothing about PowerShell scripting and can't seem to get it to work.
[CmdletBinding()]
Param(
    [Parameter(Mandatory=$True,Position=1)]
    [string]$file,
    [Parameter(Mandatory=$True,Position=2)]
    [string]$newfile
)

# Read the input file, split each long line on "/" and write the
# pieces to the output file, one per line.
$contents = Get-Content $file
foreach ($line in $contents)
{
    $splititems = $line.Split("/")
    foreach ($item in $splititems)
    {
        # -Append so each piece is added instead of overwriting the file
        $item | Out-File $newfile -Append
    }
}
I then try running it from the batch file:
Powershell -ExecutionPolicy Bypass -File "%~DP0SPLIT.PS1" "%~DP0ZIPLINES.TXT" "%~DP0SPLITLINES.TXT"
This gives me an error saying I'm missing a } at the end.
I know from searching this site a bit that CMD has a variable length limit of 8191 characters, which those lines exceed, hence the failure... and I'm sure I'm just completely messing up the PS code.
Once I can get these big lines split into smaller ones, I already have some messy code that works to get the file names into a single TXT file. I don't know if there's one easy step in PS to just grab all the .ZIP filenames and shove them into a file.
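For reference, this is roughly the kind of one-step extraction I was imagining, though it's just a sketch I haven't been able to verify. It assumes the page links the files with plain href="...something.zip" attributes, and FULLHTML.TXT / ZIPNAMES.TXT are just my own file names:

# Sketch only: pull every href that ends in .zip out of the saved HTML
# and keep just the file name portion, one name per line.
$html = (Get-Content FULLHTML.TXT) -join "`n"
[regex]::Matches($html, 'href="([^"]+\.zip)"', 'IgnoreCase') |
    ForEach-Object { Split-Path $_.Groups[1].Value -Leaf } |
    Set-Content ZIPNAMES.TXT

If something like that is possible, I could presumably skip the FINDSTR and line-splitting steps entirely.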