I have written a script to crawl the IMDb website. It takes a list of IMDb URLs, extracts data such as the movie title, release year, and plot summary from each page, and exports the results to a text file in CSV format. The script is below.
$listToCrawl = "imdb_link_list.txt"
$pathOfFile = "K:\MY DOCUMENTS\POWERSHELL\IMDB FILE\"
$fileName = "plot_summary.txt"
New-Item ($pathOfFile + $fileName) -ItemType File
Set-Content ($pathOfFile + $fileName) '"Title","Year","URL","Plot Summary"'
Get-Content ($pathOfFile + $listToCrawl) | ForEach-Object {
    $url = $_
    $Result = Invoke-WebRequest -Uri $url
    $movieTitleSelector = "#title-overview-widget > div.vital > div.title_block > div > div.titleBar > div.title_wrapper > h1"
    $movieTitleNode = $Result.ParsedHtml.querySelector($movieTitleSelector)
    $movieTitle = $movieTitleNode.innerText
    $movieYearSelector = "#titleYear"
    $movieYearNode = $Result.ParsedHtml.querySelector($movieYearSelector)
    $movieYear = $movieYearNode.innerText
    $plotSummarySelector = "#titleStoryLine > div:nth-child(3) > p > span"
    $plotSummaryNode = $Result.ParsedHtml.querySelector($plotSummarySelector)
    $plotSummary = $plotSummaryNode.innerText
    $movieDataEntry = '"' + $movieTitle + '","' + $movieYear + '","' + $url + '","' + $plotSummary + '"'
    Add-Content ($pathOfFile + $fileName) $movieDataEntry
}
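As a side note on the export step, the row is currently built by string concatenation, so a double quote inside a title or plot summary would break the CSV. A minimal sketch of an alternative, assuming the same four columns as the header above, is to let `ConvertTo-Csv` do the quoting. `ConvertTo-MovieCsvRow` is a hypothetical helper name introduced only for this illustration.

```powershell
# Hypothetical helper (not part of the original script): builds one CSV data row
# and relies on ConvertTo-Csv to quote the fields and double any embedded quotes.
function ConvertTo-MovieCsvRow {
    param(
        [string]$Title,
        [string]$Year,
        [string]$Url,
        [string]$PlotSummary
    )

    # Property order here matches the header written by Set-Content.
    $record = [pscustomobject]@{
        'Title'        = $Title
        'Year'         = $Year
        'URL'          = $Url
        'Plot Summary' = $PlotSummary
    }

    # ConvertTo-Csv emits the header line first and the data row after it;
    # keep only the data row, since the header is already in the file.
    $record | ConvertTo-Csv -NoTypeInformation | Select-Object -Last 1
}
```

Inside the loop, the last two lines could then become something like `Add-Content ($pathOfFile + $fileName) (ConvertTo-MovieCsvRow -Title $movieTitle -Year $movieYear -Url $url -PlotSummary $plotSummary)`.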
The list of URLs to crawl is saved in the file "K:\MY DOCUMENTS\POWERSHELL\IMDB FILE\imdb_link_list.txt", and its content is as follows.
https://www.imdb.com/title/tt0472033/
https://www.imdb.com/title/tt0478087/
https://www.imdb.com/title/tt0285331/
https://www.imdb.com/title/tt0453562/
https://www.imdb.com/title/tt0120577/
https://www.imdb.com/title/tt0416449/
When I run the script, it does not work as expected. The following error is thrown.
Invalid argument.
At K:\MY DOCUMENTS\POWERSHELL\IMDB_Plot_Summar_ Extract.ps1:20 char:1
+ $plotSummaryNode = $Result.ParsedHtml.querySelector($plotSummarySelec ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], ArgumentException
+ FullyQualifiedErrorId : System.ArgumentException
I think the problem is caused by the CSS selector I use for the plot summary, but I can't see what is wrong with it. As far as I can tell, it follows the CSS selector rules.
$plotSummarySelector = "#titleStoryLine > div:nth-child(3) > p > span"
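To check whether the selector string itself is what triggers the error, a small diagnostic along these lines could try each selector separately and report which one throws. This is only a sketch: `Test-Selectors` is a hypothetical helper name, and it assumes it is given a document object that exposes `querySelector`, such as `$Result.ParsedHtml` in the script above.

```powershell
# Hypothetical diagnostic helper: runs querySelector once per selector and
# records whether the call threw, instead of stopping at the first error.
function Test-Selectors {
    param(
        $Document,              # assumed to expose a querySelector method
        [string[]]$Selectors
    )

    foreach ($selector in $Selectors) {
        try {
            $node = $Document.querySelector($selector)
            [pscustomobject]@{
                Selector = $selector
                Threw    = $false
                Found    = ($null -ne $node)   # $false if nothing matched
                Error    = $null
            }
        }
        catch {
            [pscustomobject]@{
                Selector = $selector
                Threw    = $true
                Found    = $false
                Error    = $_.Exception.Message
            }
        }
    }
}
```

For example, `Test-Selectors -Document $Result.ParsedHtml -Selectors $movieTitleSelector, $movieYearSelector, $plotSummarySelector` would show whether only the plot summary selector is rejected.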
Does anyone know what is wrong with this selector?