I'm completely new to PowerShell (about 4 hours in so far), but I've managed to get some data I needed for my PhD by following the instructions from here.

I changed the code to

$web = Invoke-WebRequest http://link.springer.com/article/10.1007/s12111-017-9355-7
(($web.tostring() -split "[`r`n]" | select-string "Keywords" | Select -First 1) -split ":")[1].Trim()

in order to get the keywords from the selected article.

It all worked more or less fine on Springer pages, but when I try it on Sage it returns information I don't need, because (I'm just guessing) the word "Keywords" also appears in the search box.

This is the code I've used for Sage:

$web = Invoke-WebRequest http://journals.sagepub.com/doi/full/10.1177/0263276414536746
(($web.tostring() -split "[`r`n]" | select-string "Keywords" | Select -First 1) -split ":")[1].Trim()

I've tried googling how to make PowerShell search for the second match but couldn't find anything I could understand. After (cluelessly) playing with the code, replacing "Select -First 1" with "Select -Last" or "Select -All" and getting no results, I just have to ask:

Is there an easy way to find the second or last match of "Keywords" on a specific page?

Any tips, directions, or even links will be helpful.

Thank you in advance

EDIT: Could the reason I'm not getting the results I want be that the keywords I need are hyperlinked, while the text following "Keywords" in the search bar is not?

mondieu
  • You can get the last item in a collection by indexing it at position -1. Ex: `$a = @(1,2,3,4,5); $a[-1]` will return 5. If you wanted the second item in a collection you would use position [1], as PowerShell is zero-based. – trebleCode Mar 13 '18 at 13:34
  • I've tried changing the split to -split ":")[-1], but the result keeps coming out the same. I tried playing around with [-1] in other places too, but I keep either getting the same result or getting errors. I think you vastly overestimated my knowledge, but thank you anyway – mondieu Mar 13 '18 at 14:34
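
The indexing tip in the comment above applies to the collection of matches returned by select-string, not to the pieces produced by the final -split ":". A minimal offline sketch of that idea follows. The `$lines` array is a hypothetical stand-in for the page text (the strings and the keyword values are invented for illustration), so nothing here depends on what Sage actually serves:

```powershell
# Hypothetical stand-in for the page text split into lines (not real page content).
$lines = @(
    'Search box - Keywords: enter a search term',
    'Some unrelated article text',
    'Keywords: alpha, beta, gamma'
)

# Wrap in @() so the result is an array even if only one line matches.
$hits = @($lines | Select-String -Pattern 'Keywords')

# [1] is the second match, [-1] is the last match (indexing is zero-based).
$secondLine = $hits[1].Line
$lastLine   = $hits[-1].Line

# Split on the first colon only, then take the text after it.
$keywords = ($lastLine -split ':', 2)[1].Trim()
$keywords
```

`Select-Object -Last 1` or `Select-Object -Skip 1 -First 1` in the pipeline should pick the same lines. Whether the keyword text sits on the same line as the word "Keywords" in the real HTML is an assumption this sketch does not check.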

1 Answer

I am currently at work and unable to run the first script you posted, so I am going to guess at the output you are looking for. I think you are just trying to get a list of the keywords on a webpage; if so, this script should give you what you want. I dug into the HTML of the page and noticed that each keyword hyperlink has the class "Attributes", so I used PowerShell to select only the links with that class and pulled the text of each link. Hope it's what you're looking for.

$web = Invoke-WebRequest http://journals.sagepub.com/doi/full/10.1177/0263276414536746
$Keywords = $web.Links | Where-Object class -match 'Attributes'
Write-Host $Keywords.outerText

Here is the link I stumbled upon that led me to the solution to your problem.

Use GetElementsByClassName in a script
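
The filtering step above can be tried without a network connection. This is only a sketch: `$links` is a hypothetical stand-in for `$web.Links`, built by hand with just the `class` and `outerText` properties the script relies on, and the link texts are invented. Which properties the real link objects expose may depend on the PowerShell version and parsing mode.

```powershell
# Hypothetical stand-in for $web.Links (invented values, not real page data).
$links = @(
    [pscustomobject]@{ class = 'nav-link';   outerText = 'Home' },
    [pscustomobject]@{ class = 'attributes'; outerText = 'alpha' },
    [pscustomobject]@{ class = 'attributes'; outerText = 'beta' }
)

# Keep only the links whose class matches; -match is case-insensitive by default.
$keywordLinks = @($links | Where-Object { $_.class -match 'Attributes' })

# Join the link texts into one string instead of only printing them with Write-Host.
$keywordList = $keywordLinks.outerText -join '; '
$keywordList
```

Storing the joined text in a variable rather than writing it to the host makes it easier to collect keywords from several articles, for example by appending each string to a file.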

Nick W.
  • I love you! I was just googling how to get links back as text, but god knows how much longer it would've taken me. Thank you very much – mondieu Mar 13 '18 at 15:26
  • Haha, glad to help! I've been in your shoes more than once: you get stuck searching with the wrong set of keywords, post on here, and someone says "oh, just do this" and voilà. – Nick W. Mar 13 '18 at 15:34
  • :) And one more thing. This pesky security warning (accept cookies) keeps popping up. I've tried adding -UseBasicParsing to the code, to no avail. I set my privacy to minimum in Internet Options, Chrome and everything else, but it's still not doing the trick. For the time being, I've downloaded an autoclicker and set it to click "Yes" every 5 seconds but, if you've got any better ideas, let me know – mondieu Mar 13 '18 at 15:54
  • Try this. https://superuser.com/questions/1275923/using-invoke-webrequest-in-powershell-with-cookies – Nick W. Mar 13 '18 at 17:30
  • Nvm, I've just finished parsing everything with autoclicker enabled and read a book meanwhile :D ty anyway – mondieu Mar 13 '18 at 17:36