I have read this question and this question, and probably more, and I want to do exactly what they do there, but I just get empty results when I try.

I want to extract the profile links of all the followers listed here: https://www.facebook.com/zuck/followers

A very crude XPath that points to a follower's name (which is a clickable link) is:

```
//*[@id="mount_0_0_MW"]/div/div[1]/div/div[3]/div/div/div/div[1]/div[1]/div/div/div[4]/div/div/div/div/div/div/div/div/div[3]/div/div[2]/div[1]/a
```

The `a` tag I am pointing at typically looks something like this:

```html
<a class="x1i10hfl xjbqb8w x6umtig x1b1mbwd xaqea5y xav7gou x9f619 x1ypdohk xt0psk2 xe8uvvx xdj266r x11i5rnm xat24cr x1mh8g0r xexx8yu x4uap5 x18d9i69 xkhd6sd x16tdsg8 x1hl2dhg xggy1nq x1a2a7pz x1heor9g xt0b8zv" href="https://www.facebook.com/profile.php?id=100072622654958" role="link" tabindex="0">
```

To extract the value of the `href` attribute, I follow the linked question and add `/@href` to the end of the XPath above, but when I evaluate this expression using `$x` in the browser console (in Safari), I get an empty result.


How do I rewrite my XPath so that evaluating it gives me an array of the values of the `href` attribute?

d-b
  • The result of `$x` is an array, so you can also use array methods like `map`, e.g. `$x('//*[@id="mount_0_0_OV"]/div/div[1]/div/div[3]/div/div/div/div[1]/div[1]/div/div/div[4]/div/div/div/div/div/div/div/div/div[3]/div[2]/div[2]/div[1]/a').map(link => link.href)`, to process it further. (Note that the XPath used is slightly different from yours: yours didn't select anything for me, so I let the browser suggest the one used in my example.) – Martin Honnen Apr 01 '23 at 10:12
  • @MartinHonnen I am not sure I follow you. What is your expression supposed to do? – d-b Apr 01 '23 at 11:00
  • "Note that the used XPath is slightly different from yours as your didn't select anything for me" - yes, that is my problem and the reason I asked this question. When I execute your expression in Safari's console I get `[] (0) = $10`, which is kind of empty too. Do you get a different result than I do? Thank you. – d-b Apr 01 '23 at 11:02
  • I understood from your description that in your browser e.g. `$x('//*[@id="mount_0_0_MW"]/div/div[1]/div/div[3]/div/div/div/div[1]/div[1]/div/div/div[4]/div/div/div/div/div/div/div/div/div[3]/div/div[2]/div[1]/a')` selected the wanted `a` elements, but that the attempt to use the path `$x('//*[@id="mount_0_0_MW"]/div/div[1]/div/div[3]/div/div/div/div[1]/div[1]/div/div/div[4]/div/div/div/div/div/div/div/div/div[3]/div/div[2]/div[1]/a/@href')` somehow failed with your browser's XPath API/engine. – Martin Honnen Apr 01 '23 at 14:38
  • Therefore I suggested an alternative to get at the `href` attribute/property value, namely `$x('//*[@id="mount_0_0_MW"]/div/div[1]/div/div[3]/div/div/div/div[1]/div[1]/div/div/div[4]/div/div/div/div/div/div/div/div/div[3]/div/div[2]/div[1]/a').map(link => link.href)`. Only for testing in Chrome on Windows I needed to use a different path expression, namely `//*[@id="mount_0_0_OV"]/div/div[1]/div/div[3]/div/div/div/div[1]/div[1]/div/div/div[4]/div/div/div/div/div/div/div/div/div[3]/div[2]/div[2]/div[1]/a`. – Martin Honnen Apr 01 '23 at 14:41
  • @MartinHonnen The xpath ending in `@href` fails in the respect that it doesn't return the `@href` attribute, instead it just returns an empty string (or an array of empty strings). – d-b Apr 01 '23 at 16:16
  • @MartinHonnen Your example ending in `.map(link => link.href)` did work. Thank you. Do you think you can explain why/how it works? If you post it as an answer I will accept that solution. – d-b Apr 01 '23 at 16:18

2 Answers

Try an XPath like this:

```
//a[starts-with(@href, "https://www.facebook.com/profile.php?")]/@href
```

In Chrome dev tools:

```javascript
$x('//a[starts-with(@href, "https://www.facebook.com/profile.php?")]/@href')
```

Result:

```
Array(24) [
href="https://www.facebook.com/profile.php?id=100025227933647",
href="https://www.facebook.com/profile.php?id=100025227933647",
href="https://www.facebook.com/profile.php?id=100004202773657",
href="https://www.facebook.com/profile.php?id=100004202773657",
href="https://www.facebook.com/profile.php?id=100089136296666",
href="https://www.facebook.com/profile.php?id=100089136296666",
href="https://www.facebook.com/profile.php?id=100088772316924",
href="https://www.facebook.com/profile.php?id=100088772316924",
href="https://www.facebook.com/profile.php?id=100090228025189",
href="https://www.facebook.com/profile.php?id=100090228025189",
… ]
```

Or, if you want to start by restricting the search to a particular part of the page, as your example XPath does:

```
//*[@id="mount_0_0_MW"]//a[starts-with(@href, "https://www.facebook.com/profile.php?")]/@href
```

This searches for `a` elements whose links point to Facebook profile pages and returns their `href` attributes.
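The `$x` result shown above is an array of attribute (`Attr`) nodes, which the console prints as `href="..."`, rather than an array of plain strings, and each profile URL appears twice in it. A minimal sketch, assuming you are in a dev-tools console where `$x` is available, that turns those nodes into a de-duplicated array of strings (the helper name `uniqueAttrValues` is hypothetical):

```javascript
// Hypothetical helper: $x('.../@href') returns Attr nodes, not strings.
// Attr.value holds the attribute's text; the Set drops repeated URLs
// while keeping the order in which they first appeared.
function uniqueAttrValues(attrNodes) {
  return [...new Set(attrNodes.map((attr) => attr.value))];
}

// Console usage ($x is a dev-tools console helper):
// uniqueAttrValues($x('//a[starts-with(@href, "https://www.facebook.com/profile.php?")]/@href'))
```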

I have seen many questions on this site where people have trouble with an XPath that's been suggested by their browser, and their expression looks something like this:

```
/div[2]/div[2]/div[1]/div[3]/div[1]/a
```

XPath expressions like these are easy for a browser to generate, as they simply ascend from the selected element up through the element hierarchy, counting the preceding siblings at each level. But they are usually not very reliable, because they depend on the HTML page having a fixed structure that doesn't change. If the page added an extra `div` element at some crucial point, the XPath could easily end up pointing somewhere different from where it pointed before.
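The ascend-and-count approach described above can be sketched as follows. This is a simplified illustration, not the exact algorithm any particular browser uses: like the path in the question, it stops at the nearest ancestor that has an `id` (the `//*[@id="..."]` prefix) and omits the `[n]` index when an element has no siblings with the same tag name. The helper name `positionalXPath` is hypothetical.

```javascript
// Simplified sketch of how a browser-style positional XPath can be derived:
// walk up from the element and, at each level, count the preceding siblings
// that share its tag name. Not the exact algorithm of any particular browser.
function positionalXPath(element) {
  const steps = [];
  let node = element;
  while (node && node.nodeType === 1) { // 1 = ELEMENT_NODE
    if (node.id) {
      // Anchor at the nearest ancestor with an id, like //*[@id="mount_0_0_MW"]/...
      return [`//*[@id="${node.id}"]`, ...steps].join("/");
    }
    const tag = node.nodeName.toLowerCase();
    let index = 1; // XPath positions are 1-based
    let hasSameTagSibling = false;
    for (let s = node.previousElementSibling; s; s = s.previousElementSibling) {
      if (s.nodeName.toLowerCase() === tag) {
        index++;
        hasSameTagSibling = true;
      }
    }
    for (let s = node.nextElementSibling; s; s = s.nextElementSibling) {
      if (s.nodeName.toLowerCase() === tag) hasSameTagSibling = true;
    }
    // Omit [n] when the element is the only child with this tag name.
    steps.unshift(hasSameTagSibling ? `${tag}[${index}]` : tag);
    node = node.parentNode;
  }
  return "/" + steps.join("/");
}

// Console usage in Chrome, where $0 is the element selected in the Elements panel:
// positionalXPath($0)
```

Because every step records a sibling position, inserting one extra `div` before any ancestor shifts an index, and the stored path then selects a different element or nothing at all.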

In my opinion, people would often be better off writing an XPath themselves that expresses what they are actually looking for. In your case, you're not really looking for `a` elements that appear at a particular level of the `div` hierarchy; you're looking for links to profiles. An XPath that focuses on the semantics of your search is likely to be more reliable and robust in the face of change.

Conal Tuohy
  • I agree with your advice on writing your own XPaths, and I usually do that when doing something more permanent. In this case I just used the browser's XPath because I needed something for the question. Anyway, the problem remains even when I try your XPaths: I don't get the `href` attribute. Did you try your suggestions? – d-b Apr 01 '23 at 13:32
  • I executed my first suggestion in Chrome dev tools using the `$x` function and got a list of `a` elements – Conal Tuohy Apr 01 '23 at 13:37
  • It wasn't clear to me what kind of client app you are writing; maybe Selenium, maybe a browser userscript (JS)? So I just wrote an XPath to retrieve the `a` elements. Maybe you need to show the rest of your code? – Conal Tuohy Apr 01 '23 at 13:41
  • I have edited my answer to show the results I got – Conal Tuohy Apr 01 '23 at 14:04
  • When I execute your suggestion this is the result: https://imgur.com/a/tZ9HO7y – d-b Apr 01 '23 at 16:21
  • I am writing an AppleScript to control Safari using Safari's `do javascript`-method. – d-b Apr 01 '23 at 16:22
  • This is odd: when I try `$x('//a[starts-with(@href, "https://www.facebook.com/profile.php?")]/@href')` I sometimes get a result and sometimes not. Firefox worked (https://imgur.com/a/vjOmPnw), but Chrome and Safari return empty results. – d-b Apr 01 '23 at 19:37

Based on your description (and without access to a Mac/Safari to test), it looks as if evaluating the attribute node `@href` somehow fails. As an alternative, I think you can rely on XPath only to select the `a` element(s) and then let JavaScript array methods like `map` (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map) and normal browser DOM properties like `.href` take over. That means you would use, e.g.:

```javascript
$x('//*[@id="mount_0_0_MW"]/div/div[1]/div/div[3]/div/div/div/div[1]/div[1]/div/div/div[4]/div/div/div/div/div/div/div/div/div[3]/div/div[2]/div[1]/a').map(link => link.href)
```

where the `$x(..)` call returns an array of `a` element nodes, and the subsequent `map` call turns that array into an array of strings taken from the `href` property of each `a` element.
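One detail of this approach: `link.href` reads the DOM property, which is the URL resolved to an absolute address, whereas `link.getAttribute("href")` returns the attribute text exactly as written in the HTML. The profile links shown in the question are already absolute, so both give the same strings there, but for relative links they differ. A small sketch that lets you pick either (the helper name `hrefsOf` and its `raw` flag are hypothetical):

```javascript
// Hypothetical helper: extract link targets from the <a> elements returned by $x(...).
// raw = false -> the href *property*: the URL resolved against the page address.
// raw = true  -> getAttribute("href"): the attribute text exactly as written.
function hrefsOf(links, raw = false) {
  return links.map((link) => (raw ? link.getAttribute("href") : link.href));
}

// Console usage ($x exists only in the dev-tools console):
// hrefsOf($x('//a[starts-with(@href, "https://www.facebook.com/profile.php?")]'))
```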

Martin Honnen
  • You can never be happy... I am trying to use this with `document.evaluate`, but however I try to add the suffix you suggest, `.map(link => link.href)`, I can't make it work. The "root" XPath works fine, but as soon as I add `.map(link => link.href)` I get various error messages, mostly "TypeError Map is not a function". Any ideas how to use `.map(link => link.href)` with `document.evaluate`? Thanks. – d-b Apr 01 '23 at 22:23
  • The `map` method works on arrays or array-like objects; `document.evaluate` does not return anything like that. – Martin Honnen Apr 01 '23 at 23:12
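To expand on that last comment: `document.evaluate` returns an `XPathResult` object, not an array, so it has no `map` method. Requesting an ordered snapshot and copying its items into a real array makes `map` available again. This also matters for a script run through Safari's `do javascript`, because `$x` is a dev-tools console helper and is generally not defined for ordinary page scripts. A sketch, where `xpathNodes` is a hypothetical helper name:

```javascript
// Hypothetical helper: run an XPath with the standard DOM API and return a real Array,
// so Array methods such as map() can be chained onto the result.
function xpathNodes(xpath, doc = globalThis.document, context = doc) {
  // Use the XPathResult constant when it exists (browsers); 7 is its numeric value.
  const snapshotType = globalThis.XPathResult?.ORDERED_NODE_SNAPSHOT_TYPE ?? 7;
  const result = doc.evaluate(xpath, context, null, snapshotType, null);
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Page-script usage, e.g. inside the JavaScript passed to `do javascript`:
// xpathNodes('//a[starts-with(@href, "https://www.facebook.com/profile.php?")]')
//   .map(link => link.href)
//   .join("\n");
```

The final `.join("\n")` is a cautious choice for handing the result back to AppleScript as a single string; this sketch does not verify how `do javascript` converts a JavaScript array.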