
To extract some features I need, I want to open a connection to a website with `open(mycon, "r")`. For this I used the code below, which was provided by @Dunois:

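```r
# `example` holds the URL of the website to check
myx <- httr::HEAD(example)$url  # URL reported by the HEAD request (after redirects)
mycon <- url(myx)               # create a url() connection (not yet opened)
open(mycon, "r")                # open it for reading
```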

where `example` is a link to a website. This code works for the other websites I have tried; however, in a few cases, such as "https://www.pixilink.com/140079#mode=tour" or "https://www.pixilink.com/141152#mode=0", it fails. These pages exist and I can open them in my browser, so I am not sure why the connection cannot be established. The error message I get is:

```
Error in open.connection(mycon, "r") : cannot open the connection
In addition: Warning message:
In open.connection(mycon, "r") :
  cannot open URL 'https://www.pixilink.com/140079#mode=tour': HTTP status was '400 Bad Request'
```

I would appreciate it if someone could shed light on this and explain why I get this error message.

Ross_you
  • I'm not getting that error on Linux using R 3.6.3. On the other hand I'm not sure what I should be doing with that fact. R is not a browser. – IRTFM Nov 03 '20 at 04:10
  • @IRTFM, of course, R is not a browser. I need this so that I can open the website and then use `readLines`, `httr::HEAD` and other functions to determine whether the website contains video or photos. That's why it's important for me to make sure I have a working connection to the website before doing any post-processing. – Ross_you Nov 03 '20 at 04:17
  • @IRTFM, I must say I am surprised that you don't get this error on Linux. It's rather confusing... – Ross_you Nov 03 '20 at 04:18
  • @Roozbeh_you Remove those URL fragments after the hash mark `#`. Then your connection should work. – ekoam Nov 03 '20 at 06:25
  • @ekoam, yes, this works, but then I have to search for every `#` in my dataset and remove it, right? Is there any reason it doesn't work with `#`? Also, this is just one example: what if the connection fails to open in other cases? My understanding was that `open` opens the connection whenever the URL is valid, but it seems there are cases like this one that it can't handle. – Ross_you Nov 03 '20 at 06:54
  • I do think those fragments make your connection not work. URL fragments are linked to internal sections on the web page specified by the main URL. When you provide a URL with fragments, your browser redirects you to the main webpage automatically. However, R's `open` function doesn't do that redirection, so I think that's why you cannot "open" the web page. See this [post](https://stackoverflow.com/a/30997598/10802499) for detailed explanations about URL fragments. It's not hard to remove those fragments, though. Check the `fragment` function from the `urltools` package. @Roozbeh_you – ekoam Nov 03 '20 at 08:46
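
Following ekoam's suggestion, here is a minimal sketch of both points raised in the comments: stripping the `#...` fragment before opening the connection, and catching a failed `open` so that one bad URL does not stop a loop over a whole dataset. It uses base R's `sub()` instead of `urltools::fragment`, and it skips the original `httr::HEAD` step for brevity. The helper names `strip_fragment` and `safe_open` are hypothetical, chosen for illustration only. A fragment is meant for the client and is not part of the HTTP request target, so the `400 Bad Request` above suggests the server received it; this sketch assumes that removing it is enough for these particular pages.

```r
# Hypothetical helper: drop everything from the first "#" onwards,
# e.g. "https://www.pixilink.com/140079#mode=tour" -> "https://www.pixilink.com/140079".
# Works element-wise on a character vector of URLs.
strip_fragment <- function(u) {
  sub("#.*$", "", u)
}

# Hypothetical helper: try to open a read connection to the fragment-free URL.
# Returns the open connection on success, or NULL (after a message) on failure,
# so a single unreachable URL does not abort a loop over many URLs.
safe_open <- function(u) {
  con <- url(strip_fragment(u))
  tryCatch({
    open(con, "r")
    con
  }, error = function(e) {
    close(con)  # destroy the unopened connection object
    message("Could not open ", u, ": ", conditionMessage(e))
    NULL
  })
}

urls <- c("https://www.pixilink.com/140079#mode=tour",
          "https://www.pixilink.com/141152#mode=0")
clean <- strip_fragment(urls)
print(clean)
```

Usage (needs network access): `con <- safe_open(example); if (!is.null(con)) { page <- readLines(con); close(con) }`. Returning `NULL` instead of stopping lets you record which URLs failed and inspect them separately, rather than assuming every valid-looking URL can be opened.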
