
I am having a strange problem that seems to come down to IPv6 vs. IPv4 DNS names.

I have a real-time scraper running on my server, which is on an IPv6 network. After scraping a web page, the scraper returns URLs of images on that page via AJAX calls, and the browser on my local machine then loads the images from those URLs. However, these URLs do not work from my local network, which is not on IPv6. The scraped page also hosts its images on CDNs, so the links the scraper returns depend on the machine and location it runs on.

As an example:

The server scrapes http://www.flipkart.com/it-s-not-bike-0224060872/p/itmczyx5zzktubhy?pid=9780224060875 and returns the following link:

http://img-ipv6.flixcart.com/image/book/8/7/5/it-s-not-about-the-bike-my-journey-back-to-life-275x275-imadarucmnec3hds.jpeg

When I access this image from my local machine, which is in a different geography than my server (the scraper), it cannot load the link above. curl on my local machine reports:

curl: (7) Failed to connect to 2001:df0:23e:9002::17: Network is unreachable

whereas the same command on the server downloads the image without problems.

Why would the link to the image work on one network but not on another?

Divick

2 Answers


img-ipv6.flixcart.com is evidently meant to resolve only to an IPv6 address, not to an IPv4 one: its only address is 2001:df0:23e:9002::17, which your PC cannot reach because it has no IPv6 connectivity.

Over an IPv4 connection, the image has the address http://img7.flixcart.com/image/book/8/7/5/it-s-not-about-the-bike-my-journey-back-to-life-275x275-imadarucmnec3hds.jpeg.
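
To check from code which address families a host name resolves to, something like the following sketch could be used. It relies only on Python's standard `socket.getaddrinfo`; the function name `resolve_by_family` and its injectable `resolver` parameter are made up for this illustration and are not part of any library.

```python
import socket


def resolve_by_family(hostname, resolver=socket.getaddrinfo):
    """Return the IPv4 and IPv6 addresses a host name resolves to.

    `resolver` is a hypothetical hook (defaulting to socket.getaddrinfo)
    so the lookup can be replaced in tests.
    """
    results = {"IPv4": [], "IPv6": []}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = resolver(hostname, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            # The lookup failed for this family (for example, no such record).
            continue
        for info in infos:
            address = info[4][0]  # sockaddr tuple, first element is the IP
            if address not in results[label]:
                results[label].append(address)
    return results
```

A scraper could call `resolve_by_family()` on the host part of each link it is about to return; a host with an empty `"IPv4"` list, as described above for `img-ipv6.flixcart.com`, will not be reachable from IPv4-only clients.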

glglgl
  • How can I convert the IPv6 address to an IPv4 address? And how can a machine tell whether a link points to an IPv4 or an IPv6 address? – Divick Jun 27 '12 at 15:00
  • Also, please explain why img-ipv6.flixcart.com is supposed to resolve to an IPv6 address. – Divick Jun 27 '12 at 15:02
  • 1. Conversion is not possible. The server detected that you requested the document via IPv6 and provided the corresponding link. The two links can be told apart by resolving the host name via DNS, in this case `img-ipv6.flixcart.com` versus `img7.flixcart.com`. – glglgl Jun 27 '12 at 15:04
  • 2. It is supposed to do so because it contains the string `ipv6` (that is usually a good indicator for this) and it indeed does so because if you resolve the name with `host img-ipv6.flixcart.com`, you get `img-ipv6.flixcart.com has IPv6 address 2001:df0:23e:9002::17`, while with `host img7.flixcart.com` you get `img7.flixcart.com is an alias for img-new1.flixcart.com.` and `img-new1.flixcart.com has address 103.4.253.22`. – glglgl Jun 27 '12 at 15:06
  • There are multi-protocol hosts as well. If you do `host www.google.com`, you might get something like `www.google.com is an alias for www.l.google.com.` and some lines like `www.l.google.com has address 173.194.35.146` and finally `www.l.google.com has IPv6 address 2a00:1450:4016:800::1011`. – glglgl Jun 27 '12 at 15:08
  • Is there a way to request the document via IPv4 by default even if the machine the scraper runs on has IPv6 enabled? Since my scraper is written in Python, I guess this would involve urllib's urlopen call. Otherwise, is there a way to disable IPv6 on the server, and how can that be done? A related question: is there any speed disadvantage to using IPv4 instead of IPv6? – Divick Jun 27 '12 at 15:46
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/13119/discussion-between-divkis01-and-glglgl) – Divick Jun 27 '12 at 16:10
  • Maybe ask this as a separate question, as it has moved away from the original topic. If you want, you can leave a pointer (link) here to the new question. But as I don't know much about scrapers, I won't be able to help you very well. – glglgl Jun 27 '12 at 16:39
  • The related question is posted here: http://stackoverflow.com/questions/11231244/how-to-do-urlopen-over-ipv4-by-default – Divick Jun 27 '12 at 16:48
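
For the urlopen question raised in the comments, one commonly used workaround is to temporarily wrap `socket.getaddrinfo` so that only IPv4 addresses are returned. The sketch below assumes Python 3 and that the scraper's connections go through the standard `socket` module; the helper name `ipv4_only` is hypothetical. Whether the site then hands out IPv4 image links is an assumption based on the comment above that the server chooses the link by the protocol of the request.

```python
import socket
from contextlib import contextmanager


@contextmanager
def ipv4_only():
    """Restrict name resolution to IPv4 inside the `with` block.

    Hypothetical helper: it swaps socket.getaddrinfo for the whole
    process, so it is not safe to use from several threads at once.
    """
    original = socket.getaddrinfo

    def ipv4_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested family and always ask for IPv4 results.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = ipv4_getaddrinfo
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original
```

It might be used as `with ipv4_only(): page = urllib.request.urlopen(url).read()`, after `import urllib.request`. This leaves IPv6 enabled on the server and only affects lookups made inside the block.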

The server name img-ipv6.flixcart.com only has an IPv6 address. It does not have an IPv4 address.

You will only be able to access that hostname from machines that have IPv6 connectivity.

Bill Lynch