
My wife is going nuts trying to get a specific item from TJ Maxx that keeps going in and out of stock. I'm trying to write a simple script with cURL and PHP that checks its availability for her. Here's the code:

```php
<?php
$curl_connection = curl_init();
$url = "https://tjmaxx.tjx.com/store/index.jsp";
curl_setopt($curl_connection, CURLOPT_URL, $url);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);      // return the body instead of printing it
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);     // skips certificate checks (insecure)
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookie.txt');   // cookies are written here when the handle is closed
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookie.txt');  // cookies are read from here for later requests
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_connection, CURLOPT_HEADER, true);              // include response headers in $result
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.google.com");
$result = curl_exec($curl_connection);
echo $result;
```

This isn't working; it just hangs until it eventually times out. I can fetch pages such as Google or CNN successfully just by changing the URL. Any idea why TJ Maxx's website would be giving me this trouble?

Morteo
  • Because that page gives me HTTP ERROR 403! – Alaa Kaddour Oct 25 '20 at 01:25
  • Not sure why; it loads in the browser just fine. – Morteo Oct 25 '20 at 01:28
  • When I call it via cURL as you have above and then run `$error = curl_error($curl_connection);` after it times out, I get `OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104` – Wesley Smith Oct 25 '20 at 02:12
  • Based on the error and the similar issue/comments on [this question](https://stackoverflow.com/questions/53810155/strange-curl-issue-with-a-particular-website-ssl-certificate), I'd suspect that the site sits behind a CDN or similar system that is actively blocking this kind of access. – Wesley Smith Oct 25 '20 at 02:22
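Building on those comments, one way to see why the request fails, instead of only watching it hang, is to cap the total transfer time with `CURLOPT_TIMEOUT` and then read `curl_errno()`, `curl_error()` and the HTTP status once `curl_exec()` returns. This is a minimal sketch; the helper name `fetch_page`, the shape of the returned array, and the 10/20-second limits are illustrative assumptions, not anything from the thread. For reference, on Linux errno 104 is `ECONNRESET` (the remote side reset the connection), which fits the blocking theory.

```php
<?php
// Hypothetical helper: fetch a URL and report what went wrong instead of hanging.
function fetch_page(string $url, int $timeoutSeconds = 20): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,              // limit for the connect phase only
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit for the whole transfer
    ]);

    $body = curl_exec($ch); // false on failure because RETURNTRANSFER is set

    // The handle is released when $ch goes out of scope at the end of the function.
    return [
        'ok'     => $body !== false,
        'status' => curl_getinfo($ch, CURLINFO_RESPONSE_CODE), // 0 if no HTTP response arrived
        'errno'  => curl_errno($ch),                            // 0 means no cURL error
        'error'  => curl_error($ch),                            // human-readable cURL error text
        'body'   => $body === false ? '' : $body,
    ];
}
```

Calling `fetch_page('https://tjmaxx.tjx.com/store/index.jsp')` and printing `errno` and `error` when `ok` is false would be expected to surface the kind of `SSL_read` error quoted above, or a status such as 403 if the server does answer.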

1 Answer


As Wesley Smith mentioned in the comments above, the issue is that the site uses a CDN that blocks this kind of scraping attempt. The headers suggested in the linked question seem to work for the moment:

- `Connection: keep-alive`
- `Accept-Encoding: identity`
- `Accept-Language: en-US`
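In PHP, request headers like these are passed to cURL as an array of `"Name: value"` strings via `CURLOPT_HTTPHEADER`. A minimal sketch follows; the function names `extra_headers` and `add_extra_headers` are hypothetical, and whether the CDN keeps accepting requests that carry these headers is outside the script's control.

```php
<?php
// The three headers from the answer, as "Name: value" strings for CURLOPT_HTTPHEADER.
function extra_headers(): array
{
    return [
        'Connection: keep-alive',
        'Accept-Encoding: identity', // ask the server for an uncompressed body
        'Accept-Language: en-US',
    ];
}

// Hypothetical helper: attach the headers to an existing cURL handle
// (for example $curl_connection from the question) before curl_exec().
// Returns true if cURL accepted the option.
function add_extra_headers($ch): bool
{
    return curl_setopt($ch, CURLOPT_HTTPHEADER, extra_headers());
}
```

In the question's script, `add_extra_headers($curl_connection);` would go just before the `curl_exec()` call. Because `Accept-Encoding: identity` asks for an uncompressed response, no extra decoding option is needed for that header.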
Morteo