I am scraping one of our client's sites with PHP. Everything works except the fonts, which fail to load. Chrome's console shows the following error:

Access to Font at 'https://www.example.com/fonts/fontawesome-webfont.woff?v=4.2.0' from origin 'http://www.mydomain' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://www.mydomain' is therefore not allowed access.

I tried adding the following to the .htaccess file on http://www.mydomain, but it made no difference:

.htaccess

<IfModule mod_headers.c>
  <FilesMatch "\.(ttf|ttc|otf|eot|woff|font.css|css)$">
    Header set Access-Control-Allow-Origin "*"
    Header set Access-Control-Allow-Headers "Cache-Control, Pragma, Origin, Authorization, Content-Type, X-Requested-With"
    Header set Access-Control-Allow-Methods "GET, PUT, POST"
  </FilesMatch>
</IfModule>

Note: I cannot change anything on https://www.example.com, and caching is disabled in my browser.

PHP code for web scraping:

$cookie = 'cookies.txt';
$timeout = 90;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 400);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($ch, CURLOPT_FILETIME, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;

EDIT

The Apache headers module (mod_headers) is also enabled.

B. Desai
  • Please check whether the headers module is enabled (`a2enmod headers` enables it). – Paresh Barad Jun 16 '17 at 10:32
  • @PareshBarad Sorry, I don't follow. Could you explain further? – B. Desai Jun 16 '17 at 11:05
  • I checked your code and didn't find any issue with it, so my one suggestion is to enable the **Apache headers module**. If you are on a Linux system or server, you can follow this [answer](https://stackoverflow.com/a/22655232) – Paresh Barad Jun 16 '17 at 11:13
  • The headers module is enabled, @PareshBarad, and I am using WAMP on Windows. – B. Desai Jun 16 '17 at 11:44
  • Someone is constantly downvoting all my questions without giving any reason. – B. Desai Jun 18 '17 at 10:42

2 Answers

For the website on www.mydomain to load the font from www.example.com, the server www.example.com must allow the request from www.mydomain. Headers set in the .htaccess on www.mydomain have no effect here, because the browser checks the response that delivers the font. The response of www.example.com to the HTTP GET request for the font must therefore contain (at least) the following header:

Access-Control-Allow-Origin: http://www.mydomain

If you cannot configure the server www.example.com this way, you need to download the font resource as well, host it alongside the scraped content, and change the link so that it points to your copy. See the Q&A reference "How do you parse and process HTML/XML in PHP?" for an introduction to HTML processing with PHP. There are also ready-made PHP scraping libraries that can support you in this task.
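
A minimal sketch of the "change the link" step, assuming the font is referenced from a stylesheet via `url(...)` (as Font Awesome does) and that you also save a copy of that stylesheet on your own server. The helper names `resolveUrl` and `localizeFontUrls`, the stylesheet URL and the local `/fonts` directory are hypothetical and not part of the question. The resolver is deliberately simple (http/https only, no port handling).

```php
<?php
// Hypothetical helper: resolve a url(...) reference against the stylesheet URL.
// Simplified: handles absolute, protocol-relative, root-relative and ../ paths.
function resolveUrl(string $ref, string $cssUrl): string
{
    if (preg_match('~^https?://~i', $ref)) {
        return $ref;
    }
    $base = parse_url($cssUrl);
    $origin = $base['scheme'] . '://' . $base['host'];
    if (strpos($ref, '//') === 0) {
        return $base['scheme'] . ':' . $ref;
    }
    if (strpos($ref, '/') === 0) {
        return $origin . $ref;
    }
    // Keep a trailing ?query or #fragment aside while resolving the path.
    $suffix = '';
    if (preg_match('~^([^?#]*)([?#].*)$~', $ref, $m)) {
        $ref = $m[1];
        $suffix = $m[2];
    }
    $segments = explode('/', $base['path'] ?? '/');
    array_pop($segments); // drop the stylesheet file name
    foreach (explode('/', $ref) as $part) {
        if ($part === '..') {
            if (count($segments) > 1) {
                array_pop($segments);
            }
        } elseif ($part !== '.' && $part !== '') {
            $segments[] = $part;
        }
    }
    return $origin . implode('/', $segments) . $suffix;
}

// Hypothetical helper: point font references at local copies and collect
// the remote URLs that still have to be downloaded (remote => local path).
function localizeFontUrls(string $css, string $cssUrl, string $localDir): array
{
    $downloads = [];
    $pattern = '~url\(\s*([\'"]?)([^\'")]+?\.(?:woff2?|ttf|otf|eot)(?:[?#][^\'")]*)?)\1\s*\)~i';
    $rewritten = preg_replace_callback(
        $pattern,
        function (array $m) use ($cssUrl, $localDir, &$downloads): string {
            $remote = resolveUrl($m[2], $cssUrl);
            $path = (string) parse_url($remote, PHP_URL_PATH);
            $local = rtrim($localDir, '/') . '/' . basename($path);
            $downloads[$remote] = $local;
            return 'url("' . $local . '")';
        },
        $css
    );
    return [$rewritten, $downloads];
}

// Example input; the stylesheet location is an assumption for illustration.
$css = "@font-face{font-family:'FontAwesome';src:url('../fonts/fontawesome-webfont.woff?v=4.2.0') format('woff');}";
[$localCss, $downloads] = localizeFontUrls(
    $css,
    'https://www.example.com/css/font-awesome.min.css',
    '/fonts'
);
```

Each key of `$downloads` can then be fetched with the same cURL setup used for the page and written to the matching local path, and the scraped HTML should link to the saved copy of the rewritten stylesheet. Once the font is served from www.mydomain itself, the request is same-origin and no CORS header is involved.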

hakre

There are many reasons this may not be working for you.

  1. Web server configuration: Your web server is not configured to honor per-directory .htaccess files. You will have to set the AllowOverride directive correctly (for Apache) in the right place (usually apache2.conf).
  2. You are using software (e.g. WordPress) that rewrites your homepage requests to the http version.
  3. You are using only the https version of the font resource.

In the latter case you can rewrite the script to load the resources based on the request protocol, i.e. with a protocol-relative URL, e.g.:

//maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css

This lets the browser use either http or https, matching the protocol of the page, provided you have access to the source code of example.com. If you don't, it is far better to scrape the https version of example.com than to hack the CORS configuration.
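
As a rough illustration of the protocol-relative form, here is a small sketch in PHP. The helper name `toProtocolRelative` is hypothetical. Note that it only removes the scheme: the host stays the same, so a font requested from a different host is still subject to that host's CORS headers.

```php
<?php
// Hypothetical helper: strip "http:" or "https:" so the browser reuses
// the protocol of the page that embeds the resource.
function toProtocolRelative(string $url): string
{
    return preg_replace('~^https?:(?=//)~i', '', $url);
}

$relative = toProtocolRelative(
    'https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css'
);
echo $relative, "\n";
// prints //maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css
```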

Chibueze Opata