
I use the following script:

https://www.html2pdf.fr/en/home

This script converts my PHP file to a PDF file.

For example, the URL mywebsite.com/pdf/url.php?id=8 generates a PDF file. Another example: https://github.com/spipu/html2pdf/blob/master/examples/example01.php

I don't want the Google robot to index these pages.

I added the code below to my .htaccess file, but it doesn't prevent Google from crawling the page, because the page is generated by PHP:

```apache
# Block indexing of Word and PDF files
<Files ~ "\.(doc|docx|pdf)$">
    Header set X-Robots-Tag "noindex, nofollow"
</Files>
```
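Note that the `<Files>` pattern matches file names ending in .doc/.docx/.pdf, so it never applies to a URL like url.php?id=8. A minimal sketch of an additional rule for the generator script, assuming Apache with mod_headers enabled:

```apache
# Match the dynamic PHP generator directly, since the served
# resource never ends in .pdf as far as Apache's file matching sees.
<Files "url.php">
    Header set X-Robots-Tag "noindex, nofollow"
</Files>
```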

I can't block it. How do I do it?

bobbyscoto
  • In the links to the PDF, put a rel="nofollow": https://stackoverflow.com/a/2509022/231316 – Chris Haas Mar 08 '21 at 12:25
  • Thank you, you are right. However the pages are already indexed by the Google robot. How to unindex them? – bobbyscoto Mar 08 '21 at 13:58
  • For the existing content, log into Google Search Console and [request to have the URLs removed](https://support.google.com/webmasters/answer/6332384). Also, update your htaccess to include rules for the URL of your dynamic PDF generator. htaccess doesn't care if it is a static file or dynamic, it is just URL patterns. – Chris Haas Mar 08 '21 at 14:10
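The rel="nofollow" suggestion from the first comment would look like this on the linking page (using the asker's example URL):

```html
<a href="https://mywebsite.com/pdf/url.php?id=8" rel="nofollow">Download PDF</a>
```

This only tells crawlers not to follow the link; it does not de-index pages Google has already discovered, which is why the Search Console removal request is also needed.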

1 Answer


You can add an X-Robots-Tag HTTP response header in the script that generates your PDF file.

Example: `header("X-Robots-Tag: noindex, nofollow", true);`
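A minimal sketch of how this could look in the generating script. The helper name `robotsHeader` is hypothetical (not part of html2pdf), and the html2pdf calls are shown as comments since they require the library to be installed:

```php
<?php
// Hypothetical helper: one place to define the robots header value,
// so every PDF-generating script sends the same directive.
function robotsHeader(): string
{
    return 'X-Robots-Tag: noindex, nofollow';
}

// Send the header before any PDF bytes are emitted, e.g.:
//   header(robotsHeader(), true);
//   $html2pdf = new \Spipu\Html2Pdf\Html2Pdf();
//   $html2pdf->writeHTML($content);
//   $html2pdf->output('document.pdf');
```

The important detail is ordering: `header()` must be called before any output is sent, otherwise PHP raises a "headers already sent" warning and the header is dropped.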


shasi kanth