
I use the following script:

https://www.html2pdf.fr/en/home

This script converts my PHP file to a PDF file.

For example, the URL mywebsite.com/pdf/url.php?id=8 generates a PDF file. Another example: https://github.com/spipu/html2pdf/blob/master/examples/example01.php

I don't want the Google robot to index these pages.

I added the code below to my .htaccess file, but it doesn't prevent Google from crawling the page, because the page is generated by PHP:

```apache
# Block indexing of Word and PDF files
<Files ~ "\.(doc|docx|pdf)$">
    Header set X-Robots-Tag "noindex, nofollow"
</Files>
```
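Note that the `<Files>` pattern matches file names ending in .doc/.docx/.pdf, so it never applies to a URL like url.php?id=8. A minimal sketch of an additional rule for the generator script, assuming Apache with mod_headers enabled:

```apache
# Match the dynamic PHP generator directly, since the served
# resource never ends in .pdf as far as Apache's file matching sees.
<Files "url.php">
    Header set X-Robots-Tag "noindex, nofollow"
</Files>
```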

I can't block it. How do I do it?

bobbyscoto
  • In the links to the PDF, put a rel="nofollow": https://stackoverflow.com/a/2509022/231316 – Chris Haas Mar 08 '21 at 12:25
  • Thank you, you are right. However the pages are already indexed by the Google robot. How to unindex them? – bobbyscoto Mar 08 '21 at 13:58
  • For the existing content, log into Google Search Console and [request to have the URLs removed](https://support.google.com/webmasters/answer/6332384). Also, update your htaccess to include rules for the URL of your dynamic PDF generator. htaccess doesn't care if it is a static file or dynamic, it is just URL patterns. – Chris Haas Mar 08 '21 at 14:10
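The rel="nofollow" suggestion from the first comment would look like this on the linking page (using the asker's example URL):

```html
<a href="https://mywebsite.com/pdf/url.php?id=8" rel="nofollow">Download PDF</a>
```

This only tells crawlers not to follow the link; it does not de-index pages Google has already discovered, which is why the Search Console removal request is also needed.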

1 Answer


You can add an X-Robots-Tag HTTP response header in the script that generates your PDF file.

Example: `header("X-Robots-Tag: noindex, nofollow", true);`
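A minimal sketch of how this could look in the generating script. The helper name `robotsHeader` is hypothetical (not part of html2pdf), and the html2pdf calls are shown as comments since they require the library to be installed:

```php
<?php
// Hypothetical helper: one place to define the robots header value,
// so every PDF-generating script sends the same directive.
function robotsHeader(): string
{
    return 'X-Robots-Tag: noindex, nofollow';
}

// Send the header before any PDF bytes are emitted, e.g.:
//   header(robotsHeader(), true);
//   $html2pdf = new \Spipu\Html2Pdf\Html2Pdf();
//   $html2pdf->writeHTML($content);
//   $html2pdf->output('document.pdf');
```

The important detail is ordering: `header()` must be called before any output is sent, otherwise PHP raises a "headers already sent" warning and the header is dropped.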


shasi kanth