I'm trying to filter the text of a PDF file extracted with the pdf2text code (https://pastebin.com/dvwySU1a), but both preg_replace and preg_match_all do nothing, as if the regex had 0 matches.
The result I'm expecting is this: https://regex101.com/r/uMTrtd/3 but I don't know why I'm not getting it. I've tried changing the PCRE limits, with no effect, and I don't know what else to try.
If I run the preg functions on the actual string literal instead of $a->output(), they work, even though var_dump($text) shows that $a->output() does return a string.
<?php
include('pdf2text.php');
$a = new PDF2Text();
$a->setFilename('http://www.congreso.es/public_oficiales/L12/CONG/DS/PL/DSCD-12-PL-127.PDF');
$a->decodePDF();
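// Strip the "cve: ... Pág. N " markers from the extracted text.
$text = preg_replace('/(cve: .+? Pág\. [0-9]{1,2} )/u','', $a->output());
var_dump($text);
echo '<br>';
echo '<br>';
echo '<br>';
// Capture each speaker and speech, up to the next speaker or the closing "Eran las ..." sentence.
$re = '/(La señora|El señor) (.+?):(.+?\. (?=(La señora|El señor) (.+?):|Eran las .+?\.))/u';
preg_match_all($re, $text, $matches, PREG_SET_ORDER, 0);
var_dump($matches);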
?>
PS: I'm using PHP 7.
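
Since the same patterns work on a pasted string but not on $a->output(), one way to narrow this down might be to compare the two at the byte level and to ask PCRE whether it failed or simply found nothing. The sketch below is only a diagnostic, not a fix. `debug_preg` is a hypothetical helper (it is not part of pdf2text.php), the two subjects are stand-ins for the pasted string and the extracted text, and the assumption that the extractor might return non-UTF-8 bytes is just one possibility to rule out. It relies only on `preg_match_all`, `preg_last_error`, `preg_match`, `bin2hex` and `substr`, and the script file itself must be saved as UTF-8 for the literal ñ in the pattern.

```php
<?php
// Hypothetical diagnostic helper: runs a pattern and reports whether PCRE
// failed (error code) or simply found no match, plus the leading raw bytes.
function debug_preg(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    // Read the error code immediately, before any other preg_* call resets it.
    $error = preg_last_error();

    return [
        'count'      => $count,                               // int, or false on error
        'error'      => $error,                               // PREG_NO_ERROR when PCRE ran fine
        'valid_utf8' => preg_match('//u', $subject) === 1,    // the /u modifier rejects invalid UTF-8
        'hex'        => bin2hex(substr($subject, 0, 16)),     // first bytes, to compare with the pasted string
    ];
}

$pattern = '/(La señora|El señor) (.+?):/u';

// Stand-in for the string pasted into the script (UTF-8: ñ is C3 B1).
$pasted = "El se\xC3\xB1or PRESIDENTE: Gracias.";
// Stand-in for extracted text in a single-byte encoding (ISO-8859-1: ñ is F1).
$extracted = "El se\xF1or PRESIDENTE: Gracias.";

var_dump(debug_preg($pattern, $pasted));
var_dump(debug_preg($pattern, $extracted));
```

If 'error' is PREG_NO_ERROR and 'valid_utf8' is true for the real $a->output(), the encoding is probably not the issue, and comparing the 'hex' values against the pasted string should show whether the spaces or accented characters are stored as different bytes than the pattern expects.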