
I'm trying to filter the text of a PDF file extracted with the pdf2text code (https://pastebin.com/dvwySU1a), but both preg_replace and preg_match_all do nothing, as if the regexp had 0 matches.

The result I'm expecting is this: https://regex101.com/r/uMTrtd/3, but I don't know why I'm not getting it. I've tried changing the PCRE limits with no effect, and I don't know what else to try.

If I run the preg functions on the actual string pasted into the script instead of on $a->output(), it works, even though var_dump($text) shows that the value is a string.

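<?php
include('pdf2text.php');

// Download and decode the PDF with the pdf2text class.
$a = new PDF2Text();
$a->setFilename('http://www.congreso.es/public_oficiales/L12/CONG/DS/PL/DSCD-12-PL-127.PDF');
$a->decodePDF();

// Strip the page footers ("cve: ... Pág. N ").
$text = preg_replace('/(cve: .+? Pág\. [0-9]{1,2} )/u', '', $a->output());
if ($text === null) {
    // preg_replace() returns null on failure, e.g. invalid UTF-8 with /u.
    die('preg_replace failed, PCRE error code ' . preg_last_error());
}
var_dump($text);
echo '<br><br><br>';

// Capture each speaker and their speech up to the next speaker or the closing line.
$re = '/(La señora|El señor) (.+?):(.+?\. (?=(La señora|El señor) (.+?):|Eran las .+?\.))/u';
preg_match_all($re, $text, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
?>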
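
Since the same pattern works on a pasted string but not on $a->output(), one way to narrow this down is to check whether PCRE reports an error and to look at the raw bytes around "Pág". The sketch below is only a diagnostic aid: debug_match() and hex_preview() are hypothetical helpers, not part of pdf2text, and the sample strings are made up to show three possible encodings of the same word. Whether any of them is what pdf2text actually returns is an assumption to verify.

```php
<?php
// Hypothetical helper (not part of pdf2text): run a pattern and return the
// PCRE error code together with the number of matches.
function debug_match(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    if ($count === false) {
        // For example PREG_BAD_UTF8_ERROR when /u is applied to invalid UTF-8.
        return ['error' => preg_last_error(), 'count' => 0];
    }
    return ['error' => PREG_NO_ERROR, 'count' => $count];
}

// Hypothetical helper: show the first bytes of a string as hex pairs,
// so that hidden or unexpected characters become visible.
function hex_preview(string $text, int $length = 16): string
{
    return implode(' ', str_split(bin2hex(substr($text, 0, $length)), 2));
}

// Made-up samples: the same word in three byte representations.
$utf8   = "P\xC3\xA1g. 3";   // "á" as one precomposed UTF-8 character
$latin1 = "P\xE1g. 3";       // "á" as a single ISO-8859-1 byte (invalid UTF-8)
$nfd    = "Pa\xCC\x81g. 3";  // "a" followed by a combining acute accent

// \x{E1} is the code point of "á", written so the pattern does not depend
// on the encoding of this source file.
$pattern = '/P\x{E1}g\. [0-9]{1,2}/u';

foreach (['utf8' => $utf8, 'latin1' => $latin1, 'nfd' => $nfd] as $name => $sample) {
    $result = debug_match($pattern, $sample);
    echo $name, ': ', hex_preview($sample),
         ' | error=', $result['error'], ' count=', $result['count'], "\n";
}
```

Running hex_preview() on a slice of $a->output() near a footer and comparing it with the same slice of the pasted string should show whether the two differ at the byte level. The decomposed case is worth noting because it yields a valid string and no PCRE error, yet no match.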

PS: I'm using PHP7
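
If the extracted text does turn out to differ from the pasted string in encoding or whitespace, a possible workaround is to normalize it before applying the regexps. This is a minimal sketch under several assumptions: normalize_extracted_text() is a hypothetical helper, the mbstring extension is available, non-UTF-8 input is assumed to be ISO-8859-1, and accent composition only happens when the intl extension provides the Normalizer class.

```php
<?php
// Hypothetical helper: make text extracted from a PDF easier to match.
function normalize_extracted_text(string $raw): string
{
    // Assumption: bytes that are not valid UTF-8 are ISO-8859-1.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        $raw = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
    }

    // Compose accents ("a" + combining acute becomes "á") when intl is loaded.
    if (class_exists('Normalizer')) {
        $composed = Normalizer::normalize($raw, Normalizer::FORM_C);
        if ($composed !== false) {
            $raw = $composed;
        }
    }

    // Collapse runs of whitespace, including newlines and no-break spaces,
    // into a single space. Without the /s flag, "." does not match "\n",
    // so a line break inside a footer would otherwise stop ".+?" from matching.
    $collapsed = preg_replace('/[\s\x{00A0}]+/u', ' ', $raw);

    return $collapsed === null ? $raw : $collapsed;
}

// Usage with a made-up sample; in the script above this would be
// normalize_extracted_text($a->output()).
$sample = "cve:\n A\xC2\xA0 B";
var_dump(normalize_extracted_text($sample));
```

Collapsing whitespace changes the text itself, so this only fits if line breaks in the transcript carry no meaning for the later matching step.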

Nstemp
  • Which part is not working? [It seemed to work for me](https://repl.it/repls/OurKnobbyHandwritingrecognition). – l'L'l Jun 11 '18 at 20:06
  • None, I only get an empty array. I get 0 errors and I know it should be working, but it isn't. Can you try with the pdf2text? – Nstemp Jun 11 '18 at 21:15

0 Answers