Here's the problem: I have a database full of articles marked up in XHTML. Our application uses Prince XML to generate PDFs. An artifact of that is that footnotes are marked up inline, using the following pattern:

<p>Some paragraph text<span class="fnt">This is the text of a footnote</span>.</p>

Prince replaces every span.fnt with a numeric footnote marker, and renders the enclosed text as a footnote at the bottom of the page.

We want to render the same content in ebook formats, and XHTML is a great starting point, but the inline footnotes are terrible. What I want to do is convert the footnotes to endnotes in my ebook build script.

This is what I'm thinking:

  1. Create an empty array called $endnotes to store the endnote text.
  2. Set a variable $endnote_no to zero. This variable will hold the current endnote number, to display inline as an endnote marker, and to be used in linking the endnote marker to the particular endnote.
  3. Use preg_replace or preg_replace_callback to find every instance of <span class="fnt">(.*?)</span>.
  4. Increment $endnote_no for each instance, and replace the inline span with '<sup><a href="#endnote_' . $endnote_no . '">' . $endnote_no . '</a></sup>'
  5. Push the footnote text to the $endnotes array so that I can use it at the end of the document.
  6. After replacing all the footnotes with numeric endnote references, iterate through the $endnotes array to spit out the endnotes as an ordered list in XHTML.

This process is a bit beyond my PHP comprehension, and I get lost when I try to translate this into code. Here's what I have so far, which I mainly cobbled together based on code examples I found in the PHP documentation:

$endnotes = array();
$endnote_no = 0;
class Endnoter {

  public function replace($subject) {
    $this->endnote_no = 0;
    return preg_replace_callback('`<span class="fnt">(.*?)</span>`', array($this, '_callback'), $subject);
  }

  public function _callback($matches) {
    array_push($endnotes, $1);
    return '<sup><a href="#endnote_' . $this->endnote_no++ . '">' . $this->endnote_no . '</a></sup>';
  }
}

...

$replacer = new Endnoter();
$replacer->replace($body);
echo '<pre>';
print_r($endnotes); // Just checking to see if the $endnotes are there.
echo '</pre>';

Any guidance would be helpful, especially if there is a simpler way to get there.

John Stephens
  • If this was another language I'd say yes, there is a nicer way of doing this. In PHP I believe the answer is no, but I might be mistaken. My idea is that instead of calling a function that does the whole thing, and giving it another function as a parameter, you use a loop and only do one replacement in each loop iteration. In Perl, for instance, you could do this: `my $endnote_no = 0; my @endnotes; while ($subject =~ s!<span class="fnt">(.*?)</span>!$endnote_no!) { $endnote_no++; push @endnotes, $1; }` – David Knipe Aug 30 '13 at 22:36

2 Answers

First, you're best off not using a regex for HTML manipulation; see here: How do you parse and process HTML/XML in PHP?
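
As a rough illustration of the parser-based approach, here is a minimal sketch using PHP's built-in DOM extension. The function name `convertFootnotesToEndnotes` is hypothetical, and the sketch assumes PHP 7+, that each article body is a well-formed XML fragment with no namespace declaration and no undeclared named entities (such as `&nbsp;`), and that footnote spans carry exactly `class="fnt"`.

```php
<?php
// Hypothetical helper: turns inline <span class="fnt"> footnotes into
// numbered markers plus an <ol> of endnotes appended to the fragment.
function convertFootnotesToEndnotes(string $xhtml): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a single root element so it parses as XML.
    $doc->loadXML('<div>' . $xhtml . '</div>');
    $xpath = new DOMXPath($doc);

    $list = $doc->createElement('ol');
    $number = 0;

    // Copy the matches into a plain array before modifying the tree.
    $spans = iterator_to_array($xpath->query('//span[@class="fnt"]'));
    foreach ($spans as $span) {
        $number++;

        // Move the footnote's content into <li id="endnote_N">.
        $item = $doc->createElement('li');
        $item->setAttribute('id', 'endnote_' . $number);
        while ($span->firstChild) {
            $item->appendChild($span->firstChild);
        }
        $list->appendChild($item);

        // Replace the span with <sup><a href="#endnote_N">N</a></sup>.
        $link = $doc->createElement('a', (string) $number);
        $link->setAttribute('href', '#endnote_' . $number);
        $marker = $doc->createElement('sup');
        $marker->appendChild($link);
        $span->parentNode->replaceChild($marker, $span);
    }

    if ($number > 0) {
        $doc->documentElement->appendChild($list);
    }

    // Serialize the wrapper's children, leaving out the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

echo convertFootnotesToEndnotes(
    '<p>Some paragraph text<span class="fnt">This is the text of a footnote</span>.</p>'
);
```

Because the footnote's child nodes are moved rather than copied as text, any markup inside a footnote (emphasis, links) should carry over into the endnote unchanged.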

However, if you really want to stick with a regex, there are a few things wrong with your code:

  1. return '<sup><a href="#endnote_' . $this->endnote_no++ . '">' . $this->endnote_no . '</a></sup>';

    If endnote_no is 1, for example, this will produce

    '<sup><a href="#endnote_1">2</a></sup>';

    because the post-increment yields the old value for the href, and the second reference then sees the incremented one. If those values are both supposed to be the same, you want to increment endnote_no first:

    return '<sup><a href="#endnote_' . ++$this->endnote_no . '">' . $this->endnote_no . '</a></sup>';

    Note the ++ in front of the variable instead of after it.

  2. array_push($endnotes, $1);

    $1 is not a valid PHP variable. The captured text is in the array that is passed to the callback, so you want $matches[1].

  3. print_r($endnotes);

    The callback never fills the $endnotes you declared outside the class, because that variable is not visible inside the class's methods. Store the notes in a class property instead, and either add a getter function to retrieve them (usually preferable) or make the property public. With a getter:

    class Endnoter {
        private $endnotes = array();
        // Replace any references to $endnotes in your class with $this->endnotes and add a function:

        public function getEndnotes() {
            return $this->endnotes;
        }
    }
    // and then outside
    print_r($replacer->getEndnotes());
    
  4. preg_replace_callback doesn't modify the string you pass in; it returns the result as a new string. $replacer->replace($body); should be $body = $replacer->replace($body); unless you want replace() to take $body by reference and update its value there.
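
Putting the four fixes together, the class from the question might end up looking roughly like this. It is a sketch that keeps the question's names and pattern; the sample `$body` string is made up for illustration.

```php
<?php
class Endnoter
{
    private $endnote_no = 0;
    private $endnotes = array();

    public function replace($subject)
    {
        return preg_replace_callback(
            '`<span class="fnt">(.*?)</span>`',
            array($this, '_callback'),
            $subject
        );
    }

    public function _callback($matches)
    {
        // Fixes 2 and 3: use $matches[1] and store it on the object.
        $this->endnotes[] = $matches[1];
        // Fix 1: increment before building the marker so both numbers agree.
        $this->endnote_no++;
        return '<sup><a href="#endnote_' . $this->endnote_no . '">'
            . $this->endnote_no . '</a></sup>';
    }

    public function getEndnotes()
    {
        return $this->endnotes;
    }
}

// Made-up sample input for illustration.
$body = '<p>First<span class="fnt">Note one</span> and second<span class="fnt">Note two</span>.</p>';

$replacer = new Endnoter();
$body = $replacer->replace($body); // Fix 4: keep the returned string.

echo $body, "\n";
print_r($replacer->getEndnotes());
```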

ChicagoRedSox
  • Thank you @ChicagoRedSox! I think I followed your reasoning and implemented what you suggested, but now I see this error: "array_push() expects parameter 1 to be array, null given"; it repeats for every match in the source. Thanks for the tip about XML parsing--I'll look into that as I am able! – John Stephens Aug 31 '13 at 01:04
  • Did you declare `$endnotes` as an array (`private $endnotes = array()` or `private $endnotes = []` if you're using PHP >= 5.4)? Also did you replace the call to `$endnotes` with `$this->endnotes`? `array_push($this->endnotes, $matches[1]);` Any references to the `$endnotes` variable inside the class should be replaced with `$this->endnotes`. – ChicagoRedSox Aug 31 '13 at 01:45

Don't know about a simpler way, but you were halfway there. This seems to work.

I just cleaned it up a bit, moved the variables inside your class and added an output method to get the footnote list.

class Endnoter
{
    private $number_of_notes = 0;
    private $footnote_texts = array();

    public function replace($input) {

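        // Non-greedy (.*?) so that each footnote is matched on its own,
        // even when several footnotes appear on the same line.
        return preg_replace_callback('#<span class="fnt">(.*?)</span>#i', array($this, 'replace_callback'), $input);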

    }

    protected function replace_callback($matches) {

        // the text sits in the matches array
        // see http://php.net/manual/en/function.preg-replace-callback.php
        $this->footnote_texts[] = $matches[1];

        return '<sup><a href="#endnote_'.(++$this->number_of_notes).'">'.$this->number_of_notes.'</a></sup>';

    }

    public function getEndnotes() {
        $out = array();
        $out[] = '<ol>';

        foreach($this->footnote_texts as $text) {
            $out[] = '<li>'.$text.'</li>';
        }

        $out[] = '</ol>';

        return implode("\n", $out);
    }

}
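
For a one-off build script, the same idea can also be sketched without a class, using a closure that captures the notes array by reference (this needs PHP 5.3 or later). The function name `footnotes_to_endnotes` is hypothetical, the `s` modifier is an added assumption so that a footnote may span several lines, and each list item gets an id that matches the href of its marker.

```php
<?php
// Hypothetical helper: returns the converted body followed by the endnote list.
function footnotes_to_endnotes($html)
{
    $notes = array();

    $html = preg_replace_callback(
        '#<span class="fnt">(.*?)</span>#s',
        function ($matches) use (&$notes) {
            // Collect the footnote text; its position gives the note number.
            $notes[] = $matches[1];
            $n = count($notes);
            return '<sup><a href="#endnote_' . $n . '">' . $n . '</a></sup>';
        },
        $html
    );

    // Build the ordered list, with ids matching the marker links.
    $out = array('<ol>');
    foreach ($notes as $index => $text) {
        $out[] = '<li id="endnote_' . ($index + 1) . '">' . $text . '</li>';
    }
    $out[] = '</ol>';

    return $html . "\n" . implode("\n", $out);
}

echo footnotes_to_endnotes(
    '<p>One<span class="fnt">First note</span>, two<span class="fnt">Second note</span>.</p>'
);
```

The class-based version is still the better fit if the markers and the list need to be emitted at different points in the build, since it keeps the notes around between calls.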
Florian Grell
  • Thanks, Florian! This is almost perfect. The only thing I can't figure out is how to add an id to each list item in the getEndnotes() function, so that the ids match the links established by the replace_callback(), like so: `<li id="endnote_N">`, where "N" is the index + 1. – John Stephens Aug 31 '13 at 01:34
  • Cancel that: I got it! `$i = 0; foreach($this->footnote_texts as $text) { $i++; $out[] = '<li id="endnote_'.$i.'">'.$text.'</li>'; }` – John Stephens Aug 31 '13 at 14:46