
I have YAML Front Matter that I want to parse with PHP:

---
title = A nice title goes here
tags = tag1 tag2 tag3
---
This is the content of this entry...
Line2
Line3

I know this format comes from a Ruby gem of some kind, but I want to use it in PHP to build a user-friendly flat-file blog engine.

I also have a snippet from a project called Phrozn. It may help show what I'm trying to do:

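private function parse()
{
    // Skip re-parsing if both the template and the front matter are already set.
    if (isset($this->template, $this->frontMatter)) {
        return $this;
    }

    $source = $this->readSourceFile();

    // Split on the first "---" line (plus any newlines before it) into at
    // most two pieces: the front matter and the template body.
    $parts = preg_split('/[\n]*[-]{3}[\n]/', $source, 2);
    if (count($parts) === 2) {
        $this->frontMatter = Yaml::load($parts[0]);
        $this->template = trim($parts[1]);
    } else {
        // No "---" separator: the whole file is the template.
        $this->frontMatter = null;
        $this->template = trim($source);
    }

    return $this;
}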
Matthieu Napoli
Robbie
  • How is this different than [PHP YAML Parsers](http://stackoverflow.com/questions/294355/php-yaml-parsers)? – mu is too short Aug 13 '11 at 19:10
  • Because ordinary YAML parsers won't allow multiple languages in the same file. – Robbie Aug 13 '11 at 21:07
  • So you want a little bit of YAML at the top and the rest as non-YAML? Sorry, I misunderstood your intent. What part of that `parse` function doesn't work? – mu is too short Aug 14 '11 at 00:59
  • That's okay. I think it's the regex. It works fine in the other project (Phrozn), but I can't get it to work here. I am trying to split the document and get rid of the "\n"s that shouldn't be there, so that I end up with two variables that don't contain unnecessary spaces and newlines. Would I be better off with an alternative method? Please see this link: http://www.phrozn.info/en/documentation/front-matter/. If you want to help me turn this into an open-source YAML front-matter snippet project, maybe we can help others who have been looking for such a solution, just like me. :) – Robbie Aug 15 '11 at 15:10

3 Answers


I think your problem is that you're trying to split something with three parts into two parts. If you drop the third argument to preg_split, or raise it to 3, you'll get an array with three elements for this input. The first piece of this (when delimited by ---):

---
title = A nice title goes here
tags = tag1 tag2 tag3
---
This is the content of this entry...
Line2
Line3

Is empty, the second is the YAML, and the third is the content. Try this:

$parts = preg_split('/[\n]*[-]{3}[\n]/', $source, 3);

And a live test case: http://ideone.com/LYLxZ
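A self-contained version of that test case uses the question's sample document as an in-memory string. This is a sketch; the variable names are illustrative and the `var_dump`/`echo` calls are there only to show the three pieces:

```php
<?php
// The question's sample document, held in a string instead of a file.
$source = "---\ntitle = A nice title goes here\ntags = tag1 tag2 tag3\n---\n"
        . "This is the content of this entry...\nLine2\nLine3";

// With a limit of 3 the result is: the empty string before the opening
// "---", the front matter, and the content.
$parts = preg_split('/[\n]*[-]{3}[\n]/', $source, 3);

var_dump($parts[0] === '');  // bool(true)
echo $parts[1], "\n";        // the two front-matter lines
echo $parts[2], "\n";        // the three content lines
```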

If you want to match what Phrozn seems to be doing, your input would instead look like this, with no leading ---:

title = A nice title goes here
tags = tag1 tag2 tag3
---
This is the content of this entry...
Line2
Line3

And your PHP would be this:

$parts = preg_split('/[\n]*[-]{3}[\n]/', $source, 2);

And a live test case for this version: http://ideone.com/a9a6C
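Running the same regex with a limit of 2 on both layouts shows why the original snippet suits only Phrozn-style input. On a document that starts with `---`, the single allowed split happens at the very first line. That leaves an empty first element, with the front matter and content still joined in the second. The following is a minimal sketch using the question's sample text:

```php
<?php
$pattern = '/[\n]*[-]{3}[\n]/';

// Phrozn-style input: the front matter starts on the first line.
$phrozn = "title = A nice title goes here\ntags = tag1 tag2 tag3\n---\n"
        . "This is the content of this entry...\nLine2\nLine3";
$parts = preg_split($pattern, $phrozn, 2);
// $parts[0]: the two front-matter lines; $parts[1]: the three content lines.

// The question's input: the same text preceded by a "---" line.
$withLeadingDashes = "---\n" . $phrozn;
$wrong = preg_split($pattern, $withLeadingDashes, 2);
// The only split happens at the leading "---", so $wrong === ['', $phrozn]:
// front matter and content are still glued together in $wrong[1].
```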

mu is too short
  • Thank you so much for your answer! :) It works, but when I load $source with `$source = file_get_contents("../entries/13-08-11-23-41-hashtag.txt");` instead, it doesn't work. I think it's because of the \n's. I suspect this is easy to solve, but I can't figure it out. Do you have a quick solution? – Robbie Aug 16 '11 at 21:38
  • @Robbie: What's different about what `file_get_contents` returns? Is there maybe an end-of-line CR-LF issue? Sorry for the delays but at least I'm making up for the erroneous close-vote. – mu is too short Aug 16 '11 at 21:55
  • If you use the `PREG_SPLIT_NO_EMPTY` flag, `preg_split` will handle both of these YAML cases. – joemaller Dec 30 '12 at 08:23
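The comments above raise two separate issues: the leading `---` may or may not be present, and files read with `file_get_contents` may use CR-LF line endings. A minimal sketch that handles both uses a single anchored `preg_match` instead of `preg_split`, after normalizing line endings. The `split_front_matter()` helper is hypothetical and does not come from Phrozn or any library. It only splits the file, so the YAML string still needs a parser such as Symfony's `Yaml::parse()` or the PECL `yaml_parse()`. Also, for the header to parse as a YAML mapping, its lines need `key: value` syntax (`title: A nice title goes here`); `key = value` lines are not YAML key/value pairs.

```php
<?php
/**
 * Hypothetical helper: split a document into [front matter or null, body].
 */
function split_front_matter(string $source): array
{
    // Normalize Windows (CRLF) and classic Mac (CR) line endings to LF, so a
    // file read with file_get_contents() behaves like an in-memory string.
    $source = str_replace(["\r\n", "\r"], "\n", $source);

    // \A(?:---\n)?  optional opening "---" line
    // (.*?)         front matter, lazy so it stops at the first "---" line
    // \n---(\n|\z)  closing "---" line
    // (.*)\z        the rest is the body; /s lets "." match newlines
    if (preg_match('/\A(?:---\n)?(.*?)\n---(?:\n|\z)(.*)\z/s', $source, $m) === 1) {
        return [$m[1], trim($m[2])];
    }

    // No closing "---" line: treat the whole file as content.
    return [null, trim($source)];
}

$file = "---\r\ntitle: A nice title goes here\r\ntags: tag1 tag2 tag3\r\n---\r\n"
      . "This is the content of this entry...\r\nLine2\r\nLine3\r\n";
[$yaml, $body] = split_front_matter($file);
// $yaml === "title: A nice title goes here\ntags: tag1 tag2 tag3"
// $body === "This is the content of this entry...\nLine2\nLine3"
```

Like the Phrozn-style format, this sketch treats the first `---` line in a file without an opening `---` as the end of the front matter, so plain content containing such a line would be split there.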

I faced the same problem and was unhappy with the untested regexes and the few packages that were available.

I wrote a library to handle this: FrontYAML (installable with Composer, developed with TDD, PSR-4 autoloading). Besides splitting the file, it also parses the YAML and the Markdown.

Both parsers can be overridden. By default, Symfony YAML and Parsedown are used.

Matthieu Napoli

I did it thus:

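// $string contains the full file.

// Split on "---" lines, accepting LF, CRLF or CR after the dashes so files
// saved on Windows split too. PREG_SPLIT_NO_EMPTY drops the empty piece
// before a leading "---", so $split[0] is the front matter either way.
$split = preg_split('/[\r\n]*-{3}(?:\r\n?|\n)/', $string, 3, PREG_SPLIT_NO_EMPTY);
try {
  // Count the first line's leading spaces and strip that many characters
  // from every line (assumes no line is indented less than the first)
  $i = 0; $yfm = "";
  while ($i < strlen($split[0]) && $split[0][$i] == " ") {$i++;}
  // Split on any line ending (CRLF, LF or CR) and rejoin with "\n"
  foreach(preg_split("/((\r?\n)|(\r\n?))/", $split[0]) as $line){
    $yfm .= substr($line, $i) . "\n";
  }
  // Using symfony's YAML parser
  $data = sfYaml::load($yfm);
} catch(InvalidArgumentException $e) {
  // This is not YAML
}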

It removes extraneous indentation that would otherwise trip up the parser, and it converts all newlines in the front matter, whether Windows (CRLF), Unix (LF), or classic Mac (CR), to "\n".
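Stripping the first line's indentation from every line removes real characters from any later line that is indented less. A variant, which is a swapped-in technique and not this answer's code, strips only the indentation shared by all non-blank lines. Nested YAML indentation survives this approach. The following is a minimal sketch with a hypothetical `dedent()` helper, assuming space indentation (YAML does not allow tabs for indentation):

```php
<?php
// Hypothetical helper: remove the indentation shared by all non-blank lines,
// after normalizing CRLF/CR line endings to LF.
function dedent(string $text): string
{
    $lines = explode("\n", str_replace(["\r\n", "\r"], "\n", $text));

    // Find the smallest number of leading spaces among non-blank lines.
    $min = null;
    foreach ($lines as $line) {
        if (trim($line) === '') {
            continue;
        }
        $indent = strspn($line, ' ');
        $min = ($min === null) ? $indent : min($min, $indent);
    }
    if ($min === null || $min === 0) {
        return implode("\n", $lines);
    }

    // Strip exactly that many characters from each line. Blank lines may be
    // shorter; the (string) cast turns substr()'s false (PHP 7) into "".
    foreach ($lines as $k => $line) {
        $lines[$k] = (string) substr($line, $min);
    }
    return implode("\n", $lines);
}

echo dedent("    title: A nice title goes here\n    tags: tag1 tag2 tag3\n");
```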

Félix Saparelli