4
1
00:00:00,074 --> 00:00:02,564
Previously on Breaking Bad...

2
00:00:02,663 --> 00:00:04,393
Words...

I need to parse SRT files with PHP and print all the subtitles in the file using variables.

I couldn't find the right regular expressions. I need to capture the ID, the time, and the subtitle text as variables. When printing, there mustn't be any array() output or the like; the output must look just like the original file.

I mean I must print it like this:

$number <br> (e.g. 1)
$time <br> (e.g. 00:00:00,074 --> 00:00:02,564)
$subtitle <br> (e.g. Previously on Breaking Bad...)

By the way, I have this code, but it doesn't match the lines. It needs to be edited, but how?

$srt_file = file('test.srt',FILE_IGNORE_NEW_LINES);
$regex = "/^(\d)+ ([\d]+:[\d]+:[\d]+,[\d]+) --> ([\d]+:[\d]+:[\d]+,[\d]+) (\w.+)/";

foreach($srt_file as $srt){

    preg_match($regex,$srt,$srt_lines);

    print_r($srt_lines);
    echo '<br />';

}
user1447165
  • It seems there is already some material on this problem on the web, including libraries like https://github.com/delphiki/SubRip-File-Parser, so you might want to avoid reinventing the wheel. Google "parse srt file php" if in doubt ;). – jolivier Jul 25 '12 at 21:50
  • I googled it many times with no good results. Some results work but don't print all the subtitles and show array() things. By the way, this library creates new SRT files etc., so I don't think it does what I want. I must print the whole content of the SRT file with PHP exactly as it appears in the file, but while doing so I need a loop and the 3 variables to do some work. – user1447165 Jul 25 '12 at 21:54
  • The library I linked transforms the SRT file into objects, so from that starting point you can do whatever you want with them; the code on the home page is just a sample of what you can do. – jolivier Jul 25 '12 at 22:00
  • it's now **incredibly easy to do this**. I put in an answer for 2021 – Fattie Apr 11 '21 at 17:02

7 Answers

12

Here is a short and simple state machine for parsing the SRT file line by line:

define('SRT_STATE_SUBNUMBER', 0);
define('SRT_STATE_TIME',      1);
define('SRT_STATE_TEXT',      2);
define('SRT_STATE_BLANK',     3);

$lines   = file('test.srt');

$subs    = array();
$state   = SRT_STATE_SUBNUMBER;
$subNum  = 0;
$subText = '';
$subTime = '';

foreach($lines as $line) {
    switch($state) {
        case SRT_STATE_SUBNUMBER:
            $subNum = trim($line);
            $state  = SRT_STATE_TIME;
            break;

        case SRT_STATE_TIME:
            $subTime = trim($line);
            $state   = SRT_STATE_TEXT;
            break;

        case SRT_STATE_TEXT:
            if (trim($line) == '') {
                $sub = new stdClass;
                $sub->number = $subNum;
                list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
                $sub->text   = $subText;
                $subText     = '';
                $state       = SRT_STATE_SUBNUMBER;

                $subs[]      = $sub;
            } else {
                $subText .= $line;
            }
            break;
    }
}

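if ($state == SRT_STATE_TEXT) {
    // if file was missing the trailing newlines, we'll be in this
    // state here.  Build the last sub from the pending values and add it
    // (reusing the previous $sub object would overwrite the prior entry).
    $sub = new stdClass;
    $sub->number = $subNum;
    list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
    $sub->text   = $subText;
    $subs[]      = $sub;
}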

print_r($subs);

Result:

Array
(
    [0] => stdClass Object
        (
            [number] => 1
            [stopTime] => 00:00:24,400
            [startTime] => 00:00:20,000
            [text] => Altocumulus clouds occur between six thousand
        )

    [1] => stdClass Object
        (
            [number] => 2
            [stopTime] => 00:00:27,800
            [startTime] => 00:00:24,600
            [text] => and twenty thousand feet above ground level.
        )

)

You can then loop over the array of subs or access them by array offset:

echo $subs[0]->number . ' says ' . $subs[0]->text . "\n";

To show all subs by looping over each one and displaying it:

foreach($subs as $sub) {
    echo $sub->number . ' begins at ' . $sub->startTime .
         ' and ends at ' . $sub->stopTime . '.  The text is: <br /><pre>' .
         $sub->text . "</pre><br />\n";
}
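
To print each entry in the same layout as the original file, which is what the question asks for, a loop along these lines may work. This is only a sketch: the sample $subs array is made-up data shaped like the parser's result, and $output is a name introduced here for illustration.

```php
<?php
// Hypothetical sample data shaped like the $subs array built by the parser above.
$subs = array(
    (object) array(
        'number'    => '1',
        'startTime' => '00:00:20,000',
        'stopTime'  => '00:00:24,400',
        'text'      => "Altocumulus clouds occur between six thousand\n",
    ),
    (object) array(
        'number'    => '2',
        'startTime' => '00:00:24,600',
        'stopTime'  => '00:00:27,800',
        'text'      => "and twenty thousand feet above ground level.\n",
    ),
);

$output = '';
foreach ($subs as $sub) {
    $output .= $sub->number . "<br />\n";
    $output .= $sub->startTime . ' --> ' . $sub->stopTime . "<br />\n";
    // nl2br() keeps multi-line subtitle text on separate lines in HTML.
    $output .= nl2br(rtrim($sub->text)) . "<br />\n<br />\n";
}

echo $output;
```

Rebuilding the time line from startTime and stopTime keeps the three values available as separate variables inside the loop.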

Further reading: SubRip Text File Format

drew010
  • This is quite good, but it shows the result as an array. I tried your last code and it showed only the first subtitle. Everything else is OK, but how can I print all subtitles the way an SRT file looks, I mean line by line and without the other function things? – user1447165 Jul 25 '12 at 22:05
  • Yes it creates an array where each entry in the array is one entry in the SRT file. I'll edit the post in a moment. – drew010 Jul 25 '12 at 22:07
  • @user1447165 See additional foreach loop at the bottom now. This loops over each one and displays it. – drew010 Jul 25 '12 at 22:10
  • OK, this has no problems catching the entries. Thank you so much. – user1447165 Jul 25 '12 at 22:13
  • This does not add the final $sub to the array unless there are _two_ empty lines at the end of the file. I would add `$lines[] = '';` after `$lines =` as a workaround. – WTPK Jun 11 '17 at 12:27
  • @WTPK Thanks for finding and pointing that out. I edited the code to fix the issue. – drew010 Jun 16 '17 at 03:20
  • This is wildly overcomplicated and unreliable. You can now do the whole thing with 2 lines of code guys. Programming has marched on in 10 years. – Fattie Apr 11 '21 at 17:03
2

Group the file() array into chunks of 4 using array_chunk(), then omit the last entry of each chunk, since it's a blank line, like this:

foreach( array_chunk( file( 'test.srt'), 4) as $entry) {
    list( $number, $time, $subtitle) = $entry;
    echo $number . '<br />';
    echo $time . '<br />';
    echo $subtitle . '<br />';
}
nickb
  • The only reason this won't work is that the SubRip format allows 1 or more lines of text, terminated by an empty line. – drew010 Jul 25 '12 at 22:02
  • Thanks for the info drew - Wouldn't have known that from the OP's sample. – nickb Jul 25 '12 at 22:04
  • By the way, I noticed something: it doesn't catch some subtitles' IDs, texts and times. Why would that happen? – user1447165 Jul 25 '12 at 22:10
  • I like the solution though, much shorter than mine. I suppose you could get tricky and check if the last entry is not a blank line, you could pop an element off the array inside the loop until you do hit the blank line and re-align yourself with the subtitle #. – drew010 Jul 25 '12 at 22:13
1

That is not going to match because your $srt_file array might look like this:

Array
([0] => '1',
[1] => '00:00:00,074 --> 00:00:02,564',
[2] => 'Previously on Breaking Bad...',
[3] => '',
[4] => '2',
...
)

Your regex isn't going to match any of those elements.

If your intent is to read the entire file into one long memory-hog-of-a-string, then use file_get_contents to get the entire file contents into one string, and use preg_match_all to get all the regex matches.
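
A minimal sketch of that whole-file approach, assuming the layout from the question (number, time range, one or more text lines, then a blank line). The pattern and variable names are illustrative and not taken from the question, and the inline sample string stands in for file_get_contents('test.srt').

```php
<?php
// Sample data standing in for file_get_contents('test.srt').
$contents = "1\n00:00:00,074 --> 00:00:02,564\nPreviously on Breaking Bad...\n\n"
          . "2\n00:00:02,663 --> 00:00:04,393\nWords...\n";

// Normalize Windows/old-Mac line endings so the pattern only deals with \n.
$contents = str_replace(array("\r\n", "\r"), "\n", $contents);

// One match per entry: number line, time line, then text up to a blank
// line or the end of the input. "m" lets ^ match at each line start,
// "s" lets . span the line breaks inside multi-line subtitle text.
$regex = '/^(\d+)\n'
       . '(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\n'
       . '(.*?)(?=\n{2,}|\n*\z)/ms';

preg_match_all($regex, $contents, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // $match[0] is the full match; the three groups follow it.
    list(, $number, $time, $subtitle) = $match;
    echo $number . "<br />\n";
    echo $time . "<br />\n";
    echo nl2br($subtitle) . "<br />\n";
}
```

PREG_SET_ORDER groups the results per entry rather than per capture group, which makes the loop read like the three variables asked for in the question.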

Otherwise, you might loop through the array and match each line against various regex patterns to determine whether it is an id, a time range, or text, and handle it appropriately. Obviously you might also want some logic to make sure you are getting values in the right order (id, then time range, then text).

Mike Brant
0

I made a class to convert a .srt file to array. Each entry of the array has the following properties:

  • id: a number representing the id of the subtitle (2)
  • start: float, the start time in seconds (24.443)
  • end: float, the end time in seconds (27.647)
  • startString: the start time in human readable format (00:00:24.443)
  • endString: the end time in human readable format (00:00:27.647)
  • duration: the duration of the subtitle, in ms (3204)
  • text: the text of the subtitle (the Peacocks ruled over Gongmen City.)

The code requires PHP 7:

<?php

namespace VideoSubtitles\Srt;


class SrtToArrayTool
{


    public static function getArrayByFile(string $file): array
    {

        $ret = [];

        $gen = function ($filename) {
            $file = fopen($filename, 'r');
            while (($line = fgets($file)) !== false) {
                yield rtrim($line);
            }
            fclose($file);
        };

        $c = 0;
        $item = [];
        $text = '';
        $n = 0;
        foreach ($gen($file) as $line) {

            if ('' !== $line) {
                if (0 === $n) {
                    $item['id'] = $line;
                    $n++;
                }
                elseif (1 === $n) {
                    $p = explode('-->', $line);
                    $start = str_replace(',', '.', trim($p[0]));
                    $end = str_replace(',', '.', trim($p[1]));
                    $startTime = self::toMilliSeconds(str_replace('.', ':', $start));
                    $endTime = self::toMilliSeconds(str_replace('.', ':', $end));
                    $item['start'] = $startTime / 1000;
                    $item['end'] = $endTime / 1000;
                    $item['startString'] = $start;
                    $item['endString'] = $end;
                    $item['duration'] = $endTime - $startTime;
                    $n++;
                }
                else {
                    if ($n >= 2) {
                        if ('' !== $text) {
                            $text .= PHP_EOL;
                        }
                        $text .= $line;
                    }
                }
            }
            else {
                if (0 !== $n) {
                    $item['text'] = $text;
                    $ret[] = $item;
                    $text = '';
                    $n = 0;
                }
            }
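            $c++;
        }
        // Flush the last entry when the file does not end with a blank line.
        if (0 !== $n) {
            $item['text'] = $text;
            $ret[] = $item;
        }
        return $ret;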
    }


    private static function toMilliSeconds(string $duration): int
    {
        $p = explode(':', $duration);
        return (int)$p[0] * 3600000 + (int)$p[1] * 60000 + (int)$p[2] * 1000 + (int)$p[3];
    }


}

Or check it out here: https://github.com/lingtalfi/VideoSubtitles

ling
  • This is wildly overcomplicated and unreliable. You can now do the whole thing with 2 lines of code guys. – Fattie Apr 11 '21 at 17:04
0

You can use this project: https://github.com/captioning/captioning

Sample code:

<?php
require_once __DIR__.'/../vendor/autoload.php';

use Captioning\Format\SubripFile;

try {
    $file = new SubripFile('your_file.srt');

    foreach ($file->getCues() as $line) {
        echo 'start: ' . $line->getStart() . "<br />\n";
        echo 'stop: ' . $line->getStop() . "<br />\n";
        echo 'startMS: ' . $line->getStartMS() . "<br />\n";
        echo 'stopMS: ' . $line->getStopMS() . "<br />\n";
        echo 'text: ' . $line->getText() . "<br />\n";
        echo "=====================<br />\n";
    }

} catch(Exception $e) {
    echo "Error: ".$e->getMessage()."\n";
}

Sample output:

> php index.php
start: 00:01:48,387<br />
stop: 00:01:53,269<br />
startMS: 108387<br />
stopMS: 113269<br />
text: ┘ç┘à╪د┘ç┘┌»█î ╪▓█î╪▒┘┘ê█î╪│ ╪ذ╪د ┌ر█î┘█î╪ز ╪ذ┘┘ê╪▒█î ┘ê ┌ر╪»┌ر x265
=====================<br />
start: 00:02:09,360<br />
stop: 00:02:12,021<br />
startMS: 129360<br />
stopMS: 132021<br />
text: .┘à╪د ┘╪ذ╪د┘è╪» ╪ز┘┘ç╪د┘è┘è ╪د┘è┘╪ش╪د ╪ذ╪د╪┤┘è┘à -
┌╪▒╪د ╪ا<br />
=====================<br />
start: 00:02:12,022<br />
stop: 00:02:14,725<br />
startMS: 132022<br />
stopMS: 134725<br />
text: ..╪د┌»┘ç ┘╛╪»╪▒╪ز -
.╪د┘ê┘ ┘ç┘è┌┘ê┘é╪ز ┘à╪ز┘ê╪ش┘ç ╪▒┘╪ز┘┘à┘ê┘ ┘┘à┘è╪┤┘ç -<br />
=====================<br />
Nabi K.A.Z.
0

It can be done by splitting on line breaks in PHP. I did it successfully; let me show my code:

$srt=preg_split("/\\r\\n\\r\\n/",trim($movie->SRT));
            $result[$i]['IMDBID']=$movie->IMDBID;
            $result[$i]['TMDBID']=$movie->TMDBID;

Here $movie->SRT is the subtitle text in the format you posted in this question. As we can see, entries are separated by two line breaks (a blank line), so splitting on them yields one entry per array element. Hope this answers your question.
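
The same idea can be carried through to the three variables from the question. The following is a rough sketch, not the code I run in production: it assumes blank lines contain no stray whitespace, uses \R (which PCRE treats as any line-break sequence) so that both \r\n and \n files work, and the sample string and the $subs name are illustrative.

```php
<?php
// Sample data standing in for $movie->SRT or file_get_contents('test.srt').
$contents = "1\r\n00:00:00,074 --> 00:00:02,564\r\nPreviously on Breaking Bad...\r\n\r\n"
          . "2\r\n00:00:02,663 --> 00:00:04,393\r\nWords...\r\n";

// Split into entry blocks on runs of two or more line breaks.
$blocks = preg_split('/\R{2,}/', trim($contents), -1, PREG_SPLIT_NO_EMPTY);

$subs = array();
foreach ($blocks as $block) {
    // First line is the number, second the time range; the limit of 3
    // keeps any remaining lines together as the subtitle text.
    $lines = preg_split('/\R/', $block, 3);
    if (count($lines) < 3) {
        continue; // skip malformed blocks
    }
    list($number, $time, $subtitle) = $lines;
    $subs[] = compact('number', 'time', 'subtitle');

    echo $number . "<br />\n";
    echo $time . "<br />\n";
    echo nl2br($subtitle) . "<br />\n";
}
```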

BaiMaoli
-1

Simple, natural, trivial solution

SRT subs look like this, and are separated by a blank line (two newlines):

3
00:00:07,350 --> 00:00:09,780
The ability to destroy a planet is
nothing next to the power of the force

Obviously you want to parse the time; DateFormat.parse already exists in Java, so that part is instant.

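import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;

class Sub {
    Float start;   // start time in seconds; null if the block could not be parsed
    String text;

    Sub(String block) {
        this.start = null; this.text = null;
        String[] lines = block.split("\n");
        if (lines.length < 3) { return; }

        // keep only the start timestamp (everything before the first space)
        String timey = lines[1].replaceAll(" .+$", "");
        try {
            DateFormat dateFormat = new SimpleDateFormat("HH:mm:ss,SSS");
            Date zero = dateFormat.parse("00:00:00,000");
            Date date = dateFormat.parse(timey);
            this.start = (float)(date.getTime() - zero.getTime()) / 1000f;
        } catch (ParseException e) {
            e.printStackTrace();
        }

        // join the text lines with spaces; String.join (Java 8+) is used here
        // in place of Android's TextUtils.join so the class runs on plain Java
        this.text = String.join(" ", Arrays.copyOfRange(lines, 2, lines.length));
    }
}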

Obviously, to get all the subs in the file

    List<Sub> subs = new ArrayList<>();
    String[] tt = fileText.split("\n\n");
    for (String s:tt) { subs.add(new Sub(s)); }
Fattie