
Need to go through a WEBVTT file (a text file with simple vertical placement info for subtitles), count the number of lines in each section, and add a value next to the time stamp

I am new to this and tried a regex find and replace, but that didn't work.

I have this content in a file with subtitles:

WEBVTT

00:00:00.000 --> 00:00:02.160
Hello World
I am James

00:00:02.185 --> 00:00:04.990
Welcome to my Show!

00:00:12.038 --> 00:00:14.530
This is our new season.
We hope you enjoy the show

00:00:19.580 --> 00:00:21.840
This is the first episode.

I would like the script to check each section that starts with a time stamp and return this:

WEBVTT

00:00:00.000 --> 00:00:02.160 align:middle line:84%
Hello World
I am James

00:00:02.185 --> 00:00:04.990 align:middle line:90%
Welcome to my Show!

00:00:12.038 --> 00:00:14.530 align:middle line:84%
This is our new season.
We hope you enjoy the show

00:00:19.580 --> 00:00:21.840 align:middle line:90%
This is the first episode.

If a section has a single subtitle line, the value added next to the time stamp should be
align:middle line:90%; otherwise it should be align:middle line:84%.

Mofi
  • How are you calculating 84% and 90%? or do they just alternate? – mbunch Jan 06 '19 at 06:24
  • Hi Matt: When it is 2 lines of subtitles, it is 84% (which is the vertical placement of the subtitles on the lower third of the screen) and if it is a single line then it is 90%. – MELL Clothing Jan 06 '19 at 06:32

2 Answers


The Windows command processor cmd.exe, which executes batch files, is designed for running commands and applications. It is not designed for modifying text files. Many other scripting languages, such as VBScript, JScript, PowerShell, Python, or Perl, have features that make modifying text files easy. Using the Windows command processor for this task is therefore about the worst choice anyone can make.

Nevertheless, the task is easy to achieve using JREPL.BAT written by Dave Benham, which is a batch file / JScript hybrid that runs a regular expression replace on a file using JScript.

@echo off
if not exist "%~dp0jrepl.bat" goto :EOF
if not exist "WEBVTT" goto :EOF

call "%~dp0jrepl.bat" "(\d{2}:\d{2}:\d{2}\.\d{3} --\> \d{2}:\d{2}:\d{2}\.\d{3})(\r?\n[^\r\n]+\r?\n[^\r\n])" "$1 align:middle line:84%%$2" /M /F "WEBVTT" /O -
call "%~dp0jrepl.bat" "(\d{2}:\d{2}:\d{2}\.\d{3} --\> \d{2}:\d{2}:\d{2}\.\d{3})(\r?\n[^\r\n])" "$1 align:middle line:90%%$2" /M /F "WEBVTT" /O -

The batch file JREPL.BAT must be stored in the same directory as the batch file with the code above. For that reason the batch file first checks whether JREPL.BAT really exists in the directory of the batch file and immediately exits if this condition is not true, see Where does GOTO :EOF return to?

The batch file checks next whether there is a file named WEBVTT in the current directory and exits as well if this condition is not true.

Then JREPL.BAT is used twice to run two regular expression replaces on file WEBVTT to change its content to required format.

Let's look at the first search expression:

(\d{2}:\d{2}:\d{2}\.\d{3} --\> \d{2}:\d{2}:\d{2}\.\d{3})(\r?\n[^\r\n]+\r?\n[^\r\n])

(...) ... defines a marking group. The string found by the expression inside this first marking group is back-referenced with $1 in the replace string to keep this part of the matched string unmodified. The regular expression inside the marking group finds a string like 00:00:02.185 --> 00:00:04.990 anywhere inside a line.

\d{2} ... means that exactly two digits must be found for a positive match.

\. ... an unescaped dot matches any character (except a line ending) and therefore must be escaped with a backslash to be interpreted as a literal dot.

\d{3} ... means that exactly three digits must be found for a positive match.

\> ... matches a literal >. In a JScript regular expression > has no special meaning, so the backslash is optional here, but it is harmless.

(...) ... defines a second marking group. The string found by the expression inside this second marking group is back-referenced with $2 in replace string to keep this part of found string also unmodified.

\r?\n ... there must be a line-feed, optionally preceded by a carriage return, directly after for example 00:00:02.185 --> 00:00:04.990.

[^\r\n]+ ... the next line must have one or more characters which are not a carriage return or a line-feed. So the search expression does not match if the next line is an empty line. Please note that a line containing just spaces/tabs is not an empty line. It is a blank line because it contains only white space characters, but it is not an empty line.

\r?\n[^\r\n] ... and finally there must be one more DOS/Windows or UNIX line ending and next line must contain also a character for a positive match.

So the first search expression matches a time line that is followed by two non-empty lines, i.e. subtitles with two lines.
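To see how the first search expression separates the two cases, here is a minimal sketch in Python rather than JScript. It assumes that Python's re module treats this particular pattern the same way as JScript does; the name TWO_LINE_CUE and the two sample strings are made up for illustration.

```python
import re

# Python rendering of the first search expression (the backslash before ">"
# is dropped because ">" has no special meaning in a regular expression).
TWO_LINE_CUE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3})"
    r"(\r?\n[^\r\n]+\r?\n[^\r\n])"
)

# Hypothetical sample sections taken from the question's file content.
one_line = "00:00:02.185 --> 00:00:04.990\nWelcome to my Show!\n\n"
two_lines = "00:00:00.000 --> 00:00:02.160\nHello World\nI am James\n\n"

# No match: the line after the subtitle line is empty.
print(TWO_LINE_CUE.search(one_line))

# Match: group 1 is the time line, group 2 reaches into the second subtitle line.
match = TWO_LINE_CUE.search(two_lines)
print(repr(match.group(1)), repr(match.group(2)))
```

The second group stops after the first character of the second subtitle line, which is enough to prove that this line is not empty.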

Therefore the first replace string contains 84% whereby the percent sign must be escaped with one more percent sign because otherwise the Windows command processor would interpret % as beginning of an environment variable or batch file argument reference.

The second search expression is similar to the first one, but it only matches a time line which has no additional text after the second time value (so lines already extended by the first replace are skipped) and which is followed by one more line that is not empty.

Both regular expressions are multi-line expressions which require JREPL option /M.
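For readers who prefer one of the scripting languages mentioned at the top, the same two-pass idea could be sketched in Python as follows. This is only an illustration under the assumption that the whole file content fits into one string; the names TIME and add_cue_settings are hypothetical and not part of JREPL.BAT.

```python
import re

# Time line pattern shared by both passes.
TIME = r"\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}"


def add_cue_settings(text: str) -> str:
    """Append the placement values to the time lines using two regex passes."""
    # Pass 1: time line followed by two non-empty lines gets 84%.
    text = re.sub(
        rf"({TIME})(\r?\n[^\r\n]+\r?\n[^\r\n])",
        r"\1 align:middle line:84%\2",
        text,
    )
    # Pass 2: time line directly followed by a line ending and a non-empty
    # line gets 90%. Time lines changed in pass 1 no longer match because
    # they now have text after the second time value.
    text = re.sub(
        rf"({TIME})(\r?\n[^\r\n])",
        r"\1 align:middle line:90%\2",
        text,
    )
    return text


sample = (
    "WEBVTT\n\n"
    "00:00:00.000 --> 00:00:02.160\nHello World\nI am James\n\n"
    "00:00:02.185 --> 00:00:04.990\nWelcome to my Show!\n"
)
print(add_cue_settings(sample))
```

Python needs no special multi-line option here because the pattern matches the line endings explicitly, and the percent sign needs no doubling outside of a batch file.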

To understand the commands used and how they work, open a command prompt window, execute the following commands there, and read all help pages displayed for each command carefully and completely.

  • call /? ... explains also %~dp0 ... drive and path of argument 0 being the batch file itself.
  • echo /?
  • goto /?
  • if /?
  • jrepl.bat /?
Mofi

This problem may be solved in a relatively simple way in a Batch file (or any other programming language), using the proper method:

EDIT 2019-01-07: Problem with exclamation marks fixed

@echo off
setlocal DisableDelayedExpansion

(

echo WEBVTT
echo/
set "i=0"
for /F "skip=2 tokens=1* delims=:" %%a in ('findstr /N "^" input.txt') do (
   set /A i+=1
   call set "line[%%i%%]=%%b"
   if "%%b" equ "" (
      set /A "line=90-!!(i-3)*6"
      setlocal EnableDelayedExpansion
      echo !line[1]! align:middle line:!line!%%
      for /L %%i in (2,1,!i!) do echo(!line[%%i]!
      endlocal
      set "i=0"
   )
)
set /A "line=90-!!(i-2)*6"
setlocal EnableDelayedExpansion
echo !line[1]! align:middle line:!line!%%
for /L %%i in (2,1,!i!) do echo(!line[%%i]!

) > output.txt

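Lines are read and stored in an array until an empty line appears. At that point all stored lines are output, with the additional data appended to the first line. The somewhat strange arithmetic expression subtracts 6 from 90 whenever the section does not have exactly 3 lines (time stamp, one subtitle line, and the terminating empty line): !! is a double logical NOT that turns any non-zero value into 1 and leaves 0 as 0. The last section has no terminating empty line, so the same expression is applied with 2 instead of 3 after the loop. For further details, see SET /?.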
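The same collect-until-empty-line method, including the arithmetic, might look like this in Python. It is a rough sketch and not a translation of the batch file: the function name add_cue_settings_by_lines is hypothetical, the input is assumed to be a list of lines without line endings, and the first two lines are assumed to be the WEBVTT header and an empty line.

```python
def add_cue_settings_by_lines(lines):
    """Collect each section until an empty line, then emit it with the value."""
    out = list(lines[:2])  # header "WEBVTT" and the empty line after it
    section = []           # time line plus subtitle lines of the current section

    def flush():
        if not section:
            return
        text_lines = len(section) - 1  # the first entry is the time line
        # Same idea as 90-!!(i-3)*6: subtract 6 unless there is exactly one
        # subtitle line (True counts as 1, False as 0).
        position = 90 - 6 * (text_lines != 1)
        out.append(f"{section[0]} align:middle line:{position}%")
        out.extend(section[1:])
        section.clear()

    for line in lines[2:]:
        if line == "":
            flush()         # an empty line ends the current section
            out.append("")
        else:
            section.append(line)
    flush()                 # the last section has no terminating empty line
    return out


sample = [
    "WEBVTT", "",
    "00:00:00.000 --> 00:00:02.160", "Hello World", "I am James", "",
    "00:00:02.185 --> 00:00:04.990", "Welcome to my Show!",
]
print("\n".join(add_cue_settings_by_lines(sample)))
```

Unlike the batch file, this sketch counts only the subtitle lines, so the same expression serves both the sections inside the loop and the last one.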

Aacini
  • @Mofi: Oops... You are right! :( However, this problem is easily solved by toggling the delayed expansion in the usual way. See my edit... – Aacini Jan 07 '19 at 12:19