
I am using the last command from this SO answer https://stackoverflow.com/a/54818581/80353

cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'|tee cap)

What this command currently does

  1. This command will download the auto-generated captions for a YouTube video as a .vtt file, and
  2. then print a simplified version of the .vtt file on the terminal (because of `tee cap`, the same text is also saved to a file named `cap` in /tmp)

This command works as described.
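To see what each stage of the one-liner does without calling youtube-dl, the sketch below feeds a small hand-made sample through the same three filters. The sample and the function names `sample_vtt` and `simplify_vtt` are hypothetical; the sample only imitates the 8-lines-per-caption layout of YouTube auto-generated captions that the `NR%8` trick assumes, and real files may differ.

```shell
# Hypothetical sample imitating the layout of a YouTube auto-caption .vtt file.
sample_vtt() {
  printf '%s\n' \
    'WEBVTT' \
    'Kind: captions' \
    'Language: en' \
    '' \
    '00:00:01.000 --> 00:00:02.000 align:start position:0%' \
    ' ' \
    'hello<00:00:01.500><c> world</c>' \
    '' \
    '00:00:02.000 --> 00:00:02.010 align:start position:0%' \
    'hello world' \
    ' ' \
    '' \
    '00:00:02.010 --> 00:00:04.000 align:start position:0%' \
    'hello world' \
    'this<00:00:03.000><c> is</c><00:00:03.800><c> test</c>' \
    '' \
    '00:00:04.000 --> 00:00:04.010 align:start position:0%' \
    'this is test' \
    ' ' \
    ''
}

# The filters from cap(), reading stdin instead of *.vtt:
#   sed 1: delete the header (line 1 up to the first empty line)
#   sed 2: delete inline tags such as <00:00:01.500>, <c> and </c>
#   awk:   in each group of 8 lines, print line 1 up to its first "."
#          (start time without milliseconds), then line 3 (new caption text)
simplify_vtt() {
  sed '1,/^$/d' | sed 's/<[^>]*>//g' | awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'
}

sample_vtt | simplify_vtt
# 00:00:01 hello world
# 00:00:02 this is test
```

If a real caption file does not follow this 8-line rhythm, the `NR%8` offsets pick the wrong lines, which is the main fragility of this approach.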

How to use this command

In the terminal, I run the above command once (which defines the `cap` function) and then run `cap $youtube_url`.
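Two shell details make this work. Because the body of `cap()` is wrapped in `( ... )` instead of `{ ... }`, it runs in a subshell, so its `cd /tmp` does not change the directory of your interactive shell. And the URL you pass becomes `$1` inside the function (a second argument would become `$2`). A minimal sketch with hypothetical function names:

```shell
# Hypothetical demo functions that only mimic the shape of cap().

# ( ... ) body: runs in a subshell, so the cd does not leak to the caller.
in_tmp() (cd /tmp && pwd)

# Positional parameters: first argument -> $1, second argument -> $2.
show_args() { printf 'url=%s file=%s\n' "$1" "$2"; }

start_dir=$PWD
in_tmp                                    # prints /tmp
[ "$PWD" = "$start_dir" ] && echo "caller's directory is unchanged"
show_args 'https://www.youtube.com/watch?v=VIDEO_ID' notes.md
# prints: url=https://www.youtube.com/watch?v=VIDEO_ID file=notes.md
```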

What I would like to have

I would like to modify the original cap() function so that the original behavior remains, with one extra part:

  1. This command will download captions for a YouTube video as a .vtt file (unchanged), and
  2. then write the simplified version of the .vtt file to another file given as parameter $2 (changed)

How I expect to call the new command

Originally, I would call the command as:

cap $youtube_url

Now I would like to do this:

cap $youtube_url $relative_or_absolute_path_of_text_or_markdown_file

How do I modify the original cap command to achieve the outcome I want?

Kim Stacks
  • Could you please let us know the complete requirement of your code? As it stands, this looks very complex and may lead to confusion. Kindly add more information to your question and let us know, cheers. – RavinderSingh13 Dec 09 '19 at 07:09
  • @RavinderSingh13 Oh I wasn't aware that this wasn't clear enough. Let me try again. Is this better? – Kim Stacks Dec 09 '19 at 07:17
  • Did you try `... | tee "$2"` instead of `... | tee cap`? – oguz ismail Dec 09 '19 at 07:23
  • @KimStacks, could you please try the following once: `cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'|tee -a "$2")`? And let me know. This should show the output on screen and also save it to the output file. – RavinderSingh13 Dec 09 '19 at 07:24

3 Answers


Assuming you want to see the output on screen and also save it to an output file, could you please try the following.

cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'|tee -a "$2")

Or, in multi-line form:

cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
|tee -a "$2")

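Please make sure that you provide the complete path in your variable, e.g. relative_or_absolute_path_of_text_or_markdown_file="/full/path/output_file.txt" (just an example). This is needed because the function runs `cd /tmp` first, so a relative path would be resolved against /tmp instead of your current directory. I couldn't test it since I don't have a way to work with .vtt files on my box.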
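Since the relative-path problem comes from the `cd /tmp`, one possible approach (a sketch, not tested against youtube-dl) is to convert `$2` to an absolute path before changing directory. It also runs `mkdir -p` on the parent directory, because `tee` creates a missing file but not missing directories. `to_abs_path` is a hypothetical helper name.

```shell
# Hypothetical helper: prefix relative paths with the caller's directory.
to_abs_path() {
  case $1 in
    /*) printf '%s\n' "$1" ;;          # already absolute: keep as is
    *)  printf '%s\n' "$PWD/$1" ;;     # relative: anchor at the current directory
  esac
}

cap() (
  out=$(to_abs_path "$2")              # resolve before leaving this directory
  mkdir -p "$(dirname "$out")" || exit 1
  cd /tmp || exit 1
  rm -f ./*.vtt
  youtube-dl --skip-download --write-auto-sub "$1"
  sed '1,/^$/d' ./*.vtt | sed 's/<[^>]*>//g' |
    awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3' | tee -a "$out"
)
```

With this version, `cap "$youtube_url" notes/video.md` would write to `notes/video.md` under the directory you called it from.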

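If you want to overwrite the output file on each run instead of appending to it, use `tee "$2"` (as in @oguz ismail's comment) rather than `tee -a "$2"` as shown above. Both still print to the screen; if you don't want anything printed and simply want to save the output to the file, replace `| tee -a "$2"` with `> "$2"`.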
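The difference between the three options can be checked with a throwaway file (created with `mktemp` here, standing in for `"$2"`): `tee` overwrites and prints, `tee -a` appends and prints, and `>` overwrites without printing anything.

```shell
f=$(mktemp)                 # scratch file standing in for "$2"

echo first  | tee "$f"      # prints "first";  file: first
echo second | tee "$f"      # prints "second"; file: second (overwritten)
echo third  | tee -a "$f"   # prints "third";  file: second, third (appended)
echo quiet  > "$f"          # prints nothing;  file: quiet (overwritten)

cat "$f"                    # prints "quiet"
rm -f "$f"
```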

RavinderSingh13
  • It works. And yes, I need to state the full path for it to work; it does not work with a relative path. I have an additional request but was wondering if it should be a separate question: I would like the new file to be created if it doesn't exist, and the relative path to work. I'd prefer to use SO the right way, so should I repost this as a new question or continue with this one? And many thanks – Kim Stacks Dec 10 '19 at 03:10
  • @KimStacks, you're welcome. IMHO, this looks like a completely new query, so you can create a new question for it, cheers. – RavinderSingh13 Dec 10 '19 at 03:12

Here's a detailed bash script for those who want to save the subs file with a relative path.

The result is saved as plain text, with the timestamps, newlines and other markup removed.

#!/bin/bash
# video-cap.sh videoUrl sub.txt

# Download captions only and save in a .vtt file
youtube-dl --skip-download --write-auto-sub "$1"

# Find .vtt files in the current directory changed within the last 3 seconds (0.05 min), limit to 1
vtt=$(find . -cmin -0.05 -name "*.vtt" | head -1)
[ -n "$vtt" ] || { echo "No new .vtt file found" >&2; exit 1; }

# Extract the subs and save as plain text, removing time, newlines and other markup
# (in every group of 8 lines, only line 3 holds the new caption text)
sed '1,/^$/d' "$vtt" \
  | sed 's/<[^>]*>//g' \
  | awk 'NR%8==3' \
  | tr '\n' ' ' > "$2"

# Remove the original .vtt subs file
rm -f "$vtt"
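A possible variant of the script above, written as a function and not tested against a real video, avoids the 3-second `find -cmin` window (which depends on timing and on what else is in the directory) by downloading into a private directory created with `mktemp -d` and removing it on exit. It uses youtube-dl's `-o` output template to choose where the caption file goes; the `%(id)s.%(ext)s` template and the name `video_cap` are assumptions, not part of the original answer. Because it never changes directory, a relative output path works as is.

```shell
# video_cap URL OUTPUT_FILE  (hypothetical variant of the script above)
video_cap() (
  url=$1
  out=$2

  # Private scratch directory, removed when this subshell exits.
  tmp=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmp"' EXIT

  # Download only the auto-generated captions into the scratch directory.
  youtube-dl --skip-download --write-auto-sub -o "$tmp/%(id)s.%(ext)s" "$url" || exit 1

  # Only this run's files can be in $tmp, so no time-based filter is needed.
  vtt=$(find "$tmp" -name '*.vtt' | head -n 1)
  if [ -z "$vtt" ]; then
    echo "video_cap: no .vtt file was downloaded" >&2
    exit 1
  fi

  # Same filters as above: drop the header, strip tags, keep line 3 of
  # every 8-line group, and join everything into a single line.
  sed '1,/^$/d' "$vtt" | sed 's/<[^>]*>//g' | awk 'NR%8==3' | tr '\n' ' ' > "$out"
)
```

Usage would be `video_cap "$youtube_url" notes/video.txt`.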
Alex

Thank you @KimStacks, @RavinderSingh13 and @Oguz-Ismail for posting these solutions above and in the previous post.

I managed to get results in the .vtt file with youtube-dl --skip-download --write-auto-sub $youtube_url

However, the output format is not ideal for my purpose: I have to delete lines one by one by hand to remove the timestamps as well as the \n newlines. So I would like to customize the code to fit my requirements.

NOTE: I'm not sure whether this is a new query or not, so I will post it here for now:

  1. I have tried all the steps suggested in the previous post and here, but I still cannot understand:
  • How do I insert "$youtube_url" into the code below?

    cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
    sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
    |tee -a "$2")
    
  2. I tried changing the numbers in 'NR%8==1{printf"%s ",$1}NR%8==3' (from 0 to 3, and -1) on both ends, but did not get the right format in the output. Is it possible to have:
  • the transcribed text printed continuously as sentences, rather than each subtitle on a new line?

  • the start time removed from the printout?
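For the two questions above, a possible sketch: the URL is not pasted into the function body; it is passed as the first argument when you call the function, and `"$1"` picks it up (likewise `"$2"` for the output file). To get continuous text without start times, the `printf` for the timestamp line can be dropped and each text line printed with a trailing space instead of a newline. Like the original `NR%8` trick, this assumes 8 lines per caption; `vtt_text` is a hypothetical helper name.

```shell
# Hypothetical helper: reads a .vtt file on stdin and prints only the
# caption text, joined into one line.
vtt_text() {
  # 1) delete the header, 2) delete inline tags, 3) print line 3 of every
  # group of 8 lines followed by a space, then a single newline at the end.
  sed '1,/^$/d' | sed 's/<[^>]*>//g' |
    awk 'NR%8==3 { printf "%s ", $0 } END { print "" }'
}

# cap URL OUTPUT_FILE: $1 receives the URL and $2 the output file.
cap() (
  cd /tmp || exit 1
  rm -f ./*.vtt
  youtube-dl --skip-download --write-auto-sub "$1"
  cat ./*.vtt | vtt_text | tee -a "$2"
)

# Usage (placeholders; $2 should be an absolute path because of the cd /tmp):
#   cap "$youtube_url" /full/path/transcript.md
```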

SilverNak
robde
  • I'm not going to downvote this, but please consider what I wrote in my earlier comment and change it accordingly. Hopefully you make the appropriate changes within the next couple of days. – Kim Stacks Dec 12 '19 at 05:47