How to extract the title from HTML and add a new &lt;h1&gt; line with that text to the body across multiple HTML files?</a></h1> </div> <div class="grid fw-wrap pb8 mb16 bb bc-black-075"> <div class="grid--cell ws-nowrap mr16 mb8" title="2018-03-17 02:58:15Z"> <span class="fc-light mr2">Asked</span> <time itemprop="dateCreated" datetime="2018-03-17T02:58:15.580" class="fromnow">Mar 17 '18 at 02:58</time> </div> <div class="grid--cell ws-nowrap mr16 mb8"> <span class="fc-light mr2">Active</span> <time class="fromnow" title="2018-03-21T22:58:26.390" datetime="2018-03-21T22:58:26.390">Mar 21 '18 at 22:58</time> </div> <div class="grid--cell ws-nowrap mb8" title="Viewed 144 times"> <span class="fc-light mr2">Viewed</span> 144 times </div> </div> <div id="mainbar" role="main" aria-label="questions and answers"> <div id="question" class="question" data-questionid="49332089" data-ownerid="609617" data-score="0"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container grid jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="49332089"> <div class="js-vote-count grid--cell fc-black-500 fs-title grid fd-column ai-center" itemprop="upvoteCount" data-value="0">0</div> </div> </div> <div class="postcell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"><p>I have a bunch of legacy HTML files where the title is stored only in
the <code>&lt;title&gt;</code> element. I need to add a line to the body that repeats the <code>&lt;title&gt;</code> text as an <code>&lt;h1&gt;</code>. For example, copy the text from <code>&lt;title&gt;{COPY-THIS}&lt;/title&gt;</code> and add a new line at the beginning of the body containing <code>&lt;h1&gt;{COPY-THIS}&lt;/h1&gt;</code>. </p> <p>I think I need a regex to find what's in the title, but I'm not sure how to do the bulk manipulation or what code would work best for this conversion. </p> <p>I'm on Mac or Linux. How can I copy the title of each file into a new <code>&lt;h1&gt;</code> line at the top of the body, in bulk? </p></div> <div class="mt24 mb12"> <div class="post-taglist grid gs4 gsy fd-column"> <div class="grid ps-relative"> <a href="../../questions/tagged/html" class="post-tag js-gps-track" title="show questions tagged 'html'" rel="tag">html</a> <a href="../../questions/tagged/regex" class="post-tag js-gps-track" title="show questions tagged 'regex'" rel="tag">regex</a> <a href="../../questions/tagged/converters" class="post-tag js-gps-track" title="show questions tagged 'converters'" rel="tag">converters</a> </div> </div> </div> <div class="mb0"> <div class="mt16 grid gs8 gsy fw-wrap jc-end ai-start pt4 mb16"> <div class="grid--cell mr16 fl1 w96"></div> <div class="post-signature grid--cell"> <div class="s-user-card s-user-card"> <time class="s-user-card--time" datetime="edited Mar 21 '18 at 22:58">edited Mar 21 '18 at 22:58</time> <a href="../../users/17300/stephen-p" class="s-avatar s-avatar__32 s-user-card--avatar"> <img class="s-avatar--image" src="../../users/profiles/17300.webp" data-jdenticon-width="32" data-jdenticon-height="32" data-jdenticon-value="Stephen P" /> </a> <div class="s-user-card--info"> <a href="../../users/17300/stephen-p" class="s-user-card--link">Stephen P</a> <ul class="s-user-card--awards"> <li class="s-user-card--rep" title="reputation score">14,422</li> <li class="s-award-bling s-award-bling__gold" title="2 gold badges">2</li> <li class="s-award-bling
s-award-bling__silver" title="43 silver badges">43</li> <li class="s-award-bling s-award-bling__bronze" title="67 bronze badges">67</li> </ul> </div> </div> </div> <div class="post-signature owner grid--cell"> <div class="s-user-card s-user-card"> <time class="s-user-card--time" datetime="asked Mar 17 '18 at 02:58">asked Mar 17 '18 at 02:58</time> <a href="../../users/609617/markwk" class="s-avatar s-avatar__32 s-user-card--avatar"> <img class="s-avatar--image" src="../../users/profiles/609617.webp" data-jdenticon-width="32" data-jdenticon-height="32" data-jdenticon-value="markwk" /> </a> <div class="s-user-card--info"> <a href="../../users/609617/markwk" class="s-user-card--link">markwk</a> <ul class="s-user-card--awards"> <li class="s-user-card--rep" title="reputation score">135</li> <li class="s-award-bling s-award-bling__gold" title="1 gold badge">1</li> <li class="s-award-bling s-award-bling__silver" title="2 silver badges">2</li> <li class="s-award-bling s-award-bling__bronze" title="8 bronze badges">8</li> </ul> </div> </div> </div> </div> </div> </div> <div class="post-layout--right js-post-comments-component"> <div id="comments-49332089" class="comments js-comments-container bt bc-black-075 mt12 " data-post-id="49332089" data-min-length="15"> <ul class="comments-list js-comments-list" data-remaining-comments-count="0" data-canpost="false" data-cansee="true" data-comments-unavailable="false" data-addlink-disabled="true"> <li id="comment-85666146" class="comment js-comment " data-comment-id="85666146" data-comment-owner-id="460557" data-comment-score="0"> <div class="js-comment-actions comment-actions"> <div class="comment-score js-comment-edit-hide"> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment85666146_49332089"></a> <div class="comment-body js-comment-edit-hide"> <span class="comment-copy">Regex is not the proper tool to use. Use an HTML parser.
Here, read this: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags</span> – <a href="../../users/460557/jorge-campos" title="22,647 reputation" class="comment-user ">Jorge Campos</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/49332089/how-to-extract-title-from-html-and-add-a-new-line-h1-with-that-text-in-the-bod#comment85666146_49332089"><span title="2018-03-17T02:59:53.940 License: CC BY-SA 3.0" class="relativetime-clean">Mar 17 '18 at 02:59</span></a></span> </div> </div> </li> <li id="comment-85666979" class="comment js-comment " data-comment-id="85666979" data-comment-owner-id="1237135" data-comment-score="0"> <div class="js-comment-actions comment-actions"> <div class="comment-score js-comment-edit-hide"> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment85666979_49332089"></a> <div class="comment-body js-comment-edit-hide"> <span class="comment-copy">Strictly speaking Jorge is correct, but you might get away with this approach https://unix.stackexchange.com/questions/78625/using-sed-to-find-and-replace-complex-string-preferrably-with-regex</span> – <a href="../../users/1237135/jim-w" title="4,866 reputation" class="comment-user ">Jim W</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/49332089/how-to-extract-title-from-html-and-add-a-new-line-h1-with-that-text-in-the-bod#comment85666979_49332089"><span title="2018-03-17T04:35:41.887 License: CC BY-SA 3.0" class="relativetime-clean">Mar 17 '18 at 04:35</span></a></span> </div> </div> </li> <li id="comment-85669683" class="comment js-comment " data-comment-id="85669683" data-comment-owner-id="609617" data-comment-score="0"> <div class="js-comment-actions comment-actions"> <div class="comment-score js-comment-edit-hide"> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment85669683_49332089"></a> <div class="comment-body 
js-comment-edit-hide"> <span class="comment-copy">OK, interesting points. What I had in mind was using a regex to extract the text in the title tag and then find and replace the initial tag with something like <code>&lt;h1&gt;title-extracted&lt;/h1&gt;</code>. So you recommend using an HTML parser instead? Then maybe my question should be: how do I parse HTML to extract the title text and add extra markup at the top of the body?</span> – <a href="../../users/609617/markwk" title="135 reputation" class="comment-user owner">markwk</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/49332089/how-to-extract-title-from-html-and-add-a-new-line-h1-with-that-text-in-the-bod#comment85669683_49332089"><span title="2018-03-17T08:38:36.417 License: CC BY-SA 3.0" class="relativetime-clean">Mar 17 '18 at 08:38</span></a></span> </div> </div> </li> </ul> </div> </div> </div> </div> <div id="answers"> <a name="tab-top"></a> <div id="answers-header"> <div class="answers-subheader grid ai-center mb8"> <div class="grid--cell fl1"> <h2 class="mb0" data-answercount="9">0 Answers<span style="display:none;" itemprop="answerCount">0</span></h2> </div> </div> </div> </div> </div> </div> </div> <script src="../../static/js/stack-icons.js"></script> <script src="../../static/js/fromnow.js"></script> </body> </html>