I can't find the right regex. I have this string:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

<table border="2" cellpadding="5" cellspacing="0" style="width: 490px;">
    <tr>
        <th>Company</th>
        <th>Contact</th>
        <th>Country</th>
    </tr>
    <tr>
        <td>Alfreds Futterkiste</td>
        <td>Maria Anders</td>
        <td>Germany</td>
    </tr>
</table>

Duis consequat varius aliquam. In hac habitasse platea dictumst.

<table border="2" cellpadding="5" cellspacing="0" style="width: 490px;">
    <tr>
        <th>Company</th>
        <th>Contact</th>
        <th>Country</th>
    </tr>
    <tr>
        <td>Alfreds Futterkiste</td>
        <td>Maria Anders</td>
        <td>Germany</td>
    </tr>
</table>

What I want is:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Duis consequat varius aliquam. In hac habitasse platea dictumst.

My attempt:

<table(.*)[^>]*>.*?


This is for a Perl script I'm writing to remove the tables (tags and content) from a specific database field. My plan was to match the table tags first and then replace them with an empty string in Perl.
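Running the attempt on a shortened copy of the input shows why it falls short. This sketch uses JavaScript rather than Perl, but the behavior involved is the same in both: without the `s` flag, `.` does not match a newline, while a negated class such as `[^>]` does. The `g` flag is an addition so that every match is collected.

```javascript
// The question's attempted pattern; the g flag is added here to collect every match.
const attempt = /<table(.*)[^>]*>.*?/g;

// Shortened copy of the question's input (one table).
const input = [
  'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
  '',
  '<table border="2" cellpadding="5" cellspacing="0" style="width: 490px;">',
  '    <tr>',
  '        <th>Company</th>',
  '    </tr>',
  '</table>',
  '',
  'Duis consequat varius aliquam.',
].join('\n');

// (.*) cannot cross "\n", so it stops at the end of the <table ...> line.
// [^>]* CAN cross "\n" and runs on to the next ">" (the one closing <tr>).
// The trailing .*? is lazy and matches nothing, so </table> is never reached.
const matches = input.match(attempt);
console.log(matches);

// Replacing the matches therefore leaves the rest of the table behind.
const result = input.replace(attempt, '');
console.log(result);
```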

  • I don't think regex is the right tool for this. You should do that using an HTML parser. – Youssef13 Mar 27 '20 at 08:35
  • [Parsing HTML with regex is a hard job](https://stackoverflow.com/a/4234491/372239). HTML and regex are not good friends. Use a parser; it is simpler, faster and much more maintainable. – Toto Mar 27 '20 at 14:28
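One concrete case behind these comments is nesting: a lazy pattern such as `<table[\s\S]*?</table>` stops at the first `</table>`, so a table inside a table leaves a dangling tail. The hypothetical `stripTables` helper below counts `<table>`/`</table>` nesting depth instead. It is only a sketch and still not an HTML parser (it ignores comments, `<script>` contents and `>` inside attribute values), so for real data a parser such as Perl's HTML::TreeBuilder remains the safer choice.

```javascript
// Hypothetical helper: removes every top-level <table>...</table> block,
// including nested tables, by tracking how deeply table tags are nested.
function stripTables(html) {
  // Matches an opening <table ...> tag or a closing </table> tag.
  const tagRe = /<(\/?)table\b[^>]*>/gi;
  let depth = 0;
  let start = 0; // index where the current top-level table began
  let last = 0;  // index where the next kept region begins
  let out = '';
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    if (m[1] !== '/') {
      if (depth === 0) {
        // Keep the text between the previous table and this one.
        out += html.slice(last, m.index);
        start = m.index;
      }
      depth++;
    } else if (depth > 0) {
      depth--;
      // The outermost table just closed: skip everything up to here.
      if (depth === 0) last = tagRe.lastIndex;
    }
    // A stray </table> at depth 0 is left in place.
  }
  // If a table was never closed, keep it unchanged.
  out += html.slice(depth > 0 ? start : last);
  return out;
}

const nested = 'A<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>B';
console.log(stripTables(nested));
// For comparison: the lazy regex stops at the inner </table>.
const lazy = nested.replace(/<(table)[\s\S]*?<\/\1>/g, '');
console.log(lazy);
```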

2 Answers

I'm not sure what you're trying to do: you say you want to match the Lorem Ipsum text, but your regex matches the HTML tags.

Anyway, here are some regexes:

  • To match the <table>...</table>:
/<(table)[\s\S]*?<\/\1>/g
  • To match the Lorem Ipsum parts (or any line not beginning with a <):
/(?<=^|[\n\r])[^<\s].*(?=$|[\n\r])/g
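Both patterns can be tried on a shortened copy of the input in plain JavaScript. This is a minimal sketch; note that the first pattern leaves the blank lines around each table in place, and the second relies on a lookbehind, which needs an ES2018-or-later engine.

```javascript
// Shortened copy of the question's input.
const text = [
  'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
  '',
  '<table border="2">',
  '    <tr><td>Germany</td></tr>',
  '</table>',
  '',
  'Duis consequat varius aliquam.',
  '',
  '<table border="2">',
  '    <tr><td>Germany</td></tr>',
  '</table>',
].join('\n');

// Pattern 1: remove each <table>...</table> block. [\s\S] also matches newlines,
// and the lazy *? stops at the first closing tag (nested tables are not handled).
const withoutTables = text.replace(/<(table)[\s\S]*?<\/\1>/g, '');
console.log(JSON.stringify(withoutTables));

// Pattern 2: collect whole lines that start with neither "<" nor whitespace.
const proseLines = text.match(/(?<=^|[\n\r])[^<\s].*(?=$|[\n\r])/g);
console.log(proseLines);
```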
Zorzi

Don't spend too much time on the regex. You can simply select the tables and their content and remove them from the string.

Use the following regex to select the tables first: <table.*>[\w\W]*?<\/table>

Then use string.replace (or something similar) to remove the tables.

Demo:

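const contentWrapper = document.getElementById('demo-content');
const content = contentWrapper.innerHTML;
contentWrapper.innerHTML = ''; // no need to display the HTML content here
// Remove every <table ...> ... </table> block: [\w\W] also matches newlines,
// and the lazy *? stops at the first closing </table>.
const html = content.replace(/<table.*>[\w\W]*?<\/table>/g, '');
console.log(html);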
<div id="demo-content">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

<table border="2" cellpadding="5" cellspacing="0" style="width: 490px;">
    <tr>
        <th>Company</th>
        <th>Contact</th>
        <th>Country</th>
    </tr>
    <tr>
        <td>Alfreds Futterkiste</td>
        <td>Maria Anders</td>
        <td>Germany</td>
    </tr>
</table>

Duis consequat varius aliquam. In hac habitasse platea dictumst.

<table border="2" cellpadding="5" cellspacing="0" style="width: 490px;">
    <tr>
        <th>Company</th>
        <th>Contact</th>
        <th>Country</th>
    </tr>
    <tr>
        <td>Alfreds Futterkiste</td>
        <td>Maria Anders</td>
        <td>Germany</td>
    </tr>
</table>
</div>
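The demo reads the string from the DOM, but the same regex works on any string, e.g. in Node. Removing the tables leaves their surrounding blank lines behind, so this sketch (a hypothetical `removeTables` helper, assuming `\n` line endings) also collapses each gap to a single blank line to reproduce the output the question asks for.

```javascript
// Hypothetical helper: removes tables with the answer's regex, then tidies up
// the blank lines they leave behind. Assumes "\n" line endings.
function removeTables(html) {
  return html
    // The answer's pattern: an opening <table ...> up to the nearest </table>.
    .replace(/<table.*>[\w\W]*?<\/table>/g, '')
    // A removed table leaves 3+ consecutive newlines; keep one blank line.
    .replace(/\n{3,}/g, '\n\n')
    // Drop leftover whitespace at the start and end of the string.
    .trim();
}

// Shortened copy of the question's input.
const sample = [
  'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
  '',
  '<table border="2" cellpadding="5" cellspacing="0" style="width: 490px;">',
  '    <tr>',
  '        <td>Alfreds Futterkiste</td>',
  '    </tr>',
  '</table>',
  '',
  'Duis consequat varius aliquam. In hac habitasse platea dictumst.',
  '',
  '<table border="2" cellpadding="5" cellspacing="0" style="width: 490px;">',
  '    <tr>',
  '        <td>Maria Anders</td>',
  '    </tr>',
  '</table>',
].join('\n');

console.log(removeTables(sample));
```

For the question's Perl script, the equivalent substitutions would be `s/<table\b.*?<\/table>//gs` followed by `s/\n{3,}/\n\n/g`, where the `s` modifier lets `.` match newlines.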
Kenan Güler