
I have an issue parsing DOM elements when the text contains markup like the example below. I want to remove the highlighted text (the unquoted class attribute) from it using JavaScript. Can you please help me with this? I would like to rely on regular expressions for it.

I know how to get quoted attributes using standard string functions and also using a DOM parser.

For nodes like the one below, string functions such as replace and slice may work, but I would need to traverse the entire string, which is a performance issue.

So I want to use regular expressions to find such attributes in a node.

    <p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo1'>

In the above example I want to remove the class attribute, and the class name could be anything. These nodes are generated by MS Word and are not under my control.

EDIT: The following is the pattern I am using to search for the unquoted text, but it is not working:

    var pattern = /<p class=\s*=\s*([^" >]+)/im
Sudhakar Chavali
  • The posted question does not appear to include [any attempt](https://idownvotedbecau.se/noattempt/) at all to solve the problem. StackOverflow expects you to [try to solve your own problem first](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users), as your attempts help us to better understand what you want. Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a [MCVE]. For more information, please see [ask] and take the [tour]. – CertainPerformance Oct 24 '18 at 01:48
  • I don't recommend using MS Word to create web pages. – Jonathan Rys Oct 24 '18 at 01:49
  • I am not using MS Word to create web pages. We have a solution where users can copy data from an MS Word document into the rich text editor we use in our application. However, the formatting of the text doesn't look good in Chrome, so we decided to parse the text into standard HTML and display it in the editor. We are developing a solution similar to https://www.tiny.cloud/docs/demo/full-featured/ and a code sample is at https://github.com/tinymce/tinymce/blob/master/src/core/main/ts/api/html/DomParser.ts – Sudhakar Chavali Oct 24 '18 at 01:53
  • @CertainPerformance Usually when I don't have answers, I come here. Otherwise I solve the problems on my own. – Sudhakar Chavali Oct 24 '18 at 02:05
  • You're not expected to *have* the answer, of course, but you *are* expected to at least *make a slight attempt* before asking. See [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users) – CertainPerformance Oct 24 '18 at 02:08

1 Answer


Regex101 Example

Regex:

    \S+?=[^'"]\S*[^'"\s]

The tricky part with this one is finding the end of the unquoted attribute. In this example I'm assuming the value will not contain any whitespace characters, so I can use the first occurrence of whitespace to terminate the match.
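
As a rough sketch of how the same idea could be applied to the class attribute specifically: note that the pattern in the question contains `=` twice (`class=\s*=`), so it can only match when two equals signs are present. The function name `stripUnquotedClass` below is hypothetical, and the pattern assumes the unquoted value contains no whitespace, quotes, or `>`. It is not a full HTML parser, so text inside another quoted attribute that happens to look like ` class=foo` would also be matched.

```javascript
// Hypothetical helper: removes an unquoted class attribute from opening tags.
// Assumption: the unquoted value contains no whitespace, quotes, or '>'.
function stripUnquotedClass(html) {
  // (<[a-z][^<>]*?)  tag start and anything before the attribute (kept via $1)
  // \s+class\s*=\s*  the attribute name and a single equals sign
  // [^\s"'>]+        the unquoted value, ending at whitespace, a quote, or '>'
  return html.replace(/(<[a-z][^<>]*?)\s+class\s*=\s*[^\s"'>]+/gi, '$1');
}

const input =
  "<p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo1'>";

console.log(stripUnquotedClass(input));
```

Excluding `>` from the value keeps the match from swallowing the end of the tag when the class is the last attribute, and quoted class attributes are left alone because the value must start with a non-quote character.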

doom87er