Assume we have text such as the following.

Title: (some text)
My Title [abc]

Content: (some test)
My long content paragraph. With multiple sentences. [abc]

Short Content: (some text)
Short content [abc]

Using JavaScript and RegEx, is it possible to extract the text so that it reads as follows?

Title: My Title
Content: My long content paragraph. With multiple sentences.
Short Content: Short content

Basically ignoring new lines and text in the () and [] brackets?

I've tried to use RegEx, but I can't get it to do exactly what I'd like. I also have the problem that when I match Content:, I get a match for both Content: and Short Content:, whereas I only want to match the occurrence that is an exact match.

EDIT:

I'm new to RegEx. So far, to extract the titles such as Title:, Content: and so on, I have

/[A-Za-z]+:|[A-Za-z]+ [A-Za-z]+:|[A-Za-z]+ [A-Za-z]+ [A-Za-z]+:|[A-Za-z]+ [A-Za-z]+ [0-9]+:/g

And then I loop through and use this

[TITLENAME]:.*\n.*

I'm struggling to get past this. My next step would be to loop through the text that is matched above and then remove the bracket stuff. I'm sure there is a better way to do this!

KGandhi

2 Answers

You could use String.replace( /(\(|\)|\[|\])/g , '')

If you call the replace method on a string with these two arguments, it returns a new string with the ( ) [ ] characters removed. I have escaped them all with \ since they are special characters in regex. That might be a little overzealous.

Also, the g flag makes the regular expression global, so it removes all instances rather than only the first.
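
As a minimal sketch of what that call does, here it is applied to the first two lines of the sample from the question. The helper names `stripBracketChars` and `stripBracketCharsCompact` are hypothetical and only exist for this illustration. Note that it removes just the bracket characters themselves, not the text between them.

```javascript
// Hypothetical helper wrapping the replace call from the answer.
function stripBracketChars(text) {
  // Each alternative is one escaped bracket character; g replaces every occurrence.
  return text.replace(/(\(|\)|\[|\])/g, "");
}

// Equivalent pattern written as a character class (an alternative spelling,
// not part of the original answer).
function stripBracketCharsCompact(text) {
  return text.replace(/[()\[\]]/g, "");
}

const firstEntry = "Title: (some text)\nMy Title [abc]";

console.log(stripBracketChars(firstEntry));
// Title: some text
// My Title abc
```

The bracketed words are still present in the output, so this alone does not produce the result asked for in the question.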

Moti Korets
Jake C
  • Actually I misunderstood your question. It should be possible to use string.replace and another regex expression to achieve the result you want though – Jake C Apr 16 '18 at 15:44
  • `.replace( /(\([^)]*\)|\[[^\]]\])/g, ' ')` So this removes everything in () or [] as well as the brackets and replaces them with a space – Jake C Apr 16 '18 at 15:50
  • And finally `.replace( /(\([^)]*\)|\[[^\]]*\])\n?/g, ' ')` will hopefully actually work on square brackets unlike my previous version and will remove the new line character after the bracketed text – Jake C Apr 16 '18 at 16:01
  • If this is the right idea let me know if any issues show up. I will have a go at correcting them later – Jake C Apr 16 '18 at 16:05
  • Thanks, looking into it now! – KGandhi Apr 16 '18 at 16:14
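
For reference, the pattern from the last regex comment could be applied to the question's sample roughly as follows. This is only a sketch: the function name `extractFields` and the whitespace clean-up after the replace are assumptions added here, since replacing with a space leaves double and trailing spaces behind.

```javascript
// Pattern from the comment: "(...)" or "[...]" plus one optional trailing newline.
const BRACKETED = /(\([^)]*\)|\[[^\]]*\])\n?/g;

// Hypothetical helper; the split/trim/filter steps are an assumed clean-up,
// not something stated in the comments.
function extractFields(text) {
  return text
    .replace(BRACKETED, " ")                          // drop bracketed text, joining label and value
    .split("\n")
    .map((line) => line.replace(/\s+/g, " ").trim())  // collapse runs of spaces
    .filter((line) => line.length > 0)                // discard empty lines
    .join("\n");
}

const sample = [
  "Title: (some text)",
  "My Title [abc]",
  "",
  "Content: (some test)",
  "My long content paragraph. With multiple sentences. [abc]",
  "",
  "Short Content: (some text)",
  "Short content [abc]",
].join("\n");

console.log(extractFields(sample));
// Title: My Title
// Content: My long content paragraph. With multiple sentences.
// Short Content: Short content
```

Because the pattern matches any bracketed run, it would also strip parentheses or square brackets that appear inside the real content.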
If the text within the brackets (e.g. 'abc') is fixed and has a special meaning, you can also go with: '/(\(some text\)\n|\(some test\)\n|(\[abc\]))|(^$\n)/gm'. This way you allow parentheses in the real text that you want to preserve, e.g. some text (this I want to preserve) and other text.

Please note the multiline m flag.
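
A rough sketch of how this pattern might be used from JavaScript is shown below. The helper name `removeKnownAnnotations`, the empty replacement string, and the trailing-space trim are assumptions made for this example. The last lines also illustrate that, with the m flag, ^ anchors at the start of each line, so `^Content:` does not match inside `Short Content:`.

```javascript
// Pattern copied from the answer: literal annotations plus empty lines.
const KNOWN_ANNOTATIONS = /(\(some text\)\n|\(some test\)\n|(\[abc\]))|(^$\n)/gm;

// Hypothetical helper: replacing with "" and trimming line ends are assumptions.
function removeKnownAnnotations(text) {
  return text
    .replace(KNOWN_ANNOTATIONS, "")
    .replace(/[ \t]+$/gm, ""); // remove the space left in front of "[abc]"
}

const questionText = [
  "Title: (some text)",
  "My Title [abc]",
  "",
  "Content: (some test)",
  "My long content paragraph. With multiple sentences. [abc]",
  "",
  "Short Content: (some text)",
  "Short content [abc]",
].join("\n");

const cleaned = removeKnownAnnotations(questionText);
console.log(cleaned);
// Title: My Title
// Content: My long content paragraph. With multiple sentences.
// Short Content: Short content

// Parentheses that are not in the known list survive.
console.log(removeKnownAnnotations("Content: (some test)\nKeep (this) please [abc]"));
// Content: Keep (this) please

// With m, ^ matches at each line start, so only the exact "Content:" line is found.
const contentLine = cleaned.match(/^Content:.*$/m)[0];
console.log(contentLine);
// Content: My long content paragraph. With multiple sentences.
```

The trade-off is that every annotation has to be listed in the pattern, so new bracketed markers would need to be added by hand.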

https://regex101.com/r/cS3pRR/1

Luca Abbati