I need help figuring out a regular expression that removes all the data between {{ and }}.

Below is the corpus:

{{for|the American actor|Russ Conway (actor)}}
{{Use dmy dates|date=November 2012}}
{{Infobox musical artist <!-- See Wikipedia:WikiProject_Musicians -->
| birth_name          = Trevor Herbert Stanford
| birth_date          = {{birth date|1925|09|2|df=y}}
| birth_place         = [[Bristol]], [[England]], UK
| death_date          = {{death date and age|2000|11|16|1925|09|02|df=y}}
| death_place         = [[Eastbourne]], [[Sussex]], England, UK
| origin              = 
}}

record|hits]].<ref name="British Hit Singles & Albums"/>

{{reflist}}

==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*{{YouTube|TnIpQhDn4Zg|Russ Conway playing Side Saddle}}

{{Authority control|VIAF=41343596}}

<!-- Metadata: see [[Wikipedia:Persondata]] -->
{{Persondata
| NAME              =Conway, Russ
}}
{{DEFAULTSORT:Conway, Russ}}
[[Category:1925 births]]

Below is the expected output, with all the curly braces removed along with the text inside them:

record|hits]].<ref name="British Hit Singles & Albums"/>
==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*
<!-- Metadata: see [[Wikipedia:Persondata]] -->
[[Category:1925 births]]

P.S. I have omitted the whitespace in the output; I will take care of that myself.

Ankit

2 Answers

// Strings are immutable in Java, so the result must be assigned back.
string = string.replaceAll("\\{\\{[\\s\\S]*?\\}\\}", "");

will produce:





record|hits]].<ref name="British Hit Singles & Albums"/>



==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*



<!-- Metadata: see [[Wikipedia:Persondata]] -->


[[Category:1925 births]]
justhalf
  • Hi, the above expression won't handle nested braces. For example, in this block: {{Infobox musical artist <!-- See Wikipedia:WikiProject_Musicians --> | birth_name = Trevor Herbert Stanford | birth_date = {{birth date|1925|09|2|df=y}} | birth_place = [[Bristol]], [[England]], UK | death_date = {{death date and age|2000|11|16|1925|09|02|df=y}} | death_place = [[Eastbourne]], [[Sussex]], England, UK | origin = }} it will leave behind the data after | birth_place ... – Ankit Sep 20 '13 at 08:02
  • You can't parse nested expressions with a regex, because the language of nested expressions is not regular. It's similar to parsing HTML with a regex, which is discussed in this post: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – justhalf Sep 20 '13 at 08:07
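
Because a single regex pass cannot track nesting depth, one alternative is a small hand-written scanner that counts open `{{` pairs. The following is only a sketch and is not taken from either answer: the names `TemplateStripper` and `stripTemplates` are hypothetical, and it assumes braces come in pairs. If a `{{` is never closed, the text from the outermost unclosed `{{` onward is kept unchanged. A single `}` sitting directly before a closing `}}` would be mis-paired by this simple version.

```java
final class TemplateStripper {

    /** Removes every balanced {{ ... }} span, including nested ones. */
    static String stripTemplates(String text) {
        StringBuilder out = new StringBuilder();
        int depth = 0;        // how many "{{" are currently open
        int outerStart = -1;  // index of the outermost open "{{"
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                if (depth == 0) {
                    outerStart = i;
                }
                depth++;
                i += 2;
            } else if (depth > 0 && text.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                // Copy characters only while outside every template.
                if (depth == 0) {
                    out.append(text.charAt(i));
                }
                i++;
            }
        }
        // Unclosed "{{": keep the remaining text instead of dropping it.
        if (depth > 0) {
            out.append(text, outerStart, text.length());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTemplates("a{{x|{{y}}|z}}b")); // prints "ab"
    }
}
```

Single braces such as `id={5CE1}` outside a template are left alone, since only the two-character tokens `{{` and `}}` change the depth.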

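This would take care of nested {{ }}:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each pass removes the innermost {...} groups (those with no braces inside),
// so nested groups are peeled away from the inside out.
Matcher m = Pattern.compile("\\{[^{}]*\\}").matcher(input);
while (m.find()) {
    input = m.replaceAll("");  // drop every innermost group found in this pass
    m.reset(input);            // rescan the shortened string
}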
Anirudha
  • Though it worked, I am not able to understand how you are comparing the index. – Ankit Sep 20 '13 at 08:18
  • `[^{}]*` matches zero or more characters that are not `{` or `}`. So if you have nested `{{ }}`, one pass only removes the non-nested `{{ }}`. We therefore need to keep replacing until there is no `{{` left in the string. – Anirudha Sep 20 '13 at 08:20
  • Texts like this one: {{cite news |title=Украинцы планируют убийства в Грузии? |url=km.ru/magazin/view.asp?id={5CE15A8F-9F1E-4C36-A007-0C818963B6CD} |work= |publisher=KMnews.RU |date=13 August 2008 |language=Russian |accessdate=4 December 2008 }} send it into an infinite loop. I suppose the pattern {{{}}} is not matched in this case. @Anirudh – Ankit Oct 01 '13 at 04:13
  • @Kailash your input doesn't have }} and hence it would loop forever. If }} is not present, what would you like to do with the input? – Anirudha Oct 01 '13 at 05:11
  • I need to remove nested {{ }} or { } if present; otherwise I need to move on! @Anirudh – Ankit Oct 01 '13 at 07:01
  • Can you help me with a similar question? http://stackoverflow.com/questions/20041891/regular-expression-to-extract-a-section-from-the-wikipedia-page @Anirudh – Ankit Nov 18 '13 at 07:46
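
One way to get the "remove if present, otherwise move on" behaviour discussed in these comments is to repeat the innermost-brace replacement until the string stops changing, rather than looping while `{{` is still present. The sketch below is an illustration under that assumption; the names `InnermostBraceRemover` and `removeBraces` are hypothetical. Each pass that changes anything shortens the string, so the loop always terminates, and unmatched braces are simply left in place.

```java
import java.util.regex.Pattern;

final class InnermostBraceRemover {

    // Matches a {...} group that contains no other braces, i.e. an innermost group.
    private static final Pattern INNERMOST = Pattern.compile("\\{[^{}]*\\}");

    /** Repeatedly deletes innermost {...} groups until nothing changes. */
    static String removeBraces(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = INNERMOST.matcher(previous).replaceAll("");
        } while (!current.equals(previous)); // stop at a fixed point
        return current;
    }

    public static void main(String[] args) {
        System.out.println(removeBraces("a{{b{{c}}d}}e")); // prints "ae"
    }
}
```

For a template containing a single-brace value, such as `{{cite news |url=...id={5CE15A8F} |work= }}`, the first pass removes `{5CE15A8F}`, the second reduces the template to `{}`, and the third removes that, leaving an empty string.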