Regular expression for getting specific data

Question

I have a file that I can read as text and display in a text box. I would like to get only the data that follows

start="n= and end="n=

 <?xml version="1.0" encoding="utf-8"?>
 <!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 1.0//EN" "SMIL10.dtd">
 <head>
 </head>
     <body>
            <audio start="n=10.815s" end="n=19.914s"/>
 </body>
</xml>

I tried the following:

   String startTime = readString.replaceAll(".*start=\"n=|\\s.*", "").trim();
   String endTime = readString.replaceAll(".*end=\"n=|\\s.*", "").trim();
   Log.e("Start Time is :" , startTime);
   Log.e("endTime Time is :" , endTime);

It works in that I get the start time and end time, but the output also includes the <?xml tag.

How do I fix this?

Use the right tool for the right job. Here an XML/HTML parser would come in handy, not a regex. – jlordo, Dec 17 '12 at 10:45
Thanks. It's not an XML file, it's a text file with tags. I am able to view it in a text box. – Adarsh H S, Dec 17 '12 at 10:56

score 3 · Answer 1 · edited May 23 '17 at 12:04


I would rather use an XML parser to read this. Regexps aren't suited to parsing XML, HTML and similar formats. You'll find numerous references on SO relating to this.

For Java, DOM and SAX are possibilities, but JDOM might make an easier starting point.
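As a minimal sketch of the DOM route, assuming the input is well-formed XML: the sample in the question is not (it has no single root element and ends with a stray `</xml>`), so a parser would reject it as posted. The version below wraps the content in an assumed `<smil>` root and leaves out the DOCTYPE so that no external DTD has to be loaded. `AudioTimesDom`, `firstAudioTimes` and `stripPrefix` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AudioTimesDom {

    // Returns {start, end} of the first <audio> element with the "n=" prefix
    // removed, or null if the document contains no <audio> element.
    static String[] firstAudioTimes(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList audios = doc.getElementsByTagName("audio");
            if (audios.getLength() == 0) {
                return null;
            }
            Element audio = (Element) audios.item(0);
            return new String[] {
                stripPrefix(audio.getAttribute("start")),
                stripPrefix(audio.getAttribute("end"))
            };
        } catch (Exception e) {
            // The parser throws on input that is not well-formed XML.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Drops a leading "n=" if present; other values are returned unchanged.
    static String stripPrefix(String value) {
        return value.startsWith("n=") ? value.substring(2) : value;
    }

    public static void main(String[] args) {
        // Assumed well-formed variant of the sample from the question.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<smil><head></head><body>\n"
                + "  <audio start=\"n=10.815s\" end=\"n=19.914s\"/>\n"
                + "</body></smil>";
        String[] times = firstAudioTimes(xml);
        System.out.println(times[0] + " " + times[1]); // prints "10.815s 19.914s"
    }
}
```

If the file really cannot be made well-formed, this approach does not apply and a regex over the raw text is the fallback.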

answered Dec 17 '12 at 10:45

Brian Agnew


Thanks. It's not an XML file, it's a text file with tags. I am able to view it in a text box. – Adarsh H S Dec 17 '12 at 10:55
It looks like an XML file. Why does it not conform? – Brian Agnew Dec 17 '12 at 11:44

Santhosh Gutta · Accepted Answer · 2012-12-17T14:44:20.373

Here is a solution in Java. It works for any data that contains a string of the form

<audio start="n=........" end="n=......." />

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String inputData1 = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<!DOCTYPE smil PUBLIC \"-//W3C//DTD SMIL 1.0//EN\" \"SMIL10.dtd\">"
                + "<head>"
                + "</head>"
                + "<body>"
                + "<audio start=\"n=10.815s\" end=\"n=19.914s\"/>"
                + "<sometag> <audio start=\"n=10.815s\" end=\"n=20.914s\"/> </sometag>"
                + "</body>"
                + "</xml>";

        String inputData2 = "some data goes here with or without tags; <audio start=\"n=10.815s\" end=\"n=20.914s\"/>; askjdhfla ";

        // Group 1 captures the value after start="n=, group 2 the value after end="n=.
        Pattern pattern = Pattern.compile("<audio[^>]*start\\s*=\\s*\"n\\s*=\\s*([^\"]*)\"[^>]*end=\"n\\s*=\\s*([^\"]*)\"[^>]*>");

        // Pass inputData2 here instead to get the second output shown below.
        Matcher matcher = pattern.matcher(inputData1);

        // find() steps through every <audio ...> tag in the input.
        while (matcher.find()) {
            System.out.println("start=\"n=" + matcher.group(1) + ", & end=\"n=" + matcher.group(2) + "");
        }
    }
}

Output For InputData1:
start="n=10.815s, & end="n=19.914s
start="n=10.815s, & end="n=20.914s


Output For InputData2:
start="n=10.815s, & end="n=20.914s

Andremoniy · Answer 3 · 2019-08-28T05:29:43.750

1

I agree with the previous answers. But if your file is always small, just a few lines, you may use a regexp. In this case try this pattern: (\n|\r|.)*end\s*=\s*\"n=(.*)\"(\n|\r|.)*

UPD: Group #2 will give you exactly what you want.
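A variation on this idea, offered as a sketch rather than as the answerer's code: with `Matcher.find()` the pattern does not have to consume the text before and after the attribute, so the `(\n|\r|.)*` wrappers can be dropped and the value becomes group 1 instead of group 2. The greedy `(.*)` in the pattern above runs to the last double quote on the line, which works for `end` in the sample because it is the last attribute. The sketch uses `[^"]*` so that the same pattern also works for `start`. `AttributeFinder` and `firstValue` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeFinder {

    // Returns the text after n= in the first attribute="n=..." occurrence,
    // or null if the attribute does not appear in the input.
    static String firstValue(String input, String attribute) {
        // \b keeps "end" from matching inside a longer name such as "backend".
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(attribute) + "\\s*=\\s*\"n=([^\"]*)\"");
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String readString = "<body>\n"
                + "  <audio start=\"n=10.815s\" end=\"n=19.914s\"/>\n"
                + "</body>\n";
        System.out.println(firstValue(readString, "start")); // prints 10.815s
        System.out.println(firstValue(readString, "end"));   // prints 19.914s
    }
}
```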

edited Aug 28 '19 at 05:29

answered Dec 17 '12 at 10:51


score 1 · Answer 4 · answered Dec 17 '12 at 11:09

It is always best to parse XML/HTML with a parser, not a regex. However, regarding your problem, you could try the following:

String s = "foo\n <audio start=\"n=10.815s\" end=\"n=19.914s\"/>bar\n";
String re = "(?s).*?(?<=start=\"n=)([^\"]*).*";
String startTime=s.replaceAll(re, "$1");

The example above assigns 10.815s to startTime. To get endTime, replace start with end in the regex.

Short explanation of the regex:

(?s) is the DOTALL flag, which means . will match newlines as well
(?<=start=\"n=) is a lookbehind: the match must be preceded by start="n=
([^\"]*) captures everything up to, but not including, the next double quote;
         in this case that is 10.815s
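The same regex can be parameterized by attribute name so that one helper covers both values. This is a small sketch under the assumption that the attribute name is a plain word such as start or end; `ReplaceAllExtract` and `extract` are hypothetical names. One caveat of the `replaceAll` approach is that if the attribute is missing, nothing matches and the input comes back unchanged, so the caller cannot tell a value from a miss without an extra check.

```java
public class ReplaceAllExtract {

    // Replaces the whole input with the value that follows attribute="n=.
    // If the attribute is absent, replaceAll finds no match and the input
    // is returned unchanged.
    static String extract(String s, String attribute) {
        String re = "(?s).*?(?<=" + attribute + "=\"n=)([^\"]*).*";
        return s.replaceAll(re, "$1");
    }

    public static void main(String[] args) {
        String s = "foo\n <audio start=\"n=10.815s\" end=\"n=19.914s\"/>bar\n";
        System.out.println(extract(s, "start")); // prints 10.815s
        System.out.println(extract(s, "end"));   // prints 19.914s
    }
}
```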

hope it helps
