
I am trying to find a regular expression that matches the data (5 digits) between XML tags.

The input is a String

<StudentID>12345</StudentID>

Or it can be

<ID>12345</ID>

The input can be

<Somedata>SSS<Somedata><StudentID>12345</StudentID><Name>MMM</Name>

Or the String can be

<Somedata>SSS<Somedata><ID>12345</ID><Name>MMM</Name>

I have written `(<ID>)\\d{5}` and `(<StudentID>)\\d{5}`.

Is there a better way of doing this?

Pavan
  • Try something like `<(Student)?ID>(\\d{1,5})`. This way the 2nd match will be the student ID... – Usagi Miyamoto Feb 19 '21 at 10:29
  • Basically, the question is how to match an XML tag whose name contains ID. – Pavan Feb 19 '21 at 10:42
  • The real answer is that you should not use regex for parsing XML, unless it is a one-off script for a quick hack. https://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg – Christoffer Hammarström Feb 19 '21 at 11:40
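
To illustrate the parser-based approach suggested in the last comment, here is a minimal sketch using the JDK's built-in DOM parser. It assumes the input is well-formed XML with a single root element, which the sample strings in the question are not (the `<Somedata>` tag is never closed and there is no root). The class name `XmlIdReader`, the method `readId` and the `<Student>` root element are hypothetical and used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlIdReader {
    // Returns the text of the first <StudentID> element, or failing that the
    // first <ID> element, or "" if the document contains neither.
    static String readId(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            for (String tag : new String[] {"StudentID", "ID"}) {
                NodeList nodes = doc.getElementsByTagName(tag);
                if (nodes.getLength() > 0) {
                    return nodes.item(0).getTextContent().trim();
                }
            }
            return "";
        } catch (Exception e) {
            // Assumption: callers treat malformed XML as a programming error.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical well-formed input: closed <Somedata> and a <Student> root.
        String xml = "<Student><Somedata>SSS</Somedata>"
                + "<StudentID>12345</StudentID><Name>MMM</Name></Student>";
        System.out.println(readId(xml)); // prints 12345
    }
}
```

Unlike a regular expression, this does not check that the value is exactly five digits; that check would have to be added separately if it matters.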

2 Answers

As I understand, you are searching for the letters ID, followed by the character >, followed by precisely five digits and finally followed by the characters </.

You can achieve this with the following regular expression:

ID>\d{5}</

Here, ID> is a literal string, \d matches a single digit, and {5} repeats the preceding expression exactly five times, so \d{5} matches exactly five digits. Finally, </ is also a literal string.

Since you want to extract only the digits, capture them in a group by enclosing \d{5} in parentheses. The regular expression you require is therefore:

ID>(\d{5})</

Here is the Java code. Note that because \ is the escape character in Java string literals, you need to write it twice in the source code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyClass {
    // Matches the literal "ID>", captures exactly five digits, then requires "</".
    private static final Pattern ID_PATTERN = Pattern.compile("ID>(\\d{5})</");

    public static void main(String[] args) {
        // Tests: each call prints 12345
        System.out.println(getId("<StudentID>12345</StudentID>"));
        System.out.println(getId("<ID>12345</ID>"));
        System.out.println(getId("<Somedata>SSS<Somedata><StudentID>12345</StudentID><Name>MMM</Name>"));
        System.out.println(getId("<Somedata>SSS<Somedata><ID>12345</ID><Name>MMM</Name>"));
    }

    // Returns the first five-digit ID found in s, or an empty string if there is none.
    static String getId(String s) {
        Matcher matcher = ID_PATTERN.matcher(s);
        String id = "";
        if (matcher.find()) {
            id = matcher.group(1);
        }
        return id;
    }
}
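
The method above returns only the first match. If a string may contain several IDs, the same pattern can be applied repeatedly with `find()`. The following is a minimal sketch of that idea; the class name `AllIds` and the method `getAllIds` are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllIds {
    // Same pattern as above: "ID>", five captured digits, then "</".
    private static final Pattern ID_PATTERN = Pattern.compile("ID>(\\d{5})</");

    // Collects every five-digit ID in s, in order of appearance.
    static List<String> getAllIds(String s) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = ID_PATTERN.matcher(s);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(getAllIds("<ID>12345</ID><StudentID>67890</StudentID>")); // prints [12345, 67890]
    }
}
```

Note that this pattern accepts any tag whose name ends in ID, for example a hypothetical `<ParentID>`, so it is only suitable when no such tags occur in the input.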

Refer to the following:
Java tutorial on regular expressions
The Web site Regular Expressions.info

You can also experiment with regular expressions online at regex 101

Arvind Kumar Avinash
Abra
Use

<((?:Student)?ID)>(\d+)</\1>

See proof.

EXPLANATION

--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      Student                  'Student'
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
    ID                       'ID'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  </                       '</'
--------------------------------------------------------------------------------
  \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  >                        '>'

Java example code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String regex = "<((?:Student)?ID)>(\\d+)</\\1>";
        final String string = "<StudentID>12345</StudentID>\n"
             + "Or it can be\n\n"
             + "<ID>12345</ID>\n"
             + "The input can be\n\n"
             + "<Somedata>SSS<Somedata><StudentID>12345</StudentID><Name>MMM</Name>";

        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);

        // Group 1 is the tag name; group 2 holds the digits between the matching tags.
        while (matcher.find()) {
            System.out.println("Your value is: " + matcher.group(2));
        }
    }
}
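
The pattern above uses \d+, which accepts any number of digits. The question asks for exactly five, so a stricter variant would swap \d+ for \d{5}. The following is a minimal sketch of that variant; the class name `StrictIdMatcher` and the method `findId` are hypothetical. It also shows the effect of the \1 back-reference, which rejects a closing tag that does not match the opening tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictIdMatcher {
    // Assumption: IDs are exactly five digits, as stated in the question.
    // \1 forces the closing tag name to equal the opening tag name.
    private static final Pattern STRICT =
            Pattern.compile("<((?:Student)?ID)>(\\d{5})</\\1>");

    // Returns the first ID enclosed in matching <ID> or <StudentID> tags, or "".
    static String findId(String s) {
        Matcher m = STRICT.matcher(s);
        return m.find() ? m.group(2) : "";
    }

    public static void main(String[] args) {
        System.out.println(findId("<StudentID>12345</StudentID>"));       // prints 12345
        System.out.println(findId("<StudentID>12345</ID>").isEmpty());    // prints true (tags differ)
        System.out.println(findId("<ID>123456</ID>").isEmpty());          // prints true (six digits)
    }
}
```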
Ryszard Czech