1

First of all my apologies if something similar was posted. My regex knowledge is very limited and I was unable to find something that I could adapt.

Given an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog>

    <include file="init.changelog.xml"/>
    <include file="v9.1.changelog.xml"/>
    <include file="v9.2.changelog.xml"/>
    <include file="v9.3.changelog.xml"/>
    <include file="v9.3.1.changelog.xml"/>
    <include file="v9.3.3.changelog.xml"/>

</databaseChangeLog>

I would like to have a regex that would extract the last version of the change log file. In the example above that would be the string v9.3.3

The regex needs to be Java-compatible, as I need to use it with Ant.

Thank you in advance. If you are able to help, a few explanations of how it works would be much appreciated.

Julian
  • Regex is a very, very poor choice for parsing XML. – Sergey Kalinichenko Jul 13 '13 at 02:20
  • Try using an [XML parser](http://stackoverflow.com/questions/373833/best-xml-parser-for-java) – BLaZuRE Jul 13 '13 at 02:22
  • The fact that this is XML is not very relevant here. All I need is a quick way to get the last value from between I am aware of other ways to parse XML, but it is not worth the effort to bring in another dependency just for this. – Julian Jul 13 '13 at 02:24
  • I'm also a Liquibase user and it never occurred to me to use the changeset file as a release indicator... A much simpler solution would be to set the version as an Ant property and fail the build if the matching changeset file does not exist... I normally set my project version number explicitly from Jenkins, you see. – Mark O'Connor Jul 13 '13 at 08:26
  • Setting the version number from Jenkins is not a problem, and we do it like that to specify the version we are building. The v9.3.3 value is useful at the start of the build, when the database is restored from a dump file. In this case v9.3.3 tells me to restore the DB from the v9.3.3 dump, even when I am building release 11, if there were no DB changes between 9.3.3 and 11. – Julian Jul 13 '13 at 23:13

3 Answers

1

You can read the file into a String and then use the Pattern and Matcher classes. Here is an example:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String target = "...<include file=\"init.changelog.xml\"/><include file=\"v9.1.changelog.xml\"/><include file=\"v9.3.3.changelog.xml\"/></databaseChangeLog>...";
    // "v" followed by dot-separated numbers (e.g. v9.3.3), or the literal "init"
    Pattern pattern = Pattern.compile("(v)(\\d+(?:\\.\\d+)*)|init");
    Matcher matcher = pattern.matcher(target);
    String version = "";
    while (matcher.find())
    {
        version = matcher.group(); // overwritten on every match, so the last match wins
        System.out.println(version);
    }
    // version now holds the last match; use it here

The expression (v)(\\d+(?:\\.\\d+)*)|init means: match the letter v followed by one or more digits (\\d+), optionally followed by any number of repetitions of a dot (\\.) and more digits. + means one or more and * means zero or more.

'|' is the OR operator, so the pattern also matches "init".

When part of a pattern is enclosed in parentheses, it forms a group. Putting parts of the pattern into groups makes it easy to retrieve one part of the matched string on its own through the matcher. (?: ... ) groups without capturing, so it does not get a group number.

matcher.find() looks for the next part of the string that matches the pattern, and matcher.group() returns that matched part. You can also use matcher.group(i) to get a single group from the match.

For example, here matcher.group(2) returns only the numbers and dots without the letter 'v'. Note that groups are 1-indexed: group 0 is the whole matched part of the target string, the same as matcher.group(). When the match is "init", matcher.group(2) is null because that group did not take part in the match.
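To make the group numbering concrete, here is a minimal sketch. It assumes a pattern of the form (v)(\d+(?:\.\d+)*)|init, which may differ from the one you end up using; the class name GroupDemo and the helper lastMatch are hypothetical names chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Assumed pattern: "v" plus dot-separated numbers, or the literal "init".
    private static final Pattern VERSION =
            Pattern.compile("(v)(\\d+(?:\\.\\d+)*)|init");

    /**
     * Returns { whole match, group 2 } for the last match in the input,
     * or null when nothing matches. Group 2 is null for an "init" match.
     */
    static String[] lastMatch(String input) {
        Matcher m = VERSION.matcher(input);
        String[] last = null;
        while (m.find()) {
            // group() is the same as group(0): the whole matched text
            last = new String[] { m.group(), m.group(2) };
        }
        return last;
    }

    public static void main(String[] args) {
        String xml = "<include file=\"init.changelog.xml\"/>"
                   + "<include file=\"v9.3.3.changelog.xml\"/>";
        String[] result = lastMatch(xml);
        System.out.println(result[0]); // whole match, with the leading v
        System.out.println(result[1]); // numbers and dots only
    }
}
```

Keeping only the most recent match inside the loop is what makes this return the last entry in the file rather than the highest version number.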

Sara Tarek
  • I don't think I was clear enough. The file name is not restricted to vdd.d.d.changelog.xml. That just happens to be our convention, but nothing would stop someone from calling their file something like this: **** In this case, if this is the last include entry in the file, we need to capture the string **fix_jira_bug_2014** – Julian Jul 13 '13 at 03:35
  • Hmm, I don't know whether ".changelog" will always be there too, but I guessed it will. If so, what about something like this: `int end = target.lastIndexOf(".changelog"); int start = target.lastIndexOf("\"", end);` – Sara Tarek Jul 13 '13 at 04:34
  • and get the result from `target.substring(start+1,end)`, that will get the word between include and .changelog whatever it is – Sara Tarek Jul 13 '13 at 04:35
  • and here is another pattern that will work well to get any word between every ` – Sara Tarek Jul 13 '13 at 05:06
  • Thanks Sara. The Java solution, which I am well aware of, is not something I want, as that would mean creating a custom Ant task, which is a big NO NO when you have alternative options. – Julian Jul 13 '13 at 22:56
  • The second proposed solution was my starting point and I got it working using this: **(?m)((\s)* – Julian Jul 13 '13 at 23:03
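The lastIndexOf idea from the comments above can be sketched as follows. This is only an illustration, assuming every include file name ends in ".changelog.xml"; the class and method names are hypothetical.

```java
public class LastChangelogName {
    /**
     * Returns the text between the last double quote that precedes the
     * last ".changelog" and that ".changelog" itself, or null when the
     * input does not contain such a file name.
     */
    static String lastChangelogName(String target) {
        int end = target.lastIndexOf(".changelog");
        if (end < 0) {
            return null; // no ".changelog" anywhere in the input
        }
        // search backwards from "end" for the opening quote of the file name
        int start = target.lastIndexOf('"', end);
        if (start < 0) {
            return null; // no opening quote before the file name
        }
        return target.substring(start + 1, end);
    }

    public static void main(String[] args) {
        String xml = "<include file=\"v9.3.3.changelog.xml\"/>\n"
                   + "<include file=\"fix_jira_bug_2014.changelog.xml\"/>\n"
                   + "</databaseChangeLog>";
        System.out.println(lastChangelogName(xml));
    }
}
```

Because no regex is involved, this works for any file name, but it still requires Java code rather than a plain Ant regex.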
0

Try the following:

    xmlString = xmlString.replace("\r", "").replace("\n", "");
    String version = xmlString.replaceAll("^.*(v\\d+(\\.\\d+)*)[^\\d]+$","$1");

Paul Vargas
  • Thanks, but this won't select the **init** string in the case that the **** is the last line. I'll try to adapt it. – Julian Jul 13 '13 at 03:18
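One possible adaptation that also accepts init is sketched below. It assumes the file names follow the vN.N.changelog.xml or init.changelog.xml convention from the question; the class and method names are hypothetical, and the pattern is a variation on the replaceAll approach, not the answerer's own code.

```java
public class LastVersionReplace {
    /**
     * Collapses the XML to one line, then replaces the whole string with
     * the name part of the last include that is either a version or "init".
     * If nothing matches, replaceAll leaves the input unchanged.
     */
    static String lastVersion(String xml) {
        String oneLine = xml.replace("\r", "").replace("\n", "");
        // The leading greedy .* pushes the capture to the last matching include.
        return oneLine.replaceAll(
                "^.*\"(v\\d+(?:\\.\\d+)*|init)\\.changelog\\.xml\".*$", "$1");
    }

    public static void main(String[] args) {
        String xml = "<databaseChangeLog>\n"
                   + "    <include file=\"v9.1.changelog.xml\"/>\n"
                   + "    <include file=\"init.changelog.xml\"/>\n"
                   + "</databaseChangeLog>";
        System.out.println(lastVersion(xml));
    }
}
```

Callers should compare the result with the input, or check its shape, to detect the no-match case, since replaceAll does not signal it.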
0

Here's the one-liner:

    String lastVersion = input.replaceAll("(?s).*include file=\"(.*?)\"/>[\n\\s]*</databaseChangeLog>.*", "$1");

Bohemian
  • This is very similar to the one I added in my last comment to Sara Tarek. \n is not needed, as \s includes newlines as well. However, this won't work if something other than whitespace is added between changelog.xml"/> and – Julian Jul 14 '13 at 23:04
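If other content can follow the last include, one alternative is to scan every include and keep the last one instead of anchoring on the closing tag. The sketch below assumes all file names end in ".changelog.xml" and that includes are written as in the question; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastIncludeFinder {
    // Captures the part of the file name before ".changelog.xml".
    private static final Pattern INCLUDE = Pattern.compile(
            "<include\\s+file=\"([^\"]*?)\\.changelog\\.xml\"\\s*/>");

    /** Returns the name of the last include, or null when there is none. */
    static String lastInclude(String xml) {
        Matcher m = INCLUDE.matcher(xml);
        String last = null;
        while (m.find()) {
            last = m.group(1); // later matches overwrite earlier ones
        }
        return last;
    }

    public static void main(String[] args) {
        String xml = "<databaseChangeLog>\n"
                   + "    <include file=\"v9.3.3.changelog.xml\"/>\n"
                   + "    <include file=\"fix_jira_bug_2014.changelog.xml\"/>\n"
                   + "    <!-- trailing content does not matter -->\n"
                   + "</databaseChangeLog>";
        System.out.println(lastInclude(xml));
    }
}
```

Since the pattern never refers to the closing databaseChangeLog tag, anything placed after the last include is ignored. Like the other regex approaches here, it would also match an include that sits inside an XML comment.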