I have a project wherein I'm meant to match all instances of a word in a job advert, taken from the GitHub Jobs API.

I've been playing around with regexr.com, but I know that it's not specific to Java.

How can I use Java-specific regex to match instances of the "head" word in the following code, regardless of capitalization or irregular spacing between words, e.g. "Cloud computing"?

Python(Code.advanced_computing, "python", "(python)"),
AdvancedComputing(Code.advanced_computing, "advanced computing", "(advanced computing)"),
Programming(Code.advanced_computing, "programming", "(programming)"),
ComputationalSystems(Code.advanced_computing, "computational systems", "(computational systems)"),
Coding(Code.advanced_computing, "coding", "(coding)"),
CloudComputing(Code.advanced_computing, "Cloud computing", "(\\Cloud computing)"),

According to this answer, the following should work, but it does not:

Python(Code.advanced_computing, "python", "(/python/i)"),
AdvancedComputing(Code.advanced_computing, "advanced computing", "(/advanced.*?computing/i)"),
Programming(Code.advanced_computing, "programming", "(programming)"),
ComputationalSystems(Code.advanced_computing, "computational systems", "(/computational.*?systems/i)"),
Coding(Code.advanced_computing, "coding", "(/coding/i)"),
CloudComputing(Code.advanced_computing, "Cloud computing", "(/cloud.*?computing/i)"),
smatthewenglish
    See [Learning Regular Expressions](http://stackoverflow.com/questions/4736/learning-regular-expressions) and [The Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/1945631). If you have a specific problem then show us what you have tried, describe why it doesn't work and what you would expect it to do. See "[ask]" and [this checklist](http://meta.stackoverflow.com/questions/260648/stack-overflow-question-checklist) for some guidance. – Andy Brown Sep 01 '15 at 09:57

1 Answer

To match case-insensitively in Java, put the embedded flag (?i) at the start of your regex (or compile the pattern with the Pattern.CASE_INSENSITIVE flag).

Consider the following naive example:

String s = "ClOuD ComPuTinG";
if(s.matches("(?i)cloud.*computing")) {
    System.out.println("MATCH"); // will print MATCH
} else {
    System.out.println("NOT");
}

if(s.matches("cloud.*computing")) {
    System.out.println("MATCH");
} else {
    System.out.println("NOT"); // will print NOT
}

For more details, take a look at this article about case-insensitive matching in Java.
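Note that String.matches only succeeds when the whole string matches the regex, so for scanning a longer job advert a Matcher.find() loop may be closer to what the question asks for. The following is a minimal sketch under that assumption; the class name KeywordCounter, the helper countMatches and the sample advert text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordCounter {

    // Hypothetical helper: counts non-overlapping, case-insensitive
    // occurrences of the regex anywhere inside the text.
    static int countMatches(String regex, String text) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Invented sample text standing in for a job advert.
        String advert = "We use Python daily. PYTHON experience is required.";

        // matches() needs the entire string to match, so this is false.
        System.out.println(advert.matches("(?i)python"));

        // find() looks for the pattern anywhere in the text; prints 2.
        System.out.println(countMatches("python", advert));
    }
}
```

Compiling the Pattern once and reusing it is usually preferable when the same keyword is checked against many adverts.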

Update

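Furthermore, you don't write a regex between / delimiters in Java. The /pattern/i form is regex-literal syntax from languages such as JavaScript; in a Java string the slashes and the trailing i are matched as literal characters.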

So the line

CloudComputing(Code.advanced_computing, "Cloud computing", "(/cloud.*?computing/i)")

should be

CloudComputing(Code.advanced_computing, "Cloud computing", "(?i)cloud.*computing")
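To see why the slash-delimited version fails, here is a small sketch comparing the two spellings. The class name SlashDemo, the helper found and the sample strings are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

class SlashDemo {

    // Hypothetical helper: true if the regex matches anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The slashes and the trailing "i" are ordinary characters in Java,
        // so this looks for the literal text "/python/i": prints false.
        System.out.println(found("/python/i", "Python developer"));

        // Only text that really contains "/python/i" matches: prints true.
        System.out.println(found("/python/i", "see /python/i"));

        // The embedded (?i) flag gives the intended behaviour: prints true.
        System.out.println(found("(?i)python", "Python developer"));
    }
}
```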

The .* matches any number of characters (other than line terminators, by default), so this pattern also matches text such as "the clouds are white. I like computing". I would instead use a regex like (?i)cloud[\s_-]*computing, where [\s_-] is a character class containing whitespace, underscores and dashes. It matches cloud_computing or cloud-_-_---_ computing, but not the sentence above. Inside a Java string literal the backslash has to be doubled: "(?i)cloud[\\s_-]*computing".
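A short sketch of the separator-class idea, checked against the examples from the paragraph above. The class name SeparatorDemo and the method mentions are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

class SeparatorDemo {

    // The backslash is doubled because this is a Java string literal;
    // the regex engine receives (?i)cloud[\s_-]*computing.
    static final Pattern CLOUD_COMPUTING =
            Pattern.compile("(?i)cloud[\\s_-]*computing");

    // Hypothetical helper: true if the text mentions the keyword anywhere.
    static boolean mentions(String text) {
        return CLOUD_COMPUTING.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(mentions("Cloud   Computing"));        // true
        System.out.println(mentions("cloud_computing"));          // true
        System.out.println(mentions("cloud-_-_---_ computing"));  // true
        // "clouds" is followed by "s", which is not in the class: false.
        System.out.println(mentions("The clouds are white. I like computing."));
    }
}
```

Because the class is followed by *, the pattern also accepts zero separators, so "cloudcomputing" would match as well; use + instead if at least one separator should be required.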

Master_ex