
I know this type of question has been asked several times (for example here or here), but my problem appears to be different, since I couldn't find a solution there.

Suppose I'm given this String:

key=false hotel = trivago foo cool='tr ue' feels="good"

(Note that every whitespace is there on purpose.)

I'm meant to extract each key/value pair, so e.g. key=false is one of them. However, if a word is not followed by an "=" (after optional whitespace), I'm meant to return word = null. And, this is the relevant part, if the value is enclosed in either ' or ", I should save whatever is between those symbols. For example, the string above is meant to return this map:

{key=false, hotel=trivago, foo=null, cool=tr ue, feels=good}

I've tried all sorts of patterns. The closest I've got to what I want is this: ([a-zA-Z0-9]+)\s*[= ]+(?![^\"']*[\"'])

The idea is: look for a word made of letters and digits ([a-zA-Z0-9]+), followed by optional whitespace, then look for an "=". This part isn't the real issue (at least I think so...). The issue is my desired group(2): to handle values like "stuff" or 'wo wsers', I looked at the links above and considered using a negative lookahead.

And I think that is exactly what I need: group(2) should contain whatever follows the = symbol. If the value starts with ", take everything in the String up to the next "; the same goes for '. If it starts with neither symbol, stop at the next whitespace.
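As a side note, the pattern above contains only one capturing group, so calling group(2) on its Matcher would throw an IndexOutOfBoundsException. The behaviour described (a quoted value if it starts with ' or ", otherwise everything up to the next whitespace, and null when no = follows) can instead be written as an alternation inside an optional group, without any lookahead. This is a minimal sketch, assuming keys never contain whitespace or =; the class KeyValueRegex and its parse method are hypothetical names. Text directly after a closing quote becomes its own key, so "cod ing"? from the examples would give like=cod ing plus a separate ?=null entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueRegex {
    // group 1: key (no whitespace, no '=')
    // optional "= value" part, where the value is one of:
    //   group 2: text between single quotes
    //   group 3: text between double quotes
    //   group 4: unquoted run of non-whitespace characters
    private static final Pattern PAIR = Pattern.compile(
            "([^\\s=]+)(?:\\s*=\\s*(?:'([^']*)'|\"([^\"]*)\"|(\\S+)))?");

    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps input order
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2)
                    : m.group(3) != null ? m.group(3)
                    : m.group(4); // null when the word had no '=' after it
            map.put(m.group(1), value);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("key=false hotel = trivago foo cool='tr ue' feels=\"good\""));
        // {key=false, hotel=trivago, foo=null, cool=tr ue, feels=good}
    }
}
```

LinkedHashMap is used only so that the printed map follows the order of the input.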

I've been trying for hours, but I'm stuck. Can anyone help me? If you have any more questions, feel free to ask!

Edit: Since I was asked to provide a few more examples, here goes:

example with a lot of words. Makes=sense.

Should return

{example=null, with=null, a=null, lot=null, of=null, words.=null, Makes=sense.}

Another example:

Did=you know that=I like=           "cod ing"?

Should return

{Did=you, know=null, that=I, like=cod ing?}
  • What are the rules for escaping `'`, `"` and `=`? Personally I would probably write a simple state machine rather than use a regex. – tgdavies Mar 25 '23 at 04:37
  • Just a point here: do you control the form's accept-charset, serve the page in a particular charset, and use the same charset and language locale configuration on the server and in the server program? Read this about charsets: https://drive.google.com/file/d/1gjHmdC-BW0Q2vXiQYmp1rzPU497sybNy/view?usp=drivesdk – Samuel Marchant Mar 25 '23 at 04:39
  • @tgdavies I don't fully understand what you mean by "rules", but anything goes as long as it is between either `'` or `"`. For example, key="siodfjsdoifjsdoifsjdofisdjfoisdjf" should return `{key=siodfjsdoifjsdoifsjdofisdjfoisdjf}`. However, key = "example"notreally" should result in `{key=example, notreally"=null}` – Allmighty Fishmonger Mar 25 '23 at 04:43
  • Please show a variety of input to demonstrate different cases and show exactly what you want matched from them. – Bohemian Mar 25 '23 at 04:44
  • However, I will take a look at simple state machines; I'd never heard of them until now – Allmighty Fishmonger Mar 25 '23 at 04:44
  • @Bohemian will do, give me a minute – Allmighty Fishmonger Mar 25 '23 at 04:45
  • Wouldn't you be better to start off by doing `s = s.replaceAll("\\s*=\\s*", "=");`? – g00se Mar 25 '23 at 10:50
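Following tgdavies's suggestion, here is a sketch of a small hand-written state machine that scans the input one character at a time. It assumes that whitespace (as defined by Character.isWhitespace) separates tokens, and it follows the rule from the comment above that text after a closing quote starts a new key. The class name KeyValueScanner, the State names, and the end-of-input handling for a trailing key= or an unclosed quote are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueScanner {
    // Where the scanner currently is within the input
    private enum State { BETWEEN, KEY, AFTER_KEY, BEFORE_VALUE, QUOTED, BARE }

    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps input order
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        State state = State.BETWEEN;
        char quote = 0; // the quote character that opened the current quoted value

        for (char c : input.toCharArray()) {
            boolean ws = Character.isWhitespace(c);
            switch (state) {
                case BETWEEN: // skipping whitespace before the next key (a stray '=' is ignored)
                    if (!ws && c != '=') { key.append(c); state = State.KEY; }
                    break;
                case KEY: // reading a key
                    if (c == '=') state = State.BEFORE_VALUE;
                    else if (ws) state = State.AFTER_KEY;
                    else key.append(c);
                    break;
                case AFTER_KEY: // whitespace after a key: an '=' may still follow
                    if (c == '=') {
                        state = State.BEFORE_VALUE;
                    } else if (!ws) { // another word instead of '=': previous key maps to null
                        map.put(key.toString(), null);
                        key.setLength(0);
                        key.append(c);
                        state = State.KEY;
                    }
                    break;
                case BEFORE_VALUE: // skipping whitespace after '='
                    if (c == '\'' || c == '"') { quote = c; state = State.QUOTED; }
                    else if (!ws) { value.append(c); state = State.BARE; }
                    break;
                case QUOTED: // inside '...' or "...": whitespace is part of the value
                    if (c == quote) { emit(map, key, value); state = State.BETWEEN; }
                    else value.append(c);
                    break;
                case BARE: // an unquoted value ends at the next whitespace
                    if (ws) { emit(map, key, value); state = State.BETWEEN; }
                    else value.append(c);
                    break;
            }
        }
        // End of input: flush whatever is pending
        if (state == State.KEY || state == State.AFTER_KEY) {
            map.put(key.toString(), null);
        } else if (state != State.BETWEEN) {
            // Assumption: a trailing "key=" gives "", an unclosed quote keeps what was read
            emit(map, key, value);
        }
        return map;
    }

    private static void emit(Map<String, String> map, StringBuilder key, StringBuilder value) {
        map.put(key.toString(), value.toString());
        key.setLength(0);
        value.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(parse("key=false hotel = trivago foo cool='tr ue' feels=\"good\""));
        // {key=false, hotel=trivago, foo=null, cool=tr ue, feels=good}
        System.out.println(parse("key = \"example\"notreally\""));
        // {key=example, notreally"=null}
    }
}
```

Each rule lives in one case, so making "cod ing"? produce like=cod ing? instead would mostly mean changing what the QUOTED case does on the closing quote.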

1 Answer


We could use the following regex match all approach:

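import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "key=false hotel    =       trivago foo       cool='tr  ue' feels=\"good\"";
        Map<String, String> map = new HashMap<>();
        // group 1 = key; the value is in group 2 ('...'), group 3 ("...") or group 4 (bare)
        String regex = "(\\S+)\\s*=\\s*(?:'(.*?)'|\"(.*?)\"|(\\S+))";
        Pattern r = Pattern.compile(regex);
        Matcher m = r.matcher(input);
        while (m.find()) {
            String value = m.group(2);
            if (value == null) {
                value = m.group(3) == null ? m.group(4) : m.group(3);
            }
            map.put(m.group(1), value);
        }

        System.out.println(map);
    }
}

// {feels=good, cool=tr  ue, hotel=trivago, key=false}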

The regex pattern used here first tries to find a value in single or double quotes. Failing that, it treats the value as any contiguous run of non-whitespace characters (\S+).

Tim Biegeleisen
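To see why the answer tries the quoted alternatives first, the sketch below runs the answer's pattern and a variant with the bare (\S+) branch moved to the front against cool='tr  ue'. AlternationOrder and firstValue are hypothetical names used only for this demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrder {
    // Returns the first value group (2..n) that took part in the first match, or null
    static String firstValue(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) return null;
        for (int g = 2; g <= m.groupCount(); g++) {
            if (m.group(g) != null) return m.group(g);
        }
        return null;
    }

    public static void main(String[] args) {
        String input = "cool='tr  ue'";
        // The answer's order: quoted branches are tried before the bare \S+ branch
        System.out.println(firstValue("(\\S+)\\s*=\\s*(?:'(.*?)'|\"(.*?)\"|(\\S+))", input));
        // prints: tr  ue
        // Bare branch first: \S+ already succeeds on 'tr, so the quoted branches are never tried
        System.out.println(firstValue("(\\S+)\\s*=\\s*(?:(\\S+)|'(.*?)'|\"(.*?)\")", input));
        // prints: 'tr
    }
}
```

java.util.regex tries the branches of an alternation from left to right and keeps the first one that leads to an overall match, so the order of the branches is part of the pattern's meaning.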
  • Just so that I fully understand each step of the regex: we first look for anything that isn't whitespace, at least once (\\S+). Then any amount of whitespace (\\s*), then =, then any amount of whitespace again. The next part I sort of struggle with: we need to find two matches (I assume that's what the nested (()) mean) that are enclosed. This includes the characters between either two ', two ", or neither symbol at all. Did I get this correctly? Sorry if I'm not making much sense, I'm very tired – Allmighty Fishmonger Mar 25 '23 at 04:53
  • I've decided to use this after understanding the regex; however, it didn't fully solve my issue, but I'll still give you the mark. To get the foo part, I deleted each found match from the string, then ran a separate pattern that looks for the remaining words and added those with map.put(m2.group(1), "null") – Allmighty Fishmonger Mar 25 '23 at 13:57
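The two-pass workaround described in the last comment might look roughly like this: collect the pairs with the answer's regex, blank them out, then treat every remaining word as a key with no value. It's a sketch assuming that replacing each match with a single space is an acceptable way to "delete" it; it stores a real null rather than the string "null", and TwoPassParser is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassParser {
    // The answer's pattern: key, then a single-quoted, double-quoted or bare value
    private static final Pattern PAIR =
            Pattern.compile("(\\S+)\\s*=\\s*(?:'(.*?)'|\"(.*?)\"|(\\S+))");
    private static final Pattern WORD = Pattern.compile("\\S+");

    static Map<String, String> parse(String input) {
        Map<String, String> map = new HashMap<>();
        // Pass 1: collect key=value pairs
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2)
                    : m.group(3) != null ? m.group(3)
                    : m.group(4);
            map.put(m.group(1), value);
        }
        // Pass 2: blank out the pairs (a space keeps neighbouring words apart);
        // every word left over had no '=' after it, so it maps to null.
        // putIfAbsent keeps a bare word from overwriting a value found in pass 1.
        String rest = PAIR.matcher(input).replaceAll(" ");
        Matcher m2 = WORD.matcher(rest);
        while (m2.find()) {
            map.putIfAbsent(m2.group(), null);
        }
        return map;
    }

    public static void main(String[] args) {
        // HashMap: the printed order is unspecified
        System.out.println(parse("key=false hotel = trivago foo cool='tr ue' feels=\"good\""));
    }
}
```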