How to pick useful information within curly braces from a text file using a python script?

Question

I have a huge text file that has information stored in this format.

someOtherMessage{
              class = "someClass";
      sampleMessage{
                  someValue{
                      someText{
                          someParam = "value";
                          someSymbol = "another_symbol";
                      }; //someText
                  }; //someValue
       }; //sampleMessage
    }; //someOtherMessage

someOtherMessage2{
              class = "someClass2";
      sampleMessage2{
                  someValue2{
                      someText2{
                          someParam = "value2";
                          someSymbol = "another_symbol2";
                      }; //someText2
                  }; //someValue2
       }; //sampleMessage2
    }; //someOtherMessage2

I want to iterate over this file with a Python script and build a dict (or any other data structure) in the following format.

For example:

dict = {'someOtherMessage': 'someOtherMessage{
              class = "someClass";
      sampleMessage{
                  someValue{
                      someText{
                          someParam = "value";
                          someSymbol = "another_symbol";
                      }; //someText
                  }; //someValue
       }; //sampleMessage
    }; //someOtherMessage',

'someOtherMessage2': 'someOtherMessage2{
          class = "someClass2";
  sampleMessage2{
              someValue2{
                  someText2{
                      someParam = "value2";
                      someSymbol = "another_symbol2";
                  }; //someText2
              }; //someValue2
   }; //sampleMessage2
}; //someOtherMessage2'
}

I used the following regex, but it matches everything between the first and the last curly brace. How can I make it match each top-level block separately?

r"(?s){(.*)}"

Will there always be that *'}; //someOtherMessage`\n`someOtherMessage2{'* new line between any two parts that you want? — AKSingh, Jun 27 '21 at 07:28
@AKSingh, Yes, actually there can be multiple new lines too! — Gaurav Agarwal, Jun 27 '21 at 07:33
`(?s)\{(.*?)\};.*?(\n\n|$)` Try this in where `^` and `$` **do not** match end of each line. In simple words, do not add `m` modifier. — AKSingh, Jun 27 '21 at 07:35
@AKSingh, Check [this](https://regex101.com/r/eU1Afr/2) out. — Gaurav Agarwal, Jun 27 '21 at 08:13
Please remove the `m` modifier. Try https://regex101.com/r/1T7Bey/1. — AKSingh, Jun 27 '21 at 08:33
@AKSingh, Thanks, My problem is pretty much solved. Just one thing, what's the role of the last group(\n\n|$)? — Gaurav Agarwal, Jun 27 '21 at 13:24
I will write an answer to explain it. Is it working properly in the text file? — AKSingh, Jun 27 '21 at 13:50

AKSingh · Answer 1 · 2021-06-28T05:57:44.583

Here is one possible solution to your problem. I am assuming you are familiar with greedy and lazy quantifiers. If not, here is a link: Greedy vs. Reluctant vs. Possessive Qualifiers

(?s)\{(.*?)\};.*?(\n\n|$)

Error

In the original regex, (?s){(.*)}, you used a greedy quantifier, which matches too much: it runs from the first { to the last } in the file.

If you replace it with a lazy quantifier, (?s){(.*?)}, it matches too little: it stops at the first }, which only closes the innermost block.
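A quick way to see both behaviours is to run the two patterns on a shortened input. The text below is a made-up, one-line-per-block stand-in for the real file, not the asker's data:

```python
import re

# Hypothetical, shortened stand-in for the file: two top-level blocks,
# each with one nested block, separated by a blank line.
text = 'a{ x{ p = "1"; }; }; //a\n\nb{ y{ q = "2"; }; }; //b\n'

# Greedy: runs from the first "{" to the last "}" in the whole text.
greedy = re.search(r"(?s){(.*)}", text).group(0)

# Lazy: stops at the first "}", which only closes the inner block x.
lazy = re.search(r"(?s){(.*?)}", text).group(0)

print(repr(greedy))
print(repr(lazy))
```

The greedy match swallows both blocks, while the lazy match ends inside the first one.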

Correction

To pick out the correct closing }, we need something to anchor the match. This is where the blank lines between the blocks come into play.

    }; //someOtherMessage

someOtherMessage2{

Here there is a \n after the //someOtherMessage comment and then another \n on the empty line that follows. So we add

`(?s)\{(.*?)\};.*?\n\n`

which simply means {...something here...}...something here...\n\n.

This regex still does not match the last block, since that block is not followed by \n\n.

}; //someOtherMessage2'
}

To match it, we let $ match only at the end of the input by leaving out the m modifier (with m, $ would also match at the end of every line). This changes our regex to:

 (?s)\{(.*?)\};.*?(\n\n|$)
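To build the dict asked for in the question, the pattern can be run with re.finditer. The sketch below is one way to do it, under some assumptions: a leading (\w+) is added to capture the block name (this is not part of the pattern above), the name is directly followed by {, and no blank line occurs inside a block. SAMPLE and split_blocks are illustrative names, and the sample text is shortened.

```python
import re

# Shortened sample in the same shape as the file from the question.
SAMPLE = '''someOtherMessage{
    class = "someClass";
    sampleMessage{
        someParam = "value";
    }; //sampleMessage
}; //someOtherMessage

someOtherMessage2{
    class = "someClass2";
    sampleMessage2{
        someParam = "value2";
    }; //sampleMessage2
}; //someOtherMessage2
'''

# The pattern from the answer, with (\w+) prepended to capture the name.
BLOCK_RE = re.compile(r"(?s)(\w+)\{(.*?)\};.*?(\n\n|$)")


def split_blocks(text):
    """Map each top-level block name to the full text of that block."""
    blocks = {}
    for match in BLOCK_RE.finditer(text):
        # group(0) is the whole match; strip() drops the blank-line separator.
        blocks[match.group(1)] = match.group(0).strip()
    return blocks


blocks = split_blocks(SAMPLE)
for name, body in blocks.items():
    print(name, len(body))
```

For a real file, the text could be read with open(path).read() and passed to split_blocks in the same way.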

I hope this helps. Please note that this is only one possible solution. Another approach, matching nested brackets, is also covered on SO, but it may be somewhat more complex. Here is one such link: Can regular expressions be used to match nested patterns?

If you have any other doubts, please mention them.
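If the blank-line separator cannot be relied on, the nesting can also be tracked without a regex by counting braces. This is a swapped-in technique, not the one from the linked question, and only a minimal sketch: it assumes braces never appear inside quoted values or comments, and split_top_level_blocks is a hypothetical helper name.

```python
def split_top_level_blocks(text):
    """Return {name: block_text} for every top-level `name{ ... }` block."""
    blocks = {}
    depth = 0        # current brace nesting level
    start = 0        # where the search for the next block name begins
    name = ""
    block_start = 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                # Everything since the previous block is the name (plus whitespace).
                head = text[start:i]
                name = head.strip()
                block_start = i - len(head.lstrip())
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Extend to the end of the line to keep the trailing "; //comment".
                end = text.find("\n", i)
                if end == -1:
                    end = len(text)
                blocks[name] = text[block_start:end]
                start = end
    return blocks


# Shortened sample; note there is no blank line between the two blocks.
sample = (
    'first{\n  inner{\n    p = "1";\n  }; //inner\n}; //first\n'
    'second{\n  q = "2";\n}; //second\n'
)
blocks = split_top_level_blocks(sample)
print(list(blocks))
```

Because it tracks depth rather than separators, this version also works when blocks follow each other without an empty line in between.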
