1

I understand that reading lines in .txt files would look something like this in Python:

with open('filename','r') as fd:
   lines = fd.readlines()

However, how do I read only the text in my .txt file that falls within each pair of balanced parentheses?

I am not sure how to go about it. Let's say my .txt file contains lines like this:

kkkkk;

select xx("xE'", PUT(xx.xxxx.),"'") jdfjhf:jhfjj from xxxx_x_xx_L ;
quit; 

(* 1.xxxxx FROM xxxx_x_Ex_x */ 
proc sql; "TRUuuuth");
hhhjhfjs as fdsjfsj:
select * from djfkjd to jfkjs
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
   where (xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
  (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
);

The main idea is to read only these portions of the .txt file (i.e. those within the outer and inner parentheses):

 ("xE'", PUT(xx.xxxx.),"'") jdfjhf:jhfjj from xxxx_x_xx_L ;
quit; 

(* 1.xxxxx FROM xxxx_x_Ex_x */ 
proc sql; "TRUuuuth")

(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
   where (xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
  (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
)

Any help from you guys will be truly appreciated

  • What happens in nested cases? For example, in `("xE'", PUT(xx.xxxx.),"'")` do you want to read everything within the outer parentheses, or just the inner one? Can the parentheses be multi-lined? Some more information is needed to really help here – Tomerikoo Sep 04 '19 at 14:37
  • Hi @Tomerikoo, thank you for clarifying:) I want to read everything from outer to inner parentheses. Yes, the opening and closing parentheses are on different lines in the .txt file. Hence I'm not sure how to go about it –  Sep 04 '19 at 14:41
  • Maybe I was not clear myself in my comment. Let me ask again. How do you expect your output to look like? On the given example you gave, provide also an example on how your output would look like. Will it be a list of strings? what strings will it contain? and so on. Also, generally this is not a code service site so any code you already tried should be here as well – Tomerikoo Sep 04 '19 at 14:45
  • @Tomerikoo I have updated my question accordingly. I understand what you mean, but I'm not sure how to start off other than the fact that I only know how to read .txt files line by line as stated in my question.. –  Sep 04 '19 at 14:57
  • I'd advise to go with [a proper parser](https://tomassetti.me/parsing-in-python/#parserGenerators) and express a proper grammar using it. – 9000 Sep 04 '19 at 15:01
  • Do you want to read everything between the first `(` and the last `)`, or between each pair of _corresponding_ `(` and `)`? – tobias_k Sep 04 '19 at 15:04
  • @9000 woah...I have never seen this before, Python is really That DEEP.. thank you for sharing:) it looks rather complex to me at the moment –  Sep 04 '19 at 15:05
  • @tobias_k I want to read everything from outer to inner parentheses –  Sep 04 '19 at 15:06
  • @psyduck Usually, repeating what you said in the question to clarify what you meant is not very helpful. In your example, it seems you want everything from the first `(` to the last `)` even though those parens do _not_ correspond, and including text in between that is _not_ within a pair of `(...)`. Is that correct? – tobias_k Sep 04 '19 at 15:29
  • Generally, the parens in that text seem not to be properly balanced. Where is the opening `(` to the `)` in the `proc sql` line? – tobias_k Sep 04 '19 at 15:31
  • @tobias_k ok i see what you mean now... i edited it, it was a typo, sorry! Instead of a "/" it was supposed to be a "(". So the full line is (* 1.xxxxx FROM xxxx_x_Ex_x */ proc sql; "TRUuuuth") –  Sep 04 '19 at 15:34
  • @psyduck: It's often easier to take a solution for a general problem and apply it to a particular problem than to build a custom special solution from the ground up. The latter very often tends to become more complex! When somebody else has already solved the general case, it's easy to use it [for a particular case](https://parsy.readthedocs.io/en/latest/tutorial.html); you can quite literally express your solution in the terms of your problem. – 9000 Sep 04 '19 at 16:04
  • Related: [Regular expression to return text between parenthesis](https://stackoverflow.com/questions/4894069/regular-expression-to-return-text-between-parenthesis) ... [Extract occurrence of text between brackets from a text file Python](https://stackoverflow.com/questions/52447842/extract-occurrence-of-text-between-brackets-from-a-text-file-python) – wwii Sep 04 '19 at 18:15

3 Answers

0

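You can use the re library to accomplish this. Use the findall function to get all matches, and the re.DOTALL option so that . also matches line breaks (re.MULTILINE only changes how ^ and $ behave, so it does not help here).

The regex should be something like re.compile(r'\(.*?\)', re.DOTALL). Note that this pattern stops at the first closing parenthesis, so nested parentheses are cut short.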

See here for some examples.
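
A minimal sketch of the findall approach, using re.DOTALL (the flag that lets . match line breaks). The sample string is made up for illustration. It also shows the limit of a plain regex here: the non-greedy match ends at the first ), so a nested group is truncated.

```python
import re

# Hypothetical sample: one group spans a line break, one group is nested.
text = 'a (one\ntwo) b (x(y)z) c'

# re.DOTALL lets "." match newlines; "*?" keeps each match as short as possible.
pattern = re.compile(r'\(.*?\)', re.DOTALL)
matches = pattern.findall(text)
print(matches)

# Without re.DOTALL the group that crosses the line break is not found at all.
single_line_only = re.findall(r'\(.*?\)', text)
print(single_line_only)
```

If the parentheses in the file can be nested, counting them (as in the character loop in another answer) is likely the more reliable route.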

Ryan Fleck
  • Thank you Ryan, I will give this a shot when I get back on my desktop and update here accordingly! –  Sep 04 '19 at 15:07
  • thanks Ryan, this worked similarly to the answer above! –  Sep 11 '19 at 03:12
  • Np! Not quite as comprehensive, though ;) – Ryan Fleck Sep 11 '19 at 11:07
  • Oh, do note that the `?` is important to configure the **greediness** of the `.*`. Without the `?` the `.*` will simply select everything within the first and last bracket in the file. – Ryan Fleck Sep 11 '19 at 11:08
0

I suggest you first read the whole file, and then discard the lines that don't contain what you are looking for.

Your question is too unspecific to go into much more detail here, unfortunately.

If you are looking for a pattern, for example in your_line = "text (some text) more text", use regular expressions, like this:

import re
brackets_re = re.compile(r'\(.*\)')  # greedy: from the first "(" to the last ")" in the line
brackets_elements = brackets_re.findall(your_line)

If you just want to check if a line contains brackets, use

if "(" in your_line:
    pass  # do something
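
As a small illustration of the pattern above, here is a sketch on a made-up line with two bracketed parts. It compares the greedy pattern with a non-greedy one; the variable names greedy and lazy are only for this example.

```python
import re

# Hypothetical line containing two separate bracketed parts.
your_line = "text (some text) more text (other text) end"

# Greedy ".*" runs from the first "(" to the last ")" on the line.
greedy = re.findall(r'\(.*\)', your_line)

# Non-greedy ".*?" stops at the nearest ")" and yields each part separately.
lazy = re.findall(r'\(.*?\)', your_line)

print(greedy)
print(lazy)
```

Neither form understands nesting, so both are best suited to lines with flat, non-nested brackets.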
sumisel
0

It is not entirely clear what you want. If you need the text between all pairs of matching top-level parentheses, a regular expression is probably not going to get you very far. Instead, you can loop over the characters, keep track of opening and closing parens, store the text in between in a buffer, and add it to the result when you are back at the top level.

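text = ... # your text, e.g. the result of fd.read()

result = []
parens = 0   # current nesting depth
buff = ""    # text collected since the last top-level "("
for c in text:
    if c == "(":
        parens += 1
    if parens > 0:
        buff += c
    if c == ")":
        parens -= 1
    if not parens and buff:
        # back at the top level: the group is complete
        result.append(buff)
        buff = ""

for i, r in enumerate(result):
    print(i, r)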

Result:

0 ("xE'", PUT(xx.xxxx.),"'")
1 (* 1.xxxxx FROM xxxx_x_Ex_x */ 
proc sql; "TRUuuuth")
2 (SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
   where (xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
  (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
)
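
The same idea can be wrapped in a function and fed from a file. The following is only a sketch under a few assumptions: the name top_level_groups and the sample string are made up, a stray ) at the top level is silently skipped, and parentheses inside quotes or comments are counted like any others.

```python
def top_level_groups(text):
    """Return every top-level (...) group, including whatever is nested inside."""
    result = []
    depth = 0      # current nesting depth
    start = None   # index of the "(" that opened the current top-level group
    for i, c in enumerate(text):
        if c == "(":
            if depth == 0:
                start = i
            depth += 1
        elif c == ")" and depth > 0:   # assumption: ignore an unmatched ")"
            depth -= 1
            if depth == 0:
                result.append(text[start:i + 1])
    return result


# Reading the whole file at once keeps multi-line groups intact, e.g.:
# with open('filename', 'r') as fd:   # 'filename' as in the question
#     text = fd.read()
sample = 'select xx("a", PUT(b.),"c") from t;\n(one\n  (two) three\n) ) stray'
groups = top_level_groups(sample)
for i, g in enumerate(groups):
    print(i, g)
```

Slicing by index avoids growing a string one character at a time, which may matter for large files; the results should match the loop above for balanced input.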

If you want text between (...) on all levels of nesting, you could use this variant, using a stack of buffers:

result = []
buff_stack = []  # one buffer per currently open "("
for c in text:
    if c == "(":
        buff_stack.append("")
    if buff_stack:
        buff_stack[-1] += c
    if c == ")":
        # note: raises IndexError if there is no matching "("
        result.append(buff_stack.pop())
        if buff_stack: # add to previous level
            buff_stack[-1] += result[-1]

Result:

0 (xx.xxxx.)
1 ("xE'", PUT(xx.xxxx.),"'")
2 (* 1.xxxxx FROM xxxx_x_Ex_x */ 
proc sql; "TRUuuuth")
3 (xx_ix as format 'xxxx-xx')
4 (xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.)
5 (xx_ix as format 'xxxx-xx')
6 (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
7 (SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
FROM &xxx..xxx_xxx_xxE
   where (xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and 
  (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
)
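
If the parentheses in the files are not always balanced, the stack variant can fail when it meets a ) with nothing open. Below is a hedged sketch of one way around that, keeping a stack of start positions instead of buffers. The name all_groups and the sample string are hypothetical, and skipping unmatched parentheses is an assumption about the desired behaviour.

```python
def all_groups(text):
    """Return every (...) group on every nesting level, inner groups first."""
    result = []
    starts = []   # positions of the "(" characters that are still open
    for i, c in enumerate(text):
        if c == "(":
            starts.append(i)
        elif c == ")" and starts:   # assumption: skip a ")" with no matching "("
            result.append(text[starts.pop():i + 1])
    # Any "(" left in starts was never closed and is dropped.
    return result


sample = 'f(a(b)c) ) (d'
groups = all_groups(sample)
for i, g in enumerate(groups):
    print(i, g)
```

Groups are appended in the order in which they close, which is the same order as in the result listing above.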
tobias_k
  • Thank you tobias_k!! This worked, I used your first set of codes to run my script –  Sep 05 '19 at 06:26
  • hello tobias_k, sorry for getting back this late! Lets say my `.txt` file has a statement like this ( Select abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj FROM &xxx..xxx_xxx_xxE where (xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.) ). This means that the select code runs on a separate line and there is a tab spacing in between the bracket and Select. Now the code is only able to capture ( ), which is spacing only, rather than the select statement. Is there any way to solve this? –  Nov 18 '19 at 02:22
  • @psyduck You should ask this as a new question, maybe adding a link back to this question for reference. – tobias_k Nov 18 '19 at 08:52
  • i actually started a new question but am not getting much success on it...so im still trying through regex or editing your code but its not working. I keep getting "list index out of range" when i try your method. I think due to inconsistency in the parentheses across multiple `.txt` files. Here is the link to the question: https://stackoverflow.com/questions/58908686/extract-strings-inside-inconsistent-nested-brackets –  Nov 18 '19 at 09:00
  • @psyduck I'll have a look at it, but busy right now. On first glance, the index `[-1 or -2]` is not going to work, `-1 or -2` is just `-1`. – tobias_k Nov 18 '19 at 09:05
  • thanks tobias_k....yeap...i was just trying cause it kept saying list index out of range.. –  Nov 18 '19 at 09:07