2

Hi, I am parsing through XML files, grabbing SQL text and parameters. I need to pull out the strings that lie between two # signs. For example, if this is my text:

CASE WHEN TRIM (NVL (a.SPLR_RMRK, ' ')) = '' OR TRIM (NVL (a.SPLR_RMRK, ' ')) IS NULL THEN '~' ELSE a.SPLR_RMRK END AS TXT_DESCR_J, 'PO' AS TXT_TYP_CD_J FROM #ps_RDW_Conn.jp_RDW_SCHEMA_NAME#.P_PO_RCPT_DTL a, (SELECT PO_RCPT_DTL_KEY, ETL_CRT_DTM FROM #ps_RDW_Conn.jp_RDW_SCHEMA_NAME#.#jp_PoRcptDtl_Src# WHERE ETL_UPDT_DTM > TO_DATE ('#jp_EtlPrcsDt#', 'YYYY-MM-DD:HH24:MI:SS'))

I want ps_RDW_Conn.jp_RDW_SCHEMA_NAME, ps_RDW_Conn.jp_RDW_SCHEMA_NAME, jp_PoRcptDtl_Src and jp_EtlPrcsDt to print out.

Some code that I have so far is

for eachLine in testFile:
    print re.findall('#(*?)#', eachLine)

This gives me the following error:

nothing to repeat.

Any help or suggestions are greatly appreciated!

Bakuriu
user3700602

3 Answers

0

Unlike in shell filename patterns (globbing, as in bash), the * in a regular expression is not a wildcard character. It is a quantifier that says: repeat the thing before me 0 or more times.

In your regular expression, your * had no symbol to modify and so you saw the complaint nothing to repeat.
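To see where the complaint comes from, here is a minimal sketch, assuming Python 3 (where the exception is re.error and the message also includes a position). The helper name compile_error is made up for illustration:

```python
import re

def compile_error(pattern):
    """Return the compile error message for pattern, or None if it is valid.

    Hypothetical helper, only for illustration.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# '*' comes right after '(' so there is nothing for it to repeat.
bad = compile_error('#(*?)#')

# With '.' in front, '*?' has something to repeat and the pattern compiles.
good = compile_error('#(.*?)#')

print(bad)
print(good)
```

The first pattern fails at compile time, before any line of the file is even looked at, so the error does not depend on the input text.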

On the other hand, if you supply a . symbol for * to modify, testing with one line as an example,

eachLine = '#ps_RDW_Conn.jp_RDW_SCHEMA_NAME#.P_PO_RCPT_DTL a, (SELECT PO_RCPT_DTL_KEY, '

re.findall('#(.*?)#', eachLine)

We get,

['ps_RDW_Conn.jp_RDW_SCHEMA_NAME']
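Applied to a larger piece of the text from the question, the same pattern should pick up all four names. This is only a sketch, assuming Python 3 and that each # pair opens and closes on the same line; the names PLACEHOLDER and sql are mine, not from the question:

```python
import re

# Non-greedy group: capture as little as possible between two '#' signs.
PLACEHOLDER = re.compile(r'#(.*?)#')

# Fragment of the SQL text from the question.
sql = (
    "FROM #ps_RDW_Conn.jp_RDW_SCHEMA_NAME#.P_PO_RCPT_DTL a, "
    "(SELECT PO_RCPT_DTL_KEY, ETL_CRT_DTM "
    "FROM #ps_RDW_Conn.jp_RDW_SCHEMA_NAME#.#jp_PoRcptDtl_Src# "
    "WHERE ETL_UPDT_DTM > TO_DATE ('#jp_EtlPrcsDt#', 'YYYY-MM-DD:HH24:MI:SS'))"
)

names = PLACEHOLDER.findall(sql)
for name in names:
    print(name)
```

In the loop from the question, PLACEHOLDER.findall(eachLine) would take the place of the re.findall call.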

Some more detail. I'm not sure if this is what you intended, but your *? is actually well placed. *? is interpreted as a single quantifier (the non-greedy form of *), which says: repeat the thing before me 0 or more times, but take as little as possible.

So this ends up having a similar effect to what @tobias_k suggests in the comments: it prevents multiple groups from being absorbed into one.

>>> line = 'And here is # some interesting code #, where later on there are #fruit flies# ?' 
>>> re.findall('#(.*)#', line)
[' some interesting code #, where later on there are #fruit flies']

>>> 
>>> re.findall('#(.*?)#', line)
[' some interesting code ', 'fruit flies']
>>> 

For reference, see the "Repeating Things" section of the Regular Expression HOWTO on docs.python.org.

HeyWatchThis
    +1 Don't know why the downvote... however, I'd suggest using `"#([^#]+)#"`, so it does not accidentally select more than one group. – tobias_k Jun 17 '14 at 20:26
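
For comparison, a small sketch (assuming Python 3) of the negated character class from the comment next to the non-greedy pattern. On the example line both give the same result; they differ on an empty pair, because + needs at least one character:

```python
import re

line = 'And here is # some interesting code #, where later on there are #fruit flies# ?'

# Non-greedy: stop at the first closing '#'.
lazy = re.findall(r'#(.*?)#', line)

# Negated class: the group can never contain a '#', so it cannot span two pairs.
negated = re.findall(r'#([^#]+)#', line)

print(lazy)
print(negated)

# An empty pair '##': '.*?' matches the empty string, '[^#]+' does not match at all.
empty_lazy = re.findall(r'#(.*?)#', '##')
empty_negated = re.findall(r'#([^#]+)#', '##')
print(empty_lazy, empty_negated)
```

Another difference to keep in mind: by default . does not match a newline, while [^#] does, so the two can behave differently on multi-line strings.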
0

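Your regex is not working as intended because the quantifier has nothing before it to modify. On its own, * means 0 or more and ? means 0 or 1; written together as *?, they form a single non-greedy quantifier (0 or more, as few as possible), which is valid but still needs something to repeat.

If you mean to capture ## or #anything#, then use the regex #(.*?)#. The greedy form #(.*)# would run from the first # on the line to the last one.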

Dave Yarwood
-1

Try to escape ( and ). r'\(.*?\)' should work.

for eachLine in testFile: print re.findall(r'\(.*?\)', eachLine)

Christian Berendt