I am stumped by this one. I am just learning regular expressions and cannot figure out why this will not return punctuation marks.

Here is a piece of the text file the regex is parsing:

APRIL/NNP is/VBZ the/DT cruellest/JJ month/NN ,/, breeding/VBG Lilacs/NNP out/RB of/IN the/DT dead/JJ land/NN

import re
from re import findall

text = open_file.read()

grammarList = raw_input("Enter your grammar string: ")
tags = grammarList.split("^")

# One capture group per word, each followed by "/" and its escaped tag
tags_pattern = r'\s+'.join(r"([\w\,\:\;\"\-\.]+)/{0}".format(re.escape(tag)) for tag in tags) + r"\b"
print tags_pattern

start_position = 0

for poem in poemList:
    start_position = text.find('<' + poem + '>', start_position)
    end_position = text.find('</' + poem + '>', start_position)

    searchtext = text[start_position:end_position]
    poemname = poem
    for oldname, newname in poemtitleswapList.items():
        poemname = poemname.replace(oldname, newname)

    print (poemname)
    print (findall(tags_pattern, searchtext))
    print ("\n")  

I thought that in the square brackets the "\," would allow it to return a "," but it is not working.

Any help would be appreciated.

English Grad
  • Could you produce a complete, runnable example that would demonstrate the problem you're having? – NPE Feb 08 '12 at 14:51
  • No, this is still not a complete runnable example. Also, you seem to be parsing XML/HTML with regexes, which is very wrong. – wRAR Feb 08 '12 at 14:54
  • I am not sure what I need to put up. This program references a 40 MB text file and the arrays like poemList are huge. You want the whole program? – English Grad Feb 08 '12 at 14:56
  • If you cannot ask a question properly and cannot even give us an example that we can test you will not get an answer. – wRAR Feb 08 '12 at 14:57
  • I am not parsing XML with regex. I put a tag in the text file that is not xml so I can return the name of the poem in the giant file that the matches are from. That part works fine. I cannot get any matches with punctuation. – English Grad Feb 08 '12 at 14:57
  • If you can provide a string and a regex that doesn't match it, do it. If you cannot, we cannot help you. – wRAR Feb 08 '12 at 14:59
  • The input I am trying to find is NNP^VBZ^DT^JJ^NN^, which in the example at the top should return: "APRIL is the cruellest month ," but I do not get any matches. The regex I am using is in the code. – English Grad Feb 08 '12 at 15:01
  • No, your regex doesn't exist in the code. It is created at run time from some data unavailable to us. – wRAR Feb 08 '12 at 15:05
  • ([\w\,\:\;\'\-\.]+)/NNP\s+([\w\,\:\;\'\-\.]+)/VBZ\s+([\w\,\:\;\'\-\.]+)/DT\s+([\w\,\:\;\'\-\.]+)/JJ\s+([\w\,\:\;\'\-\.]+)/NN\s+([\w\,\:\;\'\-\.]+)/\,\b – English Grad Feb 08 '12 at 15:10
  • And, I ask questions to try and learn. I don't understand why your responses are so aggressive. Ok, you win, I don't know how to ask you questions properly. The polite response would be to tell me how to ask a question properly. Instead it's five snooty comments on how you can't help me with what I have given you. Well then don't help. Saying things like "this is very wrong", when I am not even dealing with XML, is just pompous without even offering a suggestion. How is that helpful? Everyone on here has always been very nice until now. What is your deal? – English Grad Feb 08 '12 at 15:15
  • You had your directions in the first comment. Next comments were needed because you didn't provide what you were asked in the first comment. – wRAR Feb 08 '12 at 15:20
  • Btw, in http://stackoverflow.com/questions/9143406 you have accepted an answer telling you to parse NLTK data with the NLTK library, yet you ask how to parse it manually. – wRAR Feb 08 '12 at 15:36

1 Answer

After minimizing your example we have:

re.findall(r"/\,\b", "/NN ,/, breeding/VBG Lilacs/NNP out/RB of/IN the/DT dead/JJ land/NN")

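It does not match because of the trailing `\b`: there is no beginning of a word immediately after the comma. The comma is followed by a space, neither of them is a word character, and so there is no word boundary between them.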
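A small sketch of that behaviour, assuming Python 3 and a shortened sample string (the variable names here are only for illustration and are not from the question):

```python
import re

sample = "/NN ,/, breeding/VBG Lilacs/NNP"

# \b needs a word character on exactly one side. After "/," comes a space,
# so both neighbours are non-word characters and \b cannot match there.
no_match = re.findall(r"/,\b", sample)

# Without the trailing \b the comma tag is found.
plain = re.findall(r"/,", sample)

# For a tag made of letters \b does hold: "P" is followed by the end of the string.
word_tag = re.findall(r"/NNP\b", sample)

print(no_match, plain, word_tag)
```

So the character class in the question is not the problem; the trailing `\b` is what rejects a tag that ends in punctuation.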

wRAR
  • So how do I amend my code? I don't understand what you have written here. – English Grad Feb 08 '12 at 15:30
  • I don't know what your code does, so I cannot know how it should be done, but if you add a space before your `\b`, the string will match. And if you don't understand what I wrote, you at least don't understand what `\b` means, so you shouldn't use it in your code. – wRAR Feb 08 '12 at 15:32
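One possible way to amend the pattern, sketched under some assumptions: Python 3, a hypothetical helper named `build_pattern`, and a negative lookahead `(?!\S)` swapped in for the trailing `\b`. This is not the change suggested in the comment above, only an alternative that requires the tag to be followed by whitespace or the end of the text.

```python
import re

def build_pattern(grammar):
    """Build a word/TAG sequence pattern from a '^'-separated tag string.

    Hypothetical helper; it mirrors the join used in the question.
    """
    tags = grammar.split("^")
    token = r"([\w,:;\"\-.]+)/{0}"
    body = r"\s+".join(token.format(re.escape(tag)) for tag in tags)
    # (?!\S) replaces the trailing \b: the last tag must be followed by
    # whitespace or the end of the text, which also holds for a tag like ",".
    return body + r"(?!\S)"

line = ("APRIL/NNP is/VBZ the/DT cruellest/JJ month/NN ,/, "
        "breeding/VBG Lilacs/NNP out/RB of/IN the/DT dead/JJ land/NN")

matches = re.findall(build_pattern("NNP^VBZ^DT^JJ^NN^,"), line)
print(matches)  # [('APRIL', 'is', 'the', 'cruellest', 'month', ',')]
```

Adding a literal space before `\b` also makes this particular string match, but it would likely miss a punctuation tag at the very end of the text or one followed by another punctuation token, which is why the lookahead is used in this sketch.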