
I have a list of words like this:

['Urgente', 'Recibimos', 'Info']

I used the parsetree function (parsetree(x, lemmata=True)) to convert the words, and the output for each word is this:

[[Sentence('urgente/JJ/B-ADJP/O/urgente')],
[Sentence('recibimos/NN/B-NP/O/recibimos')],
[Sentence('info/NN/B-NP/O/info')]]

Each component of the list has the type pattern.text.tree.Text.

I need to obtain only the group of words inside the parentheses, but I don't know how to do this. I need this output:

[urgente/JJ/B-ADJP/O/urgente,
recibimos/NN/B-NP/O/recibimos,
info/NN/B-NP/O/info]

I tried using str to convert each component of the list to a string, but this changes the whole output.

Gino Mempin

1 Answer


From their documentation, there doesn't seem to be a direct method or property to get what you want.

But I found that a Sentence object can be printed as Sentence('urgente/JJ/B-ADJP/O/urgente') using repr. So I looked at the source code for the __repr__ implementation to see how it is formed:

def __repr__(self):
    return "Sentence(%s)" % repr(" ".join(["/".join(word.tags) for word in self.words]))

It seems that the string "in parentheses" is each word's tags joined with "/", with the words of the sentence joined by spaces. You can then reuse that code, knowing that if you already have pattern.text.tree.Text objects, "a Text is a list of Sentence objects. Each Sentence is a list of Word objects." (from the Parse trees documentation).

So here's my hacky solution:

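# Assumption: the words are Spanish, so parsetree is imported from pattern.es.
from pattern.es import parsetree

# Parse each word into a pattern.text.tree.Text object.
parsed = [parsetree(data, lemmata=True) for data in ['Urgente', 'Recibimos', 'Info']]

output = []
for text in parsed:  # a Text is a list of Sentence objects
    for sentence in text:
        # Same join that Sentence.__repr__ uses, without the "Sentence(...)" wrapper.
        formatted = " ".join("/".join(word.tags) for word in sentence.words)
        output.append(str(formatted))

print(output)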

Printing output gives:

['Urgente/NNP/B-NP/O/urgente', 'Recibimos/NNP/B-NP/O/recibimos', 'Info/NNP/B-NP/O/info']

Note that this solution results in a list of strs (losing all the properties/methods from the original parsetree output).
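If the original objects are still needed afterwards, one option is to keep each sentence next to its formatted string instead of replacing it. The following is only a sketch: `Word` and `Sentence` here are hypothetical stand-ins that mimic just the `.tags` and `.words` attributes used above, so the snippet runs without Pattern installed. The helper names `format_sentence` and `pair_with_strings` are also made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic only the attributes used here
# (Sentence.words and Word.tags); they are not Pattern's real classes.
Word = namedtuple("Word", ["tags"])
Sentence = namedtuple("Sentence", ["words"])


def format_sentence(sentence):
    """Join each word's tags with '/' and the words with spaces."""
    return " ".join("/".join(word.tags) for word in sentence.words)


def pair_with_strings(sentences):
    """Return (sentence, formatted string) pairs so the objects are kept."""
    return [(sentence, format_sentence(sentence)) for sentence in sentences]


sentences = [
    Sentence(words=[Word(tags=["Urgente", "NNP", "B-NP", "O", "urgente"])]),
    Sentence(words=[Word(tags=["Info", "NNP", "B-NP", "O", "info"])]),
]
pairs = pair_with_strings(sentences)
for sentence, text in pairs:
    print(text)
```

With the real library, the sentences would presumably come from iterating over each parsetree result, as in the loop above.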

Gino Mempin
  • Thank you. What is the function of word.tags? – Daniel Mendoza Jun 05 '19 at 14:52
  • @DanielMendoza From the [docstring for `Word.tags`](https://github.com/clips/pattern/blob/master/pattern/text/tree.py#L204): "_Yields a list of all the token tags as they appeared when the word was parsed. For example: ["was", "VBD", "B-VP", "O", "VP-1", "A1", "be"]_". More information on parser tags can be found in the [Parser tags documentation](https://www.clips.uantwerpen.be/pages/pattern-en#parser), such as "_NN (noun), VB (verb), JJ (adjective), RB (adverb) and IN (preposition)_". – Gino Mempin Jun 05 '19 at 23:35
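To make the tag positions easier to work with, the formatted string can be split back into named fields. This is a rough sketch that assumes the five-slot layout seen in the output above (word, part-of-speech tag, chunk tag, PNP tag, lemma); other parser options yield a different number of slots, as the seven-item docstring example shows. `Token` and `split_token` are hypothetical names, not part of Pattern.

```python
from collections import namedtuple

# Field names assume the five-slot layout seen in the output above
# (word, part-of-speech, chunk, PNP, lemma); other parser options
# produce a different number of slots.
Token = namedtuple("Token", ["word", "pos", "chunk", "pnp", "lemma"])


def split_token(tagged):
    """Split a 'word/POS/chunk/PNP/lemma' string into a Token."""
    parts = tagged.split("/")
    if len(parts) != len(Token._fields):
        raise ValueError("expected 5 slash-separated fields: %r" % tagged)
    return Token(*parts)


token = split_token("urgente/JJ/B-ADJP/O/urgente")
print(token.pos)    # JJ
print(token.lemma)  # urgente
```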