I'm trying to split sentences into clauses for sentiment analysis. For example, I'd like to turn the sentence "I liked the plot but the acting was horrible." into "I liked the plot" and "but the acting was horrible."

I looked up how other people do this and found two approaches. The first is to use an NLTK parse tree, as described in the question "How to split an NLP parse tree to clauses (independent and subordinate)?"

The second is to use the spaCy package and find the root verb, as described here: https://subscription.packtpub.com/book/data/9781838987312/2/ch02lvl1sec13/splitting-sentences-into-clauses

Which of these two approaches is better for splitting sentences into clauses, or is there a better way?

data_minD

1 Answer

It depends on how accurate it needs to be. You can probably get quite good coverage just by looking for certain conjunctions: in your example, "but" separates the two clauses. Other candidates would be "while", "and" (though you might need to check the context for this one to work), "instead", "because", etc. Commas or semicolons might also be useful.
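As a rough illustration of the conjunction heuristic, here is a minimal sketch using only Python's standard `re` module. The conjunction list and the function name `split_clauses` are made up for this example, and "and" is deliberately left out because it needs a context check:

```python
import re

# Hypothetical, deliberately short list of clause-introducing words.
# "and" is omitted on purpose: it often joins noun phrases, not clauses.
CONJUNCTIONS = ("but", "while", "because", "instead", "although")

# A boundary is either whitespace (with an optional comma) directly
# before one of the conjunctions, or a semicolon.
_BOUNDARY = re.compile(
    r",?\s+(?=(?:%s)\b)|;\s*" % "|".join(CONJUNCTIONS),
    flags=re.IGNORECASE,
)

def split_clauses(sentence):
    """Split a sentence at the boundaries above; the conjunction stays
    at the start of the clause it introduces."""
    parts = _BOUNDARY.split(sentence)
    return [p.strip() for p in parts if p and p.strip()]

print(split_clauses("I liked the plot but the acting was horrible."))
```

For the sentence from the question this gives "I liked the plot" and "but the acting was horrible.". Splitting on every comma would be easy to add, but it would also cut up plain lists, so it is left out here.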

If you have part-of-speech (POS) information available, you can identify the core of each clause: a finite verb for main clauses, and non-finite verbs for infinitive clauses ("I agreed to answer the question") and gerund clauses ("He had started reading the book"). If you find two such verbs, there must be a clause boundary between them. For an infinitive clause, it will generally be just before the "to". For a gerund, it can be a bit more complicated: "He could see him reading a book" essentially uses "him" as the direct object of "see", but "him" is also the subject of "reading". You could argue that "reading a book" is not really a separate clause but a modifier of "him"; that is your choice to make.
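The infinitive case could be sketched as follows. This assumes the input is already a list of (word, tag) pairs using Penn Treebank tags, for instance from a tagger such as `nltk.pos_tag`; the tags below are written by hand, and the function name `split_before_infinitive` is invented for this example. Gerunds are not handled, since where to split them is a judgment call:

```python
# Penn Treebank tags treated as finite verbs here (an assumption:
# past tense, present tense forms, and modals).
FINITE_TAGS = {"VBD", "VBP", "VBZ", "MD"}

def split_before_infinitive(tagged):
    """Split before each 'to' + base-form verb, but only once a finite
    verb has been seen, so that there really are two clause cores."""
    clauses, current, seen_finite = [], [], False
    for i, (word, tag) in enumerate(tagged):
        next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
        # "to" followed by a base-form verb starts an infinitive clause.
        if tag == "TO" and next_tag == "VB" and seen_finite and current:
            clauses.append(current)
            current = []
        if tag in FINITE_TAGS:
            seen_finite = True
        current.append(word)
    if current:
        clauses.append(current)
    return [" ".join(c) for c in clauses]

# Hand-tagged example sentence from the text above.
tagged = [("I", "PRP"), ("agreed", "VBD"), ("to", "TO"),
          ("answer", "VB"), ("the", "DT"), ("question", "NN")]
print(split_before_infinitive(tagged))
```

Requiring "to" to be followed by a base-form verb keeps prepositional uses such as "went to school" in one piece.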

So you don't need a full syntactic analysis to split clauses. The heuristics above might even be more reliable in cases where the parser fails to produce a correct tree, as they require less information about the structure. You might need a bit of trial and error to get them set up initially, but at least you can easily understand why a sentence is split in a certain way.

Oliver Mason