As commented, your problem is really one of sentence segmentation, because your data allows any input (with or without proper punctuation). Fortunately, your text is capitalized, so you can try the recipe below to segment it into sentences by capitalization.
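Disclaimer: if a sentence starts with the pronoun "I", the recipe below isn't going to help much, because "I" is deliberately not treated as a sentence start, so that sentence gets merged into the previous one =)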
"Something gotta change It must be rearranged I'm sorry, I did not
mean to hurt my little girl It's beyond me I cannot carry the weight
of the heavy world So good night, good night, good night, good night
Good night, good night, good night, good night, good night Hope that
things work out all right So much to love, so much to learn But I
won't be there to teach you Oh, I know I can be close But I try my
best to reach you I'm so sorry I didn't not mean to hurt my little
girl It's beyond me I cannot carry the weight of the heavy world So
good night, good night, good night, good night Good night, good night,
good night, good night Good night, good night, good night good night,
good night Hope that things work out all right, yeah Thank you."
In Python, you can try this to split the text into sentences:
```python
text = "Something gotta change It must be rearranged I'm sorry, I did not mean to hurt my little girl It's beyond me I cannot carry the weight of the heavy world So good night, good night, good night, good night Good night, good night, good night, good night, good night Hope that things work out all right So much to love, so much to learn But I won't be there to teach you Oh, I know I can be close But I try my best to reach you I'm so sorry I didn't not mean to hurt my little girl It's beyond me I cannot carry the weight of the heavy world So good night, good night, good night, good night Good night, good night, good night, good night Good night, good night, good night good night, good night Hope that things work out all right, yeah Thank you."

current = []    # words of the sentence being built
sentences = []  # finished sentences
for word in text.split():
    # A capitalized word (other than the pronoun "I") starts a new sentence.
    # The `current` check avoids appending an empty sentence for the first word.
    if word[0].isupper() and word != "I" and current:
        sentences.append(" ".join(current))
        current = [word]
    else:
        current.append(word)
if current:
    sentences.append(" ".join(current))  # flush the last sentence
print(sentences)
```
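The same capitalization heuristic can also be sketched as a single regular expression with `re.split`: split on whitespace that is followed by a capital letter, unless that capital is the bare pronoun "I". This is an alternative I'm suggesting, not a different algorithm; the function name `segment_by_capitalization` is just a hypothetical helper for illustration.

```python
import re

# Whitespace followed by a capital letter is treated as a sentence boundary,
# except when the capitalized word is the bare pronoun "I" (same rule as the loop).
SENTENCE_BOUNDARY = re.compile(r"\s+(?=[A-Z])(?!I(?:\s|$))")


def segment_by_capitalization(text):
    """Hypothetical helper: return the sentence-like segments of `text`."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


sample = "Something gotta change It must be rearranged I'm sorry, I did not mean to hurt my little girl"
print(segment_by_capitalization(sample))
# ['Something gotta change', 'It must be rearranged', "I'm sorry, I did not mean to hurt my little girl"]
```

Note that "I'm" still starts a new segment (it is not the bare word "I"), while "I did" stays inside the current one, which is exactly the limitation mentioned in the disclaimer.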
After that, you can follow a Stanford Parser + NLTK setup to parse each of the resulting sentences.