It looks like your source document may be a tab-delimited file whose formatting was changed when it was pasted into the SO window. If that's the case, you should use the csv module from the standard library.
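If the file really is tab-delimited, a minimal sketch of the csv approach might look like the following. The sample string and the helper name read_labeled_rows are made up for illustration, and I'm assuming each line holds the text, a single tab, then the label.

```python
import csv
import io

# Hypothetical sample standing in for the real file: text, a tab, then the label.
sample = (
    "Everything from acting to cinematography was solid.\t1\n"
    "Definitely worth checking out.\t1\n"
    "I purchased this and within 2 days it was no longer working!!!!!!!!!\t0\n"
)

def read_labeled_rows(handle):
    """Return (text, label) tuples from a tab-delimited file-like object."""
    # QUOTE_NONE stops quotation marks inside the text from being treated as CSV quoting.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip blank or malformed lines that do not have both a text and a label column.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

rows = read_labeled_rows(io.StringIO(sample))
print(rows[1])
```

With a real file you would pass the result of open(path, newline='') instead of the StringIO object.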
Assuming there are no special delimiter characters (such as \t or ,) in between your text and labels, you could simply extract the label as the last non-whitespace character on the line. For example...
# suppose you read the file out as a gigantic string
text_and_labels = """
Everything from acting to cinematography was solid. 1
Definitely worth checking out. 1
I purchased this and within 2 days it was no longer working!!!!!!!!! 0
"""
data = []
lines = text_and_labels.split('\n') # split each line
for line in lines:
    line = line.strip()  # remove any outside whitespace
    if line == '':
        continue  # it's a blank line
    label = line[-1]  # the last non-whitespace character
    text = line[:-1].strip()  # everything else, without the extra whitespace
    data.append((text, label))
data[0]
>>> ('Everything from acting to cinematography was solid.', '1')
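Taking only the last character works as long as every label is a single character. If your labels could be longer (say 10, or pos/neg), a variant that splits on the last run of whitespace may be safer. This is just a sketch under that assumption, and split_text_and_label is a name I made up for it.

```python
def split_text_and_label(line):
    """Split a line into (text, label), where the label is the last whitespace-separated token."""
    line = line.strip()  # remove any outside whitespace
    if not line:
        return None  # blank line
    # rsplit with no separator splits on whitespace; maxsplit=1 splits only at the last run of it
    parts = line.rsplit(None, 1)
    if len(parts) != 2:
        return None  # nothing separates the text from the label
    text, label = parts
    return text, label

print(split_text_and_label("Definitely worth checking out. 1"))
```

Lines that come back as None can be skipped in the loop the same way the blank lines are.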