I am new to NLP. I have text where each line ends with a label, 0 or 1.

How do I separate the labels and create a new column? Please help me.

Here is my text with labels:

Everything from acting to cinematography was solid.     1

Definitely worth checking out.      1            
I purchased this and within 2 days it was no longer working!!!!!!!!!    0
StardustGogeta
marton mar suri

2 Answers


It looks like your source document may be a tab-delimited file whose formatting was changed when it was pasted into the SO window. If that's the case, you should use Python's csv module.
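A minimal sketch of the csv approach, assuming the file really is tab-delimited with exactly two fields per line (text, then label). The `sample` string and the `read_labeled_rows` helper are hypothetical stand-ins for illustration, not something from the question:

```python
import csv
import io

# Hypothetical sample standing in for the real file.
# Assumption: the text and the label are separated by a single tab.
sample = (
    "Everything from acting to cinematography was solid.\t1\n"
    "\n"
    "Definitely worth checking out.\t1\n"
    "I purchased this and within 2 days it was no longer working!!!!!!!!!\t0\n"
)


def read_labeled_rows(handle):
    """Return a list of (text, label) pairs from a tab-delimited file object."""
    # QUOTE_NONE keeps any quote characters in the text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if len(row) != 2:
            continue  # skip blank lines and lines without exactly two fields
        text, label = row
        rows.append((text.strip(), int(label.strip())))
    return rows


rows = read_labeled_rows(io.StringIO(sample))
```

With a real file, you would pass `open('filename', newline='')` instead of the `io.StringIO` object. The first element of each pair is the text column and the second is the label column.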

Assuming there are no special delimiter characters (such as \t or ,) between your text and labels, you could simply extract the label as the last non-whitespace character on each line. For example...

# suppose you read the file out as a gigantic string
text_and_labels = """
Everything from acting to cinematography was solid.     1

Definitely worth checking out.      1
I purchased this and within 2 days it was no longer working!!!!!!!!!    0
"""

data = []
lines = text_and_labels.split('\n')  # split each line
for line in lines:
    line = line.strip()  # remove any outside whitespace
    if line == '':
        continue  # it's a blank line
    label = line[-1]  # the last non-whitespace character
    text = line[:-1].strip()  # everything else, without the extra whitespace
    data.append((text, label))
print(data[0])
# ('Everything from acting to cinematography was solid.', '1')
Andrew F

If the file is consistently formatted, you can do this with simple file handling and indexing. For badly formatted text, you can use a regex instead.

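list1 = []
with open('filename') as file:  # the file is closed automatically
    for line in file:
        line = line.rstrip()  # drop the newline and any trailing spaces
        if line:  # skip blank lines
            list1.append(line[-1])  # the label is the last character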

Now you can use this list to create the label column.
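For the badly formatted case, a regex version might look like the sketch below. It assumes every labeled line ends with whitespace followed by a single 0 or 1, possibly with trailing spaces. `LINE_PATTERN` and `split_line` are hypothetical names chosen for this example:

```python
import re

# Text, then whitespace, then a 0/1 label, then optional trailing whitespace.
LINE_PATTERN = re.compile(r"^(?P<text>.*\S)\s+(?P<label>[01])\s*$")


def split_line(line):
    """Return (text, label) for a labeled line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None  # blank line, or no trailing 0/1 label
    return match.group("text"), int(match.group("label"))


# Sample lines copied from the question, including the irregular spacing.
lines = [
    "Everything from acting to cinematography was solid.     1",
    "",
    "Definitely worth checking out.      1            ",
    "I purchased this and within 2 days it was no longer working!!!!!!!!!    0",
]

pairs = [pair for pair in map(split_line, lines) if pair is not None]
texts = [text for text, _ in pairs]     # text column
labels = [label for _, label in pairs]  # label column
```

Unlike plain indexing, this skips lines that have no label instead of silently taking the wrong character.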

GIRISH kuniyal