- Link to the python-file | - Link to the csv testdata file

import csv 
import nltk
import re
from array import *
#Expressions
rgx_list = ['.', ',', ';', '\(', '\)', ':', '\.\.\.', '!']
#New empty array
ntitle = []
#Open a csv
with open('tripadvisor_dieburg.csv') as file:   
    reader = csv.DictReader(file)
    #Get the title and replace the expressions  
    for row in reader:
        for r in rgx_list:
            new_title = row['title']
            rgx = re.compile(r)
            new_title = re.sub(rgx, '', new_title)
        #Append to the array    
        ntitle.append(new_title)            
#Print the new title
for n in ntitle:
    print n 

I created a list named rgx_list that holds regular expressions, and I opened a CSV file with content. Then I tried to remove every match of those expressions from the titles (row['title']) by replacing it with an empty string. After that, I want to append the cleaned title to a new list named "ntitle".

Only '!' is replaced in the string, but I want every expression in the list to be replaced:
rgx_list = ['.', ',', ';', '\(', '\)', ':', '\.\.\.', '!'] What am I doing wrong?

1 Answer

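You reset new_title to the original row['title'] on every pass through the inner loop, so each substitution throws away the previous ones and only the last pattern ('!') has any visible effect.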

for row in reader:
    for r in rgx_list:
        new_title = row['title']  # here - discards what you replace
        rgx = re.compile(r)
        new_title = re.sub(rgx, '', new_title)

Should instead be

for row in reader:
    new_title = row['title']  # here - only assign once
    for r in rgx_list:
        rgx = re.compile(r)
        new_title = re.sub(rgx, '', new_title)

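Also, '.' should be r'\.'. An unescaped '.' matches any character except a newline, so once the loop is fixed it would delete the entire title.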
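Putting both fixes together, a minimal sketch could look like the following. The helper name strip_patterns and the sample title are made up for illustration; the pattern list is the one from the question, written as raw strings with the lone '.' escaped.

```python
import re

# Patterns from the question, as raw strings, with the single '.' escaped.
RGX_LIST = [r'\.', r',', r';', r'\(', r'\)', r':', r'\.\.\.', r'!']


def strip_patterns(title, patterns=RGX_LIST):
    """Remove every match of each pattern from title (hypothetical helper)."""
    new_title = title  # assign once, before the loop over patterns
    for pattern in patterns:
        # Each substitution works on the result of the previous one.
        new_title = re.sub(pattern, '', new_title)
    return new_title


# Illustrative title, not taken from the CSV in the question.
sample = 'Great food (and service)... really, go!'
cleaned = strip_patterns(sample)
print(cleaned)  # Great food and service really go
```

Inside the CSV loop you would then call ntitle.append(strip_patterns(row['title'])) for each row.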

You may also want to read some of the solutions in "Best way to strip punctuation from a string in Python".
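One non-regex way to strip punctuation is str.translate with string.punctuation. This is only a sketch and assumes Python 3; note that it removes all ASCII punctuation, not just the characters in rgx_list, and the function name strip_punctuation is made up for illustration.

```python
import string


def strip_punctuation(title):
    """Delete every ASCII punctuation character from title (hypothetical helper)."""
    # The third argument to str.maketrans lists characters to delete.
    table = str.maketrans('', '', string.punctuation)
    return title.translate(table)


# Illustrative title, not taken from the CSV in the question.
result = strip_punctuation('Great food (and service)... really, go!')
print(result)  # Great food and service really go
```

Because the translation table does not involve the regex engine, there is nothing to escape here.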
