1

I am trying to clean up a text file by removing punctuation, numbers,
etc.

I wrote this code as a first attempt at removing punctuation:

import string
with open("uniquewords_list.txt") as f:

         L = sorted(word.strip(",") for line in f for word in line.split())
         
         out = L.translate(string.maketrans("",""), string.punctuation)

         with open('testing.txt', 'w') as filehandle:
              for listitem in out:
                  filehandle.write('%s\n' % listitem)

However, I am getting an error:

out = L.translate(string.maketrans("",""), string.punctuation)
AttributeError: 'list' object has no attribute 'translate'

I looked up the error description but am still not able to fix it. Any suggestions?

Also, what is an efficient way to delete numbers and similar characters?

2014GAM
  • The error is clear. This line, `L = sorted(word.strip(",") for line in f for word in line.split())`, returns a list, and you are calling `translate` on it, which a list does not have. Check what type you actually want that line to produce. What output type are you expecting that would let you use `.translate`? – Ice Bear Oct 07 '20 at 04:58
  • What do you want to keep? Just letters and whitespace? – Joonyoung Park Oct 07 '20 at 05:05
  • Yes, just the words, i.e. letters – 2014GAM Oct 07 '20 at 05:25
  • `string.maketrans()` appears to be a Python 2 feature; are you really using Python 2? Modern projects should usually target the currently recommended and supported version of the language, which is Python 3. – tripleee Oct 07 '20 at 05:50
  • No, I am using Python 3.6 – 2014GAM Oct 07 '20 at 05:56

2 Answers

1

As the error message says, you can't call the translate method on a list object. The str objects that are the members of the list do have this method, though.

Here is a simple idiomatic list comprehension which iterates over each member of the list and calls its translate method individually. Since you are on Python 3, it uses str.maketrans, whose third argument lists the characters to delete (string.maketrans only exists in Python 2):

out = [x.translate(str.maketrans("", "", string.punctuation)) for x in L]

If you are a beginner, perhaps this equivalent longhand code will be more readable:

out = []
for x in L:
    out.append(x.translate(str.maketrans("", "", string.punctuation)))

Of course, calling maketrans only once, before the loop, and reusing the resulting table would be more efficient.
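
A minimal sketch of building the table once, assuming Python 3 (where `str.maketrans` takes the characters to delete as its third argument). The names `PUNCT_TABLE`, `strip_punctuation` and the sample list `words` are made up for illustration; `words` stands in for `L` from the question.

```python
import string

# Build the deletion table a single time and reuse it for every word.
# The third argument of str.maketrans lists the characters to delete.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation(words):
    """Return a new list with ASCII punctuation removed from each word."""
    return [w.translate(PUNCT_TABLE) for w in words]

words = ["hello,", "wor!ld", "it's"]  # hypothetical sample data
out = strip_punctuation(words)
print(out)  # ['hello', 'world', 'its']
```

Note that `string.punctuation` only covers ASCII punctuation, so characters such as curly quotes would pass through unchanged.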

tripleee
  • Thanks for your answer. Yes, I am a beginner and it's exciting to look at the different ways things can be done. – 2014GAM Oct 10 '20 at 15:24
1
import string

# Read the whole file into one string
with open("uniquewords_list.txt") as f:
    contents = f.read()

# Characters to drop: punctuation and digits (extend as needed)
remove_pool = string.punctuation + '0123456789'  # + etc
contents = ''.join(ch for ch in contents if ch not in remove_pool)

with open('testing.txt', 'w') as filehandle:
    filehandle.write(contents + '\n')
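
As a possible alternative to the character-by-character filter above, here is a hedged sketch using `str.translate` with a single deletion table, plus a stricter variant that keeps only letters and whitespace (which is what the asker said they want in the comments). The function names and the sample string are hypothetical, not part of the original answer.

```python
import string

# Characters to delete: ASCII punctuation plus the digits 0-9.
REMOVE_TABLE = str.maketrans("", "", string.punctuation + string.digits)

def clean_text(text):
    """Delete punctuation and digits; everything else is kept."""
    return text.translate(REMOVE_TABLE)

def letters_only(text):
    """Stricter variant: keep only alphabetic characters and whitespace."""
    return "".join(ch for ch in text if ch.isalpha() or ch.isspace())

sample = "abc123, def! 4ghi\n"  # hypothetical sample input
print(repr(clean_text(sample)))    # 'abc def ghi\n'
print(repr(letters_only(sample)))  # 'abc def ghi\n'
```

The first function removes a fixed set of characters, while the second keeps anything Python considers a letter (including non-ASCII letters), so they can differ on input that contains symbols outside `string.punctuation`.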
Joonyoung Park
  • You might also want to check out [this question](https://stackoverflow.com/questions/1276764/stripping-everything-but-alphanumeric-chars-from-a-string-in-python). Mine is just a simple solution, not the most efficient. – Joonyoung Park Oct 07 '20 at 05:35