0

I have a .txt file that contains a strange-looking special character (an up arrow, see the screenshot). How do I remove it, along with the blank lines and the header rows that are repeated unnecessarily every few rows? My attempt:



remove_text = ['Trial Balance - Total Currency', 'Page:', 'Currency:', 'Balance Type:', 'ENTITY Range:', 'Ledger:', 'ENTITY:', '------------', '']


with open('MICnew.txt') as oldfile, open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in remove_text):
            newfile.write(line)

with open('MICnew.txt','r+') as file:
    for line in file:
        if not line.isspace():
            file.write(line)

My code deletes some of the unnecessary text lines, but it does not delete the special character or the blank lines.

Screenshot of text file

Shri
  • 156
  • 11
  • I think the oldfile should be different from newfile in the 1st ```with``` clause, because opening newfile in write mode truncates the file, so its content is deleted before it can be read. – khgb Nov 17 '21 at 11:43

3 Answers

0

You can delete any non-ASCII character with the following:

cleaned_string = string_to_clean.encode("ascii", "ignore").decode()
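
To see what this call does and does not remove, here is a minimal check on a made-up sample string (the sample text and the helper name `strip_non_ascii` are assumptions, not taken from the file). Note that control characters such as the form feed `\x0c` are themselves ASCII, so they pass through unchanged:

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range (code points 0-127)."""
    return text.encode("ascii", "ignore").decode()

# Hypothetical sample: an accented letter, an up-arrow symbol and a form feed.
sample = "caf\u00e9 \u2191 total\x0c\n"
cleaned = strip_non_ascii(sample)

# The accented letter and the arrow symbol are gone; the form feed is still there.
print(repr(cleaned))
```

If the character in the file is a control character rather than a true non-ASCII symbol, this approach alone will leave it in place.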
0xd34dc0de
  • 493
  • 4
  • 10
  • I tried this: ```with open('MIC.txt') as oldfile, open('MICnew.txt', 'w') as newfile: for line in oldfile: line = line.encode("ascii", "ignore").decode() newfile.write(line)``` but it did not delete the arrow character. Is there anything I missed? – Shri Nov 15 '21 at 02:39
0

Or you can use a regex to get rid of any unnecessary characters.

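import re

# Read from the original file and write to a different one: opening the same
# path in 'w' mode would truncate it before anything is read.
with open('MIC.txt') as oldfile, open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        # Keep only letters, digits, underscores and whitespace
        newfile.write(re.sub(r'[^a-zA-Z_0-9\s]', '', line))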
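
As a rough illustration of what this pattern keeps and drops, here is a small check on a hypothetical line (the sample text is an assumption). Two side effects are worth knowing: `\s` also matches the form feed `\x0c`, so that character is kept, and punctuation inside numbers is removed:

```python
import re

# Same character class as above: anything that is not a letter, digit,
# underscore or whitespace is deleted.
pattern = re.compile(r'[^a-zA-Z_0-9\s]')

# Hypothetical sample: form feed, an amount with punctuation, an arrow symbol.
sample = "\x0cCash 1,234.56- \u2191\n"
result = pattern.sub('', sample)

print(repr(result))
```

For a report with amounts, losing the decimal point and the sign may matter more than the stray character, so a narrower pattern may be preferable.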
khgb
  • 174
  • 1
  • 12
0

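This is what worked for me:

import re

with open('MIC.txt') as oldfile, open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        # Strip form feeds (\x0c) from the ends of the line, then replace
        # any non-ASCII character with a space
        clean_line = re.sub(r'[^\x00-\x7f]', ' ', line.strip('\x0c'))
        # Skip lines that contain only whitespace
        if not clean_line.isspace():
            newfile.write(clean_line)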

I was able to arrive at this solution by using suggestions from community members in my other post here
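
For completeness, here is a sketch that combines this with the header filtering from the question. The function name `clean_lines` and the sample lines are assumptions for illustration; the header fragments come from the question's `remove_text` list, minus the empty string, since `'' in line` is true for every line and would filter out everything:

```python
import re

# Header fragments from the question (the empty string is left out on purpose).
REMOVE_TEXT = ['Trial Balance - Total Currency', 'Page:', 'Currency:',
               'Balance Type:', 'ENTITY Range:', 'Ledger:', 'ENTITY:',
               '------------']

def clean_lines(lines, remove_text=REMOVE_TEXT):
    """Yield cleaned lines, skipping blank lines and repeated header rows."""
    for line in lines:
        line = line.replace('\x0c', '')             # drop form feeds anywhere
        line = re.sub(r'[^\x00-\x7f]', ' ', line)   # blank out non-ASCII characters
        if not line.strip():
            continue                                # blank or whitespace-only line
        if any(fragment in line for fragment in remove_text):
            continue                                # header row
        yield line

# Hypothetical sample standing in for the file's contents.
sample = ['\x0cTrial Balance - Total Currency\n', '\n', 'Page: 2\n',
          '\x0c\n', 'Cash 100.00\n', '   \n', 'Bank 250.00\n']
cleaned = list(clean_lines(sample))
print(cleaned)
```

To apply it to the files, open `MIC.txt` for reading and `MICnew.txt` for writing as above and call `newfile.writelines(clean_lines(oldfile))`.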

Shri
  • 156
  • 11