Delete a specific character in txt

Question

Imagine that I have the following text format:

'20201': "a" ,
'20202': "e" ,
'20203': "i" ,
'20204': "o" ,
'20205': "u" ,
'20207': "ae" ,
'20209': "ai" ,
'20210': "ao"

And I want to erase the fourth digit when it is a 0. So the expected output is:

'2021': "a" ,
'2022': "e" ,
'2023': "i" ,
'2024': "o" ,
'2025': "u" ,
'2027': "ae" ,
'2029': "ai" ,
'20210': "ao"

I'm thinking about this:

awk -i inplace '{
    for (i = 1; i <= NF; ++i) {
        if ($i == "0")
            r = 1
    }
}
1' example.txt

Don't use `-i inplace` (or even show it in your question) while you're working on a solution. Just let the output display to stdout until you get it right. — Ed Morton, Dec 16 '20 at 14:49

RavinderSingh13 · Accepted Answer · 2020-12-16T14:11:28.560

With awk, could you please try the following? Written and tested with the shown samples in GNU awk.

Without a field separator, try:

awk 'substr($0,5,1)==0{ $0=substr($0,1,4) substr($0,6) } 1'  Input_file
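The substr arithmetic is 1-based. For comparison, here is a minimal Python sketch of the same rule, assuming each line arrives as a plain string; `drop_fifth_zero` is a hypothetical name, and the slice `line[4:5]` plays the role of `substr($0,5,1)`:

```python
def drop_fifth_zero(line: str) -> str:
    """Mirror of: substr($0,5,1)==0 { $0 = substr($0,1,4) substr($0,6) }"""
    # awk's substr($0,5,1) is 1-based; the 0-based slice is line[4:5]
    if line[4:5] == "0":
        # substr($0,1,4) -> line[:4], substr($0,6) -> line[5:]
        return line[:4] + line[5:]
    return line


sample = [
    "'20201': \"a\" ,",
    "'20209': \"ai\" ,",
    "'20210': \"ao\"",
]
for line in sample:
    print(drop_fifth_zero(line))
```

Slicing with `line[4:5]` rather than `line[4]` keeps short lines from raising IndexError, much as substr simply returns an empty string past the end of the line.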

Or, with a field separator, to deal specifically with only the 1st field, try the following:

awk '
BEGIN{
  FS=OFS=":"
}
substr($1,5,1)==0{
  $1=substr($1,1,4) substr($1,6)
}
1
'  Input_file
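With FS=OFS=":" only the first field is examined, so a 0 elsewhere in the line is never touched. A rough Python analogue, assuming the key is everything before the first colon (str.partition stands in for awk's field splitting, and the function name is hypothetical):

```python
def drop_fifth_zero_in_key(line: str) -> str:
    # Like FS=OFS=":": only the text before the first ':' ($1) is examined
    key, sep, rest = line.partition(":")
    if key[4:5] == "0":
        # rebuild the key from its first 4 characters plus everything after the 5th
        key = key[:4] + key[5:]
    # rejoining with the same separator matches awk rebuilding $0 with OFS=":"
    return key + sep + rest


print(drop_fifth_zero_in_key("'20205': \"u\" ,"))
```

Splitting at the first colon and rejoining gives the same line awk prints, because assigning to $1 rebuilds the record with OFS, which is also ":".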

To save the output into Input_file itself, append `> temp && mv temp Input_file` once you are happy with the above command's output.
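The `> temp && mv temp Input_file` pattern writes to a temporary file and only replaces the original once the command has succeeded. A rough Python counterpart, assuming the per-line rule of the first awk command and a hypothetical helper `rewrite_file`, uses tempfile.NamedTemporaryFile in the same directory followed by os.replace:

```python
import os
import tempfile


def rewrite_file(path: str) -> None:
    """Rewrite path in place: write to a temp file, then rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace is a same-filesystem rename
    with open(path, "r") as f_in, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as f_out:
        for line in f_in:
            # same rule as substr($0,5,1)==0 in the awk answer
            if line[4:5] == "0":
                line = line[:4] + line[5:]
            f_out.write(line)
        tmp_name = f_out.name
    # Equivalent of `mv temp Input_file`: only reached if writing succeeded
    os.replace(tmp_name, path)
```

As with the shell version, if something fails before os.replace the original file is left untouched, although a stray temporary file may remain.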

Explanation: a detailed explanation of the above.

awk '                             ##Starting awk program from here.
BEGIN{                            ##Starting BEGIN section of this program from here.
  FS=OFS=":"                      ##Setting FS and OFS to a colon here.
}
substr($1,5,1)==0{                ##If the 5th character of the 1st field is 0, do the following.
  $1=substr($1,1,4) substr($1,6)  ##Rebuild the 1st field from its first 4 characters plus everything from the 6th character on.
}
1                                 ##1 prints the current line.
' Input_file                      ##Mentioning Input_file name here.

@Max, yes, I have mentioned it in the answer: "To save the output into Input_file itself, append ` > temp && mv temp Input_file` once you are happy with the above command's output." — RavinderSingh13, Dec 16 '20 at 11:00

costaparas · score 4 · Answer 2 · 2020-12-16T11:09:30.027

For a terse GNU sed solution, this works:

sed "s/^\(....\)0/\1/" example.txt

Here, we just match the first 5 characters, where the first 4 can be anything and the 5th must be a zero. For any match, we replace those 5 characters with only the first 4.
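The same capture-and-replace idea can be checked in Python with re.sub, assuming one line at a time (the helper name `sed_like` is hypothetical). The group `(....)` corresponds to sed's `\(....\)`, and `\1` in the replacement puts those four characters back:

```python
import re

# ^(....)0 -> \1 : keep the first four characters, drop a '0' in position five
PATTERN = re.compile(r"^(....)0")


def sed_like(line: str) -> str:
    # with the ^ anchor at most one match is possible; count=1 just makes that explicit
    return PATTERN.sub(r"\1", line, count=1)


for line in ["'20207': \"ae\" ,", "'20210': \"ao\""]:
    print(sed_like(line))
```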

If you want to modify the file in place, you can use sed's -i option:

sed "s/^\(....\)0/\1/" -i example.txt

(Note that -i works on many, but not all, systems; see workarounds here.)

answered Dec 16 '20 at 11:02 · edited Dec 16 '20 at 11:09


The answer above answers the question succinctly, but perhaps `sed -E 's/^(.{4})0/\1/' file`, although not shorter, has more mileage. – potong Dec 16 '20 at 12:54
I don't see what makes this sed GNU-specific. – dawg Dec 16 '20 at 15:09

kvantour · score 2 · Answer 3 · 2020-12-16T17:25:25.917

If the number can be any substring, remove a zero that follows three digits (note that this also matches inside '20210', turning it into '2021'):

sed -e 's/\([0-9][0-9][0-9]\)0/\1/g' file
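To see how the first command behaves on the sample, here is a Python re-creation of it, assuming re.sub scans left to right the same way sed does for this simple pattern (`{3}` replaces the repeated `[0-9]`, and the function name `unanchored` is hypothetical):

```python
import re


def unanchored(line: str) -> str:
    # Same as s/\([0-9][0-9][0-9]\)0/\1/g: no ^ anchor, no word boundaries
    return re.sub(r"([0-9]{3})0", r"\1", line)


for line in ["'20201': \"a\" ,", "'20210': \"ao\""]:
    print(unanchored(line))
```

The second line comes out as '2021': "ao" because the match starts at the second digit ("021" followed by "0"), which the word-boundary version avoids.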

If the whole word must be a number, remove its fourth digit if it is zero:

sed -e 's/\b\([0-9][0-9][0-9]\)0\([0-9]*\)\b/\1\2/g' file
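The \b anchors require the digits to form a whole word, so a match can only start at a word's first digit. A Python sketch of the same pattern, assuming Python's \b (a word-character boundary) behaves like GNU sed's here for plain digit strings; the helper name is hypothetical:

```python
import re

# \b([0-9]{3})0([0-9]*)\b : a whole word of digits whose 4th digit is 0
WORD = re.compile(r"\b([0-9]{3})0([0-9]*)\b")


def drop_fourth_zero_in_numbers(line: str) -> str:
    # \1 = first three digits, \2 = remaining digits; the 0 between them is dropped
    return WORD.sub(r"\1\2", line)


for line in ["'20201': \"a\" ,", "'20210': \"ao\""]:
    print(drop_fourth_zero_in_numbers(line))
```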

answered Dec 16 '20 at 11:47 · edited Dec 16 '20 at 17:25

score 1 · Answer 4 · answered Dec 16 '20 at 13:06

If you want to use Python (one of the question's tags), consider reading the file with the pandas.read_csv function, removing the zero from each key with the str.split and str.replace methods, and then writing the lines back into the original file, such as:

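import pandas as pd

# The trailing " ," splits each line into the 'key': "value" part (column 0) and an empty column
with open('myfile.txt', 'r') as f_in:
    data = pd.read_csv(f_in, header=None)

sss = []
for line in data[0]:
    s = line.split()  # s[0] is the key token, e.g. '20201':
    ss = ""
    for j, ch in enumerate(s[0], start=1):
        # the 5th character (including the first quote ') is the 4th digit; drop it if it is '0'
        if j == 5 and ch == '0':
            continue
        ss += ch
    sss.append(line.replace(s[0], ss))

# Write the lines back, re-adding the trailing comma on every line but the last
with open('myfile.txt', 'w') as f_out:
    for j, line in enumerate(sss, start=1):
        if j == len(sss):
            f_out.write(line + '\n')
        else:
            f_out.write(line + ',\n')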

score 0 · Answer 5 · answered Dec 16 '20 at 12:42

You might harness GNU AWK's gensub for that in the following way. Let file.txt content be

'20201': "a" ,
'20202': "e" ,
'20203': "i" ,
'20204': "o" ,
'20205': "u" ,
'20207': "ae" ,
'20209': "ai" ,
'20210': "ao"

then

awk '{print gensub(/^(....)0/,"\\1",1)}' file.txt

output

'2021': "a" ,
'2022': "e" ,
'2023': "i" ,
'2024': "o" ,
'2025': "u" ,
'2027': "ae" ,
'2029': "ai" ,
'20210': "ao"

Explanation: I used gensub's ability to reference parts of the regexp in the replacement text, so that (the first 4 characters followed by a zero) is replaced with (those first 4 characters). We need the first 4 characters because of the leading ', which makes the 4th digit the 5th character.
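The point about the leading ' can also be written into the pattern itself. A minimal sketch, assuming every key starts with a single quote followed by digits (this stricter pattern is a variation, not what the answer uses), with Python's re.sub and count=1 as the analogue of gensub's third argument `1`; the helper name is hypothetical:

```python
import re

# ^'([0-9]{3})0 : the quote plus three digits, then a 0 as the 4th digit
KEY = re.compile(r"^'([0-9]{3})0")


def strip_fourth_digit_zero(line: str) -> str:
    # \1 holds the three digits; the quote is re-added literally
    return KEY.sub(r"'\1", line, count=1)


for line in ["'20203': \"i\" ,", "'20210': \"ao\""]:
    print(strip_fourth_digit_zero(line))
```

Anchoring on `'` and digits means a line that does not start with a quoted number is left alone, where `^(....)0` would act on any four characters.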
