Remove/replace unprintable characters from a text file using a shell script

Question

I am trying to remove newline characters from within quoted fields in a file.

I am able to achieve that using the code below:

awk -F"\"" '!length($NF){print;next}{printf("%s ", $0)}' filename.txt>filenamenew.txt

Note that I am creating a new file, filenamenew.txt. Is this avoidable? Can I run the command in place? I ask because the files are huge.

My file is pipe-delimited.

Sample input file:
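On the in-place question, a minimal sketch of the usual workaround, assuming `mktemp` is available and the disk has room for one temporary copy. awk has no true in-place mode; GNU awk's `-i inplace` extension and `sed -i` also write a temporary file behind the scenes, so the extra space is needed either way. The function name `rewrite_in_place` is hypothetical.

```shell
#!/bin/sh
# rewrite_in_place FILE
# Runs the awk command from the question, writing to a temporary file next to
# FILE, and replaces FILE only if awk succeeded.
rewrite_in_place() {
    file=$1
    # Create the temporary file in the same directory so that mv is a rename.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk -F'"' '!length($NF){print;next}{printf("%s ", $0)}' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling `rewrite_in_place filename.txt` would then leave the joined records in filename.txt itself.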
"id"|"name"
"1"|"john
doe"
"2"|"second
name
in the list"

Using the above code, I get the following output:

"id"|"name"
 "1"|"john doe"
 "2"|"second name  in the list"

However, my files are huge, and some of the lines contain a ^M character between the quotes. For example:

Second sample input file:
    "id"|"name"
    "1"|"john
    doe"
    "^M2"|"second^M^M
    name
    in the list"

Output using the above code:

"id"|"name"
 "1"|"john doe"
 name in the list"

So, basically, if there is a ^M in a line, that string is not printed. I read online that ^M is equivalent to \r, so I used tr -d '\r' < filename.txt. I also tried

awk -F"|" '{sub(/^M/,"")}1'

but it did not remove those ^M characters.
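A minimal sketch of one way to handle both problems at once, assuming every field is wrapped in double quotes and any quote inside a field is doubled (so a complete record always has an even number of quotes). It deletes carriage returns with the `\r` escape rather than the two literal characters `^` and `M`, and joins physical lines until the quotes balance. The missing text in the output above is likely still in the file: a carriage return moves the terminal cursor back to the start of the line, so later text overwrites it on screen. The function name `join_quoted` is hypothetical.

```shell
#!/bin/sh
# join_quoted: reads pipe-delimited text on stdin, writes one record per line.
join_quoted() {
    awk '
    {
        gsub(/\r/, "")                  # delete every carriage return (shown as ^M)
        n = gsub(/"/, "&")              # count the double quotes on this physical line
        buf = (buf == "" ? $0 : buf " " $0)
        quotes += n
        if (quotes % 2 == 0) {          # even total: every quoted field is closed
            print buf
            buf = ""; quotes = 0
        }
    }
    END { if (buf != "") print buf }    # flush a trailing unterminated record
    '
}
```

Usage would be `join_quoted < filename.txt > filenamenew.txt`.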

A little background on why I am doing this: I am extracting data from a relational table, loading it into a flat file, and checking that the row counts of the table and the file match. Because some columns contain \n, count(*) on the table and wc -l on the file do not match.

Final resolution:

In the long run I don't want to delete these unprintable characters. I want to replace each one with some placeholder character or value (so that the counts between table and file match). Then, when I load the file back into a table, I want to replace the placeholder with the \n or ^M that was originally present, so that I do not tamper with the data.
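A minimal sketch of such a round trip, assuming the placeholder strings `<NL>` and `<CR>` never occur in the real data (both are hypothetical choices, as are the function names) and that a complete record always has an even number of double quotes.

```shell
#!/bin/sh
# encode_breaks: replaces each carriage return with <CR> and each newline that
# falls inside a quoted field with <NL>, so every record ends up on one line.
encode_breaks() {
    awk '
    {
        gsub(/\r/, "<CR>")              # carriage returns become a visible marker
        quotes += gsub(/"/, "&")        # running count of double quotes in this record
        if (quotes % 2 == 0) {          # quotes balanced: the record is complete
            print buf $0
            buf = ""; quotes = 0
        } else {
            buf = buf $0 "<NL>"         # still inside a quoted field
        }
    }
    END { if (buf != "") print buf }    # flush a trailing unterminated record
    '
}

# decode_breaks: the reverse step, turning the markers back into \n and \r.
decode_breaks() {
    awk '{ gsub(/<NL>/, "\n"); gsub(/<CR>/, "\r"); print }'
}
```

After `encode_breaks < filename.txt > encoded.txt`, `wc -l encoded.txt` should count one line per record (including the header), and `decode_breaks < encoded.txt` should reproduce the original bytes before loading.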

Any suggestions are appreciated.

Thanks.

Is counting the delimiters an alternative? Put the pattern in double quotes, and that inside single quotes: `grep -c '"|"' filename.txt`. (When the actual file has more fields, divide by `numberoffields-1`.) — Walter A, Dec 01 '18 at 23:16
Yes, this is a cool approach; I never knew we could count this way. Thanks, I will have a chance to try this out on Monday on the actual dataset. One question: I have been doing some reading, and my understanding is that if I want to load this file back into a table, the \n will cause issues during the load. Is that true, and is there any workaround to avoid it? Either way, thanks so much. — kumarm, Dec 01 '18 at 23:39
My understanding is that ^M is not a newline; it's a carriage return. Most likely the files are generated or created on a Windows platform. You may avoid the problem with an FTP transfer in text mode. Now, to modify your existing content, check https://stackoverflow.com/questions/800030/remove-carriage-return-in-unix — Krassi Em, Dec 02 '18 at 12:42
Hi, the file is generated by me using an ETL tool from a table. @Walter: grep -c works on my sample input data, but on the actual file I am getting the wrong count. I used tr -d '\r' filename and tr -d '^M' filename; neither worked in my situation. — kumarm, Dec 03 '18 at 14:18
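The delimiter-counting idea from the comments can be sketched as follows. Note that `grep -c` counts matching lines, not matches, so with more than two fields per record this sketch uses `grep -o` (supported by GNU and BSD grep, though not required by POSIX) to count every occurrence. It assumes a constant number of fields per record and that the sequence `"|"` never appears inside the data. The function name `count_records` is hypothetical.

```shell
#!/bin/sh
# count_records FILE NFIELDS
# Prints the number of records in FILE (header included), derived from the
# number of "|" delimiters between quoted fields.
count_records() {
    file=$1
    nfields=$2
    # grep -o prints each match on a separate line, so wc -l counts matches.
    delims=$(grep -o '"|"' "$file" | wc -l)
    echo $(( delims / (nfields - 1) ))
}
```

For the two-column sample file, `count_records filename.txt 2` would give the header plus the data rows, regardless of embedded newlines or carriage returns.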
