I am trying to remove newline characters from within quoted fields in a file.

I am able to achieve that using the code below:

awk -F"\"" '!length($NF){print;next}{printf("%s ", $0)}' filename.txt > filenamenew.txt

Note that I am creating a new file, filenamenew.txt. Is this avoidable? Can I run the command in place? I ask because the files are huge.

My file is pipe-delimited.

Sample input file:
"id"|"name"
"1"|"john
doe"
"2"|"second
name
in the list"

Using the above code I get the following output:

"id"|"name"
 "1"|"john doe"
 "2"|"second name  in the list" 

But I have huge files, and in some lines I see a ^M character between the quotes, for example:

Second sample input file:
    "id"|"name"
    "1"|"john
    doe"
    "^M2"|"second^M^M
    name
    in the list"

Output using the above code:

"id"|"name"
 "1"|"john doe"
 name in the list"

So basically, if there is a ^M in a line, that string is not being printed. I read online that ^M is equivalent to \r, so I used tr -d '\r' < filename.txt. I also tried

awk -F"|" '{sub(/^M/,"")}1'

but it did not remove the ^M characters.
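For reference, here is a minimal sketch of the two removal approaches on a small test file. The file names (sample_cr.txt, sample_tr.txt, sample_awk.txt) are made up for illustration, and it assumes the ^M shown on screen really is a carriage return byte (\r). Two details matter: tr only reads standard input, so the file must be redirected into it, and in an awk regex a typed caret followed by M means "the letter M at the start of the line", not a carriage return.

```shell
# Build a small sample containing real carriage returns (\r),
# which many editors and pagers display as ^M.
printf '"id"|"name"\n"\r2"|"second\r\r\nname\nin the list"\n' > sample_cr.txt

# tr reads standard input only, so redirect the file into it.
tr -d '\r' < sample_cr.txt > sample_tr.txt

# In awk, "\r" is the escape for the carriage return itself;
# gsub removes every occurrence on the line, sub would remove only the first.
awk '{ gsub(/\r/, "") } 1' sample_cr.txt > sample_awk.txt
```

Both output files should be free of carriage returns and identical to each other; the line breaks inside the quotes are left untouched.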

A little background on why I am doing this: I am extracting data from a relational table, loading it into a flat file, and checking whether the row counts of the table and the file match. Because some columns contain \n, count(*) on the table and wc -l on the file do not match.
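One way to compare counts without modifying the file is to count logical records instead of physical lines. The sketch below is only an illustration under stated assumptions: every field is wrapped in double quotes, quotes inside a field are either absent or doubled (so the parity still works), and sample.txt is a made-up file name. A record is considered finished on the line where the running number of double quotes becomes even.

```shell
# Sample with three logical records (header included) spread over six lines.
printf '"id"|"name"\n"1"|"john\ndoe"\n"2"|"second\nname\nin the list"\n' > sample.txt

# gsub returns the number of replacements, so replacing each quote with
# itself ("&") counts the quotes on the line without changing it.
records=$(awk '{ q += gsub(/"/, "&") } q % 2 == 0 { n++ } END { print n + 0 }' sample.txt)

echo "physical lines:  $(wc -l < sample.txt)"
echo "logical records: $records"
```

The logical record count includes the header row, so it should be compared against count(*) plus one.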

Final resolution:

In the long run I don't want to delete these unprintable characters. Instead I want to replace them with some placeholder character or value, so that the counts between table and file match. Then, when I load the file back into a table, I want to replace that placeholder with whatever was originally there (\n or ^M), so that the data is not altered on my side.
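To make the idea concrete, here is a rough sketch of such a round trip. It is not the quote-splitting approach from my command above; it tracks quote parity instead. The placeholder tokens `<CR>` and `<LF>` and the file names are arbitrary choices for illustration, and the sketch assumes those tokens never occur in the real data and that quotes inside fields are absent or doubled.

```shell
# Sample with embedded newlines and carriage returns inside quoted fields.
printf '"id"|"name"\n"1"|"john\ndoe"\n"\r2"|"second\r\r\nname\nin the list"\n' > sample_raw.txt

# Encode: one physical line per logical record.
awk '{
    gsub(/\r/, "<CR>")               # carriage returns become a visible token
    q += gsub(/"/, "&")              # running count of double quotes
    if (q % 2) printf "%s<LF>", $0   # still inside quotes: mark the embedded newline
    else print                       # quotes balanced: the record is complete
}' sample_raw.txt > sample_encoded.txt

# Decode: put the original characters back.
awk '{
    gsub(/<CR>/, "\r")
    gsub(/<LF>/, "\n")
    print
}' sample_encoded.txt > sample_decoded.txt

# cmp is silent and returns 0 when the round trip is lossless.
cmp sample_raw.txt sample_decoded.txt && echo "round trip OK"
```

With the encoded file, wc -l gives the number of logical records (header included). Whether the decode step should run before loading or inside the load itself depends on the loader, which I have not worked out.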

Any suggestions are appreciated.

Thanks.

kumarm
  • Is counting the delimiters an alternative? Put the pattern inside double quotes, and that inside single quotes: `grep -c '"|"' filename.txt`. (When you have more fields in the actual file, divide by `numberoffields-1`.) – Walter A Dec 01 '18 at 23:16
  • Yes, this is a cool way; I never knew we could count like this. Thanks, I will have a chance to try it out on Monday on the actual dataset. One question: I have been doing some reading, and my understanding is that if I want to load this file back into a table, the \n will cause issues during the load. Is that true, and is there any workaround to avoid it? Either way, thanks so much. – kumarm Dec 01 '18 at 23:39
  • My understanding is that ^M is not a newline, it's a carriage return. Most likely the files were generated or created on a Windows platform. You may avoid the problem with an FTP transfer in text mode. To modify your existing content, see https://stackoverflow.com/questions/800030/remove-carriage-return-in-unix – Krassi Em Dec 02 '18 at 12:42
  • Hi, the file is generated by me using an ETL tool from a table. @Walter: grep -c works on my sample input data, but on the actual file I am getting the wrong count. I used tr -d '\r' filename and tr -d '^M' filename; neither worked in my situation. – kumarm Dec 03 '18 at 14:18

0 Answers