How to remove header, new line character \ and ""

Question

I have a file with data like this:

"S.ACQUIRER||'|'||SUBSTR(S.ACQ_COUNTRY,1,4)||'|'||SUBSTR(S.ACQ_CURRENCY_CODE,1,5)||'|'||S.PAN||'|'||SUBSTR(S.ACCTNUM,1,18)||'|'||SU\    BSTR(I.E_NAME,1,35)||'|'||S.LOCAL_DATE||'|'||S.LOCAL_TIME||'|'||DECODE(S.PCODE,0,'POSTRANSACTIONFROMDEFAULTACCOUNT',1000,'POS"
"9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from\
Savings Account |10|Approved |2000061|ATM Test Terminal Bang |123400000123456 |01001101"
"9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from\
Savings Account |10|10|4000061|ATM Test Terminal Bang |123450000000456 |01001101"

However, the expected output is:

9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from Savings Account |10|Approved |2000061|ATM Test Terminal Bang |123400000123456 |01001101
9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from Savings Account |10|10|4000061|ATM Test Terminal Bang |123450000000456 |01001101

The differences are:

There should be no header line.
There should be no " at the start or end of each line.
The escaped newline (a backslash followed by a newline) should not be present.

How can I produce this output?

might help: http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n , http://stackoverflow.com/questions/10618798/removing-new-line-character-from-incoming-stream-using-sed — Amir Naghizadeh, Mar 13 '14 at 04:31
The data in the question had no newline after the backslash in the title line, but it is almost certain that the newline should have been present (and, further, the backslash-newline appeared in the middle of SUBSTR, so no space was wanted). Other data lines had trailing blanks after the backslash (and after the closing double quote), but it is almost certain that those should not have been there (but the backslash-newline should be replaced by a space to preserve sensible wording). Is that an accurate assessment? If not, please ensure that the data are accurately represented in the question. — Jonathan Leffler, Mar 13 '14 at 05:59
What have you tried, and where did you fail? By showing your attempts, your gaps in knowledge can be better addressed. — Ingo Karkat, Mar 13 '14 at 07:14

Jonathan Leffler · Answer 1 · 2014-03-13T06:01:17.493

2

sed -e '/\\$/N' \
    -e 's/\\\n/ /g' \
    -e 's/^"//' \
    -e 's/"$//' \
    -e '/^[^0-9]/d' \
    "$@"

This could be crushed into one unreadable line, but it is easier to explain the five operations when they're neatly separated:

If the line ends with a backslash, append the next line to the buffer (pattern space), separated by a newline, and carry on with the remaining commands.
Replace any backslash-newline with a space.
Delete double quote at the start of a line.
Delete double quote at the end of a line.
Delete any line that does not start with a digit.

Given a clean version of the input (no trailing blanks), this produces:

9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from Savings Account |10|Approved |2000061|ATM Test Terminal Bang |123400000123456 |01001101
9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from Savings Account |10|10|4000061|ATM Test Terminal Bang |123450000000456 |01001101
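
To try the five expressions without the original export, here is a minimal sketch. It wraps the same `sed` command in a shell function and runs it on a shortened sample. The function name `clean_export`, the temporary file, and the abbreviated records are illustrative assumptions, not part of the original data.

```shell
# Hypothetical wrapper around the five sed expressions shown above.
clean_export() {
    sed -e '/\\$/N' \
        -e 's/\\\n/ /g' \
        -e 's/^"//' \
        -e 's/"$//' \
        -e '/^[^0-9]/d' \
        "$@"
}

# Shortened, made-up sample: a two-line header and one two-line record.
# The quoted 'EOF' keeps the backslashes literal.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"S.ACQUIRER||'|'||SU\
BSTR(I.E_NAME,1,35)"
"9000000007|840|Cash Withdrawal from\
Savings Account |10|01001101"
EOF

clean_export "$sample"
```

With this sample the header pair is joined and then deleted because it does not start with a digit, and the record is printed as a single unquoted line.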

edited Mar 13 '14 at 06:01

answered Mar 13 '14 at 05:52

Jonathan Leffler


Why `/^[^0-9]/d` instead of just `1d`? – Peter Rincker Mar 13 '14 at 13:23
I had `1d` to start with, but when I looked at the data, I found that there was a backslash-space in between `SU` and `BSTR`, which strongly indicates that the heading line was split too. Given that reading the next line with N changes the line number but that there wasn't a guarantee that there couldn't be multiple continuations, or no continuation, it is risky to use `2d`. So, I opted to recognize that the first field in the data appears to be numeric. (See also my comments to the main question.) For this file, `1,2d` would be an option (it would be step 0, before the current step 1). – Jonathan Leffler Mar 13 '14 at 13:26
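
The concern about multiple continuations can be handled with a loop. This is a sketch of a variation on the script above, not the author's own command. It assumes a record may be split across any number of lines, and the function name `clean_export_loop` and the sample record are hypothetical.

```shell
# Hypothetical variant: keep joining while the pattern space still ends in a backslash.
clean_export_loop() {
    sed -e ':join
/\\$/{
N
s/\\\n/ /
b join
}' \
        -e 's/^"//' \
        -e 's/"$//' \
        -e '/^[^0-9]/d' \
        "$@"
}

# Made-up record split across three physical lines.
printf '%s\n' '"1|first part\' 'second part\' 'third part|9"' | clean_export_loop
```

Because the join is repeated until no trailing backslash remains, the quote stripping and the digit test always see a complete record.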

jaypal singh · Answer 2 · 2014-03-13T13:15:04.180

This should do the trick:

awk '/\\$/&&NR>2{sub(/\"/,"");printf $0;next}NR>2{sub(/\"/,"");print}' file

Output:

$ cat file
"S.ACQUIRER||'|'||SUBSTR(S.ACQ_COUNTRY,1,4)||'|'||SUBSTR(S.ACQ_CURRENCY_CODE,1,5)||'|'||S.PAN||'|'||SUBSTR(S.ACCTNUM,1,18)||'|'||SU\
BSTR(I.E_NAME,1,35)||'|'||S.LOCAL_DATE||'|'||S.LOCAL_TIME||'|'||DECODE(S.PCODE,0,'POSTRANSACTIONFROMDEFAULTACCOUNT',1000,'POS"
"9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from\
Savings Account |10|Approved |2000061|ATM Test Terminal Bang |123400000123456 |01001101"
"9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from\
Savings Account |10|10|4000061|ATM Test Terminal Bang |123450000000456 |01001101"

$ awk '/\\$/&&NR>2{sub(/\"/,"");printf $0;next}NR>2{sub(/\"/,"");print}' file
9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from\Savings Account |10|Approved |2000061|ATM Test Terminal Bang |123400000123456 |01001101
9000000007|840|840|5048349120900000008|504834000000006028|Ecustomer name |03-JAN-14|115744|Cash Withdrawal from\Savings Account |10|10|4000061|ATM Test Terminal Bang |123450000000456 |01001101
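
The output above still contains the backslash and has no space between `from` and `Savings`. A possible adjustment, offered as a sketch, accumulates each record in a buffer, replaces the trailing backslash with a space, strips both quotes, and keeps only lines that start with a digit (the same header test as the `sed` answer). The function name `clean_export_awk`, the variable `buf`, and the sample input are hypothetical.

```shell
# Hypothetical awk variant that joins continuation lines with a space.
clean_export_awk() {
    awk '
        { buf = buf $0 }                      # append the current physical line
        /\\$/ { sub(/\\$/, " ", buf); next }  # continuation: swap the backslash for a space
        {
            gsub(/^"|"$/, "", buf)            # strip the enclosing double quotes
            if (buf ~ /^[0-9]/) print buf     # drop header lines, keep numeric records
            buf = ""                          # start a fresh record
        }
    ' "$@"
}

# Made-up sample: a split header followed by one split record.
printf '%s\n' '"S.ACQUIRER||SU\' 'BSTR(I.E_NAME,1,35)"' \
              '"9000000007|Cash Withdrawal from\' 'Savings Account |10"' | clean_export_awk
```

Using `print` rather than `printf $0` also avoids treating the record as a format string, which matters if the data ever contains a `%` character.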

Awk golf: `NR<2{next}{sub(/"/,"")}/\\$/{printf $0;next}{print}` — Peter Rincker, Mar 13 '14 at 13:19

score 0 · Answer 3 · edited Mar 13 '14 at 04:39


Open the file in Vim and execute these commands:

:%s/^"//g

:%s/"$//g

:%s/\\//g

However, I don't yet know how to recognize the header.

edited Mar 13 '14 at 04:39

jaypal singh


answered Mar 13 '14 at 04:28

lisency


Delete the header via `:1d`. See `:h :d` and `:h [range]` for more information – Peter Rincker Mar 13 '14 at 13:26
Look at the sample text: if a line ends with "\", then remove it and join with the next line. So `%s/\\\n/ /` instead of simply removing the backslashes. Probably do this before removing the `"` characters. – benjifisher Mar 13 '14 at 15:51
