-3
~1
ACCOUNT1
34765367
001
5637463648374
1
32476743
85468456875
003
~1
~2
ACCOUNT2
23587458745647
1
002343
2347938457
~2

... and so on

I want to print it to another file in the format below:

ACCOUNT134765367001563746364837413247674385468456875003
ACCOUNT22358745874564710023432347938457

I have written the script below, which works perfectly up to ~9. For ~10, however, it also appends record ~10 to the end of record ~1. I think I need to update my regex pattern. Please help.

max_input=2
path1=/home
line_number_m=1
while [ ${line_number_m} -le ${max_input} ]
do
o_p=""
sed -n "/^~${line_number_m}/,/^~${line_number_m}/p" ${path1}/temp_op.txt | sed "s/^~${line_number_m}//" > ${path1}/tmp.txt
while read val
do
if [ -z ${val} ]
then
continue
else
o_p=`echo ${o_p}``echo ${val}`
fi
done< ${path1}/tmp.txt
echo ${o_p} >>${path1}/tmp_output.txt
line_number_m=`expr ${line_number_m} + 1`
done
rm ${path1}/tmp.txt
tail -n +2 ${path1}/tmp_output.txt > ${path1}/output.txt
rm ${path1}/tmp_output.txt
exit 0

The records between ~1 and ~1 can be any numbers, characters, or even lines consisting only of spaces, like below:

~1
001
13324324343
COMMON
6
487364754557465
--2space
5874654657
---3 Space
48567846574
4568746574657
--5spaces---
~1

I want my output to look like this:

00113324324343COMMON6487364754557465--5874654657---485678465744568746574657-----

  • Please let us know [what you have tried](http://whathaveyoutried.com/). Most of us here are happy to help you improve your craft, but are less happy acting as short order unpaid programming staff. Show us your work so far in an [MCVE](http://stackoverflow.com/help/mcve), the result you were expecting and the results you got, and we'll help you figure it out. – ghoti Sep 02 '17 at 15:25
  • I have tried the code above. There are two issues with it. First, it appends record ~10 to record ~1 when I pass records 1 to 10. Second, it does not keep records that consist only of spaces, if there are any. – user8552135 Sep 02 '17 at 18:54
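
The ~1/~10 collision described above is what an unanchored range pattern would produce: /^~1/ matches any line that begins with ~1, and that includes ~10. A minimal sketch of the difference, assuming each separator sits alone on its line (the function names are made up for illustration and are not part of the asker's script):

```shell
# Hypothetical helpers: $1 is the record number, data is read from stdin.

# Unanchored: "~1" also matches "~10", "~11", ... so later records leak in.
extract_unanchored() {
    sed -n "/^~$1/,/^~$1/p"
}

# Anchored with $: only a line that is exactly "~<number>" opens or closes the range.
extract_anchored() {
    sed -n "/^~$1\$/,/^~$1\$/p"
}

# Small made-up sample with records ~1 and ~10.
printf '%s\n' '~1' 'AAA' '~1' '~10' 'BBB' '~10' | extract_anchored 1
```

With the anchored version, asking for record 1 should return only the three lines of the ~1 block, while the unanchored version also returns the ~10 block.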

5 Answers

1

Give this a try; I hope it helps as a starting point:

#!/bin/bash

while IFS='' read -r line || [[ -n "$line" ]]; do
    if [[ $line == ACCOUNT* ]]
    then
        printf '\n%s' "$line"
    elif [[ $line != ~* ]]
    then
        printf '%s' "$line"
    fi
done < "$1"

Save it into a file and try:

./script.sh data.txt

Also check this answer: https://stackoverflow.com/a/2172367/1135424

# The == comparison operator behaves differently within a double-brackets
# test than within single brackets.

[[ $a == z* ]]   # True if $a starts with a "z" (wildcard matching).
[[ $a == "z*" ]] # True if $a is equal to z* (literal matching).
nbari
  • Note that if you run a script with the shebang you've indicated, `#!/bin/sh`, then (1) it is not guaranteed to use bash, and (2) even if it uses bash, will run in POSIX-compatibility mode, which does not include `[[`. Oh and also, your `printf` will fail if the input data contains percent characters that could be interpreted as formatting. – ghoti Sep 02 '17 at 15:49
  • No problem. BTW, the way to fix the other issue is `printf '%s' "$line"`. – ghoti Sep 02 '17 at 15:59
  • Don't forget to [quote your variables](http://mywiki.wooledge.org/Quotes#When_Should_You_Quote.3F). – ghoti Sep 02 '17 at 16:40
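
The percent-sign pitfall raised in the comments can be seen in a small sketch. The sample string and function names below are invented for illustration; the point is only that data should be passed as an argument to a fixed '%s' format rather than used as the format itself:

```shell
# Hypothetical helpers for comparison.

# Data used as the format string: any % sequence in it is interpreted.
emit_unsafe() {
    printf "$1"
}

# Data passed as an argument to a fixed format: printed literally.
emit_safe() {
    printf '%s' "$1"
}

line='50%d off'        # made-up input containing a conversion specifier
emit_unsafe "$line"; echo
emit_safe "$line"; echo
```

The second call should echo the input unchanged, while the first lets printf treat %d as a conversion.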
1

I find this easier in gawk or awk than in sed. Awk already processes input as records, so it is particularly good at tasks like this. You just need to tell it how to recognize record separators and what to do with each record. In this case, on even-numbered records, we remove all whitespace, then print.

gawk -v RS='~[0-9]+' 'NR%2==0 {gsub(/[[:space:]]/,"");print}'

This relies on a gawk feature: RS may be a regular expression. With BSD awk or on macOS, you might need something like the following, which empties the first field before concatenating all the fields in the record:

awk -v RS='~' 'NR%2==0 {$1="";gsub(/[[:space:]]/,"");print}'

If you really want to do this in sed, I suppose you could fudge it with something like the following:

sed -Ene $'H;${x;s/[[:space:]]//g;s/~[0-9]+A/\\\nA/g;s/~[0-9]*//g;p;}'

This puts the entire file into the hold space, does the same whitespace removal as the awk script, then re-adds newlines in the process of clearing out your record separators.
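
If a regex RS is not available, the same record logic can also be sketched line by line in plain awk. This is an alternative formulation, not one of the one-liners above. It assumes every separator is a line of the form ~N on its own and that each record is opened and closed by a separator; the function name join_records is made up for illustration:

```shell
# Hypothetical helper: reads the data from stdin or from the files given as arguments.
join_records() {
    awk '
        /^~[0-9]+$/ {                 # separator line such as ~1 or ~10
            if (inside) print buf     # closing separator: emit the joined record
            buf = ""
            inside = !inside          # toggle between "inside" and "outside" a record
            next
        }
        inside {
            gsub(/[ \t\r]/, "")       # drop spaces, tabs and carriage returns
            buf = buf $0              # append the line to the current record
        }
    ' "$@"
}

# Usage with a small made-up sample:
printf '%s\n' '~1' 'ACCOUNT1' '001' '~1' '~10' 'ACCOUNT2' '7' '~10' | join_records
```

Because the separator is matched as a whole line, ~1 and ~10 are not confused, and lines outside any record are ignored.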

ghoti
0

A pipeline:

$ sed '/^~/d' data | tr -d '\n' | sed -re 's/(.)A/\1\nA/g' -e 's/$/\n/'
ACCOUNT134765367001563746364837413247674385468456875003
ACCOUNT22358745874564710023432347938457
  • The first sed removes all lines starting with ~.
  • The tr concatenates everything into one single line of output.
  • The last sed chops the input up into separate lines again using the character A (of ACCOUNT) as the delimiter and adds a newline at the end.

The last sed requires GNU sed to be able to insert newlines with \n.

Kusalananda
  • For the record, if your shell supports format expansion, you can still insert newlines in sed, using something like: `-e $'s/$/\\\n/'`. – ghoti Sep 02 '17 at 16:01
0
$ sed '/^~/d' data | awk -v RS='A' -v OFS='' '$1 && $1=RS $1'
ACCOUNT134765367001563746364837413247674385468456875003
ACCOUNT22358745874564710023432347938457

This is my second solution to this problem.

It starts with sed deleting all lines that start with ~.

awk then reads the remaining data as records separated by the character A and concatenates the fields (with no delimiter) before printing them.

This does not rely on GNU utilities.

Kusalananda
0

This might work for you (GNU sed):

sed -rn '/^~/{:a;N;/^(~[0-9]+)\n(.*)\n\1$/!ba;s//\2/g;s/\s//g;p}' file

Gather up the lines between matching separators, i.e. lines of the form ~n where n is an integer. Remove the separators, remove whitespace, and print.

potong