0

How do I find lines matching a particular pattern and remove the newline character from them in Unix? Suppose I have a comma-separated file:

100,"John","Clerk",,,,  
101,"Dannis","Manager",,,,  
102,"Michael","Senior  

Manager",,,,  

103,"Donald","President of 

united states",,,,  

The output I want is:

100,"John","Clerk",,,,  
101,"Dannis","Manager",,,,  
102,"Michael","Senior Manager",,,,  
103,"Donald","President of united states",,,,  
YOGI

6 Answers

2

Short sed solution (GNU sed, since -z is a GNU extension):

sed -z 's/\n*//g; s/,,,,/&\n/g' file

The output:

100,"John","Clerk",,,,
101,"Dannis","Manager",,,,
102,"Michael","Senior Manager",,,,
103,"Donald","President of united states",,,,

Or with awk:

awk 'BEGIN{ RS=ORS="" }{ gsub(/\n+/," ",$0); gsub(/,,,, */,"&\n",$0); print }' file
RomanPerekhrest
  • @YOGI, I'm always testing my solutions. Yes, it's working. Besides, you did not elaborate on your "particular pattern" – RomanPerekhrest Aug 25 '17 at 17:43
  • @RomanPerekhrest, thanks, the solution you gave works, but I have edited the post to include an additional issue I have in the file. Could you please help me with the edited post? I am trying to match the lines starting with characters using sed '/[A-Za-z]//p' file and then want to remove \n from those lines. – YOGI Aug 25 '17 at 17:52
  • @k-five, it should not be called "missing", because it's a trailing space. Besides, that trailing space would remain only if it appears in the initial file (after each line) – RomanPerekhrest Aug 26 '17 at 11:44
0

Try the following awk as well.

awk '/^$/{next} {val=$0 ~ /^[0-9]/?(val?val ORS $0:$0):(val?val OFS $0:$0)} END{print val}' Input_file

EDIT: Adding a multi-line form of the solution, along with an explanation.

awk '
/^$/{   ## If the line is empty, do the following action.
   next ## next skips all further actions for this line.
}
{
val=$0 ~ /^[0-9]/?(val?val ORS $0:$0):(val?val OFS $0:$0) ## Build up val: a line starting with a digit begins a new record, so it is appended after a newline (ORS); any other line is a continuation, so it is appended after a space (OFS).
}
END{         ## END block of the awk code.
   print val ## Print the accumulated value of val.
}
' Input_file ## Input_file is the file to process.
RavinderSingh13
0
awk '{printf("%s", $0)}/,,,,/{print "\n"}' ORS="" file

100,"John","Clerk",,,,  
101,"Dannis","Manager",,,,  
102,"Michael","Senior Manager",,,,  
103,"Donald","President of united states",,,,
Claes Wikner
0

This might work for you (GNU sed):

sed -r ':a;N;/^([^\n,]*,){6}/!s/\n//;ta;P;D' file

Append another line to the pattern space (PS). If the PS does not yet start with six comma-terminated fields on its first line, remove a newline and repeat; otherwise print and delete the first line of the PS, then repeat.
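For readers without GNU sed, the same "keep appending until the record has six commas" idea can be sketched in awk. This is an illustration under assumptions, not the answer's own code: the function name join_by_comma_count and the want variable are hypothetical, commas inside quoted fields are not handled, and unlike the sed command it trims trailing blanks and joins continuation lines with a single space.

```shell
#!/bin/sh
# join_by_comma_count [FILE...]: hypothetical helper, not part of the answer above.
# Assumes every complete record contains at least six commas and that no
# field contains a comma of its own.
join_by_comma_count() {
    awk -v want=6 '
        /^[ \t]*$/ { next }                      # drop blank lines
        {
            sub(/[ \t]+$/, "")                   # trim trailing blanks
            buf = (buf == "" ? $0 : buf " " $0)  # glue a continuation on with one space
            tmp = buf
            if (gsub(/,/, "", tmp) >= want) {    # gsub returns the number of commas removed
                print buf
                buf = ""
            }
        }
        END { if (buf != "") print buf }         # flush an incomplete final record
    ' "$@"
}
```

It reads the named files, or standard input when none are given, for example: join_by_comma_count file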

potong
0

If you do not mind using Perl:

First, remove the extra blank lines:

perl -pe 's/^\n//;' file 

The output:

100,"John","Clerk",,,,
101,"Dannis","Manager",,,,
102,"Michael","Senior
Manager",,,,
103,"Donald","President of
united states",,,,

Then add a second substitution that removes the newline after the last word of each broken line. For that you can use:

s/(\w+)\s+\n$/$1 /;

Here \w+ matches Senior and of and captures them in $1, which is then reused in the replacement /$1 /. Note the single space after $1.

Finally we have:

perl -pe 's/^\n//;s/(\w+)\s+\n$/==>$1<== /;' file

The output:

100,"John","Clerk",,,,
101,"Dannis","Manager",,,,
102,"Michael","==>Senior<== Manager",,,,
103,"Donald","President ==>of<== united states",,,,

NOTE:

Remove the ==> and <== markers, and add -i.bak to edit the file in place while keeping a backup.

Or even in a single substitution:

perl -lpe '$/=undef; s/(\w+)\s+\n\n^([^\n]+)\n/$1 $2/gm;'  file
Shakiba Moshiri
0

Copy the code from https://stackoverflow.com/a/45420607/1745001 and change this:

{
    printf "Record %d:\n", ++recNr
    for (i=1;i<=NF;i++) {
        printf "    $%d=<%s>\n", i, $i
    }
    print "----"
}

to this:

/your regexp/ {
    printf "Record %d:\n", ++recNr
    for (i=1;i<=NF;i++) {
        gsub(/\n/," ",$i)
        printf "    $%d=<%s>\n", i, $i
    }
    print "----"
}

where your regexp is whatever regular expression (the "particular pattern" you mentioned in your question) you're trying to find in your data.

Unlike most (all?) of your other current answers, the above does not rely on your input lines ending with ,,,,, nor does it read the whole file into memory, nor does it rely on the parts of the field following a newline starting with any particular value, nor does it rely on there only being up to 1 blank line in a field, nor does it require any particular version of a tool, etc.
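To make the record-based idea concrete without reproducing the linked code, here is a much smaller sketch. It is not the code from the linked answer: the function name join_matching is hypothetical, and it assumes that a field containing a newline is wrapped in double quotes, so a physical line belongs to an unfinished record while the running count of double quotes is odd. Only records matching the given regexp have their embedded newlines replaced.

```shell
#!/bin/sh
# join_matching REGEXP [FILE...]: hypothetical helper, not the linked code.
# Assumes fields that span lines are enclosed in double quotes.
join_matching() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        !inq && /^[ \t]*$/ { next }                # skip blank lines between records
        {
            rec = (inq ? rec "\n" $0 : $0)         # rebuild the logical record
            tmp = rec
            inq = (gsub(/"/, "", tmp) % 2)         # odd quote count: a field is still open
            if (inq) next
            if (rec ~ pat)                         # only touch records matching the pattern
                gsub(/[ \t]*\n[ \t\n]*/, " ", rec) # collapse each break into one space
            print rec
        }
        END { if (inq) print rec }                 # flush a record left open at end of input
    ' "$@"
}
```

For example, join_matching '^102,' file would join only the record for 102 and leave every other record as it is. Because the regexp is passed with -v, awk processes backslash escapes in it, so patterns containing backslashes may need doubling.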

Ed Morton