
How can I remove newlines inside double quotes (") in a file?

For example:

"one", 
"three
four",
"seven"

So I want to remove the \n between three and four. Should I use a regular expression, or do I have to read the file character by character with a program?

fedorqui
Kenny Basuki



To remove only the newlines inside double-quoted strings and leave those outside them alone, you can use GNU awk (needed for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file

This works by splitting the file at " characters (RS='"') and removing newlines in every other block, that is, in the even-numbered records, which lie inside quotes. With a file containing

"one",
"three
four",
12,
"seven"

this will give the result

"one",
"threefour",
12,
"seven"

Note that it does not handle escape sequences. If strings in the input data can contain \", such as "He said: \"this is a direct quote.\"", then it will not work as desired.
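If escaped quotes can occur, one alternative is the "read it character by character" approach from the question: scan each line, track whether you are inside a string, and only keep a line break when the line ends outside one. Below is a POSIX awk sketch of that idea. It assumes a backslash escapes the next character (as in \"); CSV-style doubled quotes ("") also come out right, since they toggle the state twice. The function name strip_quoted_newlines is hypothetical.

```shell
# Hypothetical helper name; reads the named files (or stdin) and writes to stdout.
# Assumes a backslash escapes the next character, as in \".
strip_quoted_newlines() {
  awk '
    {
      # Scan the line one character at a time, tracking quote state.
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (esc)             esc = 0      # escaped character: taken literally
        else if (c == "\\")  esc = 1      # backslash escapes the next character
        else if (c == "\"")  inq = !inq   # unescaped quote opens or closes a string
      }
      printf "%s", $0
      if (!inq) printf "\n"               # keep the line break only outside strings
    }
  ' "$@"
}
```

Usage: strip_quoted_newlines file. Because it reads ordinary lines, it needs no RT and should run under any POSIX awk, at the cost of a per-character loop.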

Wintermute
  • I had to use `gsub(/[\r\n]+/, " ")` to make it work for my needs. – Rolf Jan 14 '16 at 11:05
  • This worked, so +1. I had to change the format string to `printf("%s %s", $0, RT)` since I needed spaces between words. – Rajib Dec 16 '19 at 15:59

You can start a new record at every line that begins with ". If a line doesn't begin with one, append it to the stored record and print the record later on:

$ awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' file
"one", 
"three four",
"seven"

Since we always print the previous block of text, the END block is needed to print the last stored value after the whole file has been processed.
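The same accumulate-and-flush logic written out with comments, as a sketch (the function name is hypothetical). Two small changes from the one-liner: the flush is keyed on NR instead of on f being non-empty, and END prints nothing for empty input rather than a blank line. The caveat from the comment below still applies: an unquoted line such as 12, gets joined onto the previous record.

```shell
# Hypothetical wrapper name; reads the named files (or stdin).
join_continuation_lines() {
  awk '
    /^"/ {                        # a line starting with a quote begins a new record
      if (NR > 1) print f         # flush the record built so far
      f = $0
      next
    }
    { f = f FS $0 }               # any other line is appended, separated by FS (a space)
    END { if (NR > 0) print f }   # the last record is still buffered at end of input
  ' "$@"
}
```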

fedorqui
  • This seems a bit brittle. If anything is supposed to appear outside double quotes (for example: `"one",\n"three\nfour",\n12,\n"seven"`), it can remove newlines that aren't inside quotes. – Wintermute Mar 19 '15 at 17:13

You can use sed for that:

sed -r '/^"[^"]+$/{:a;N;/",/!ba;s/\n/ /g}' text

The command searches for lines that start with a double quote but don't contain another double quote: /^"[^"]+$/

If such a line is found, the label :a marks the start of a loop. The N command appends the next input line to the pattern space. As long as the pattern space does not contain the closing quote (matched as /",/, i.e. a quote followed by a comma), /",/!ba branches back to label a; once it does, the loop ends.

Once the closing quote is found, s/\n/ /g replaces every newline with a space, and sed prints the pattern space automatically.
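In BSD and macOS sed, a label and the target of b run to the end of the line, so the one-line form with ; after :a and ba is likely to fail there; -E is also the spelling of extended regular expressions that both GNU and BSD sed accept. A sketch of the same script written one command per line, which GNU sed accepts as well (the function name is hypothetical):

```shell
# Hypothetical wrapper name; reads the named files (or stdin).
# Lines that open a quote without closing it start a loop that appends
# lines (N) until the pattern space contains a quote followed by a comma,
# then the embedded newlines are replaced with spaces.
join_open_quotes() {
  sed -E '
/^"[^"]+$/{
:a
N
/",/!ba
s/\n/ /g
}
' "$@"
}
```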

hek2mgl

A simplistic solution:

#!/usr/bin/perl

use strict;
use warnings;

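my $first = 1;
while (<DATA>) {
    chomp;
    # A line starting with a double quote begins a new field,
    # so end the previous output line first (but not before the very first one).
    if (m/^"/) {
        print "\n" unless $first;
        $first = 0;
    }
    print;
}
print "\n";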


__DATA__
"one", 
"three
four",
"seven"

For CSV-style data specifically, I'd suggest the Perl module Text::CSV, which parses CSV properly and treats the continuation of an element containing a linefeed as part of the preceding row.

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );

open( my $input, "<", "input.csv" ) or die "Cannot open input.csv: $!";

while ( my $row = $csv->getline($input) ) {
    # Replace linefeeds inside each element with a space.
    s/\n/ /g for @$row;
    # Print the elements 'naked' (without quotes), comma-separated,
    # with no trailing comma.
    print join( ",", @$row ), "\n";
}
close($input);

Sobrique

Tested in bash.

Purpose: replace each newline inside double quotes with a literal \n.

Works for Unix line endings (\n), Windows line endings (\r\n) and \n\r sequences. A lone \r (the classic Mac OS line ending) is not matched by the pattern.

echo -e '"line1\nline2"'

"line1
line2"

echo -e '"line1\nline2"' | gawk -v RS='"' 'NR % 2 == 0 { gsub(/\r?\n\r?/, "\\n") } { printf("%s%s", $0, RT) }'

"line1\nline2"
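RT is a GNU awk extension. For other awks (for example mawk or the BSD/macOS awk), a sketch that should behave the same way: since every record after the first was preceded by a quote, the script prints that quote itself instead of relying on RT. It rebuilds each quoted block with split() on the same \r?\n\r? pattern, so the literal \n does not depend on how each awk treats backslashes in gsub replacement strings. It assumes the input ends with a newline (or other text) after its last quote; the function name is hypothetical.

```shell
# Hypothetical helper name; reads the named files (or stdin) and writes to stdout.
# Assumes the input ends with a newline (or other text) after its last quote,
# otherwise that final quote may not be printed.
escape_quoted_newlines() {
  awk -v RS='"' '
    NR > 1 { printf "\"" }              # every record after the first followed a quote
    NR % 2 == 0 {                       # even-numbered records lie inside quotes
      n = split($0, parts, /\r?\n\r?/)
      s = parts[1]
      for (i = 2; i <= n; i++)
        s = s "\\n" parts[i]            # "\\n" is a backslash followed by n
      $0 = s
    }
    { printf "%s", $0 }
  ' "$@"
}
```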

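#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Try::Tiny;

# Stand-in for the logger() used here, whose definition is not shown.
sub logger { my ($level, $message) = @_; print STDERR "[$level] $message\n"; }

@ARGV == 2 or die "usage: $0 INPUT_FILE OUTPUT_FILE\n";
my ($source_feed_date_file, $dest_feed_date_file) = @ARGV;

my $csv_in = Text::CSV->new({ binary    => 1,
                              sep_char  => ";",
                              auto_diag => 1,
                            })
    or die "CANNOT USE CSV: " . Text::CSV->error_diag;

my $csv_out = Text::CSV->new({ binary       => 1,
                               eol          => "\n",
                               sep_char     => ";",
                               always_quote => 1,
                               auto_diag    => 1,
                             })
    or die "CANNOT USE CSV: " . Text::CSV->error_diag;

logger('LOG-3', 'PROCESSING FILE :' . "\n" . $source_feed_date_file);
my $s1 = time();

try {
    # Inbound file reader with no encoding specified ==>
    open(my $CSV_FILE, '<', $source_feed_date_file)
        or die "Cannot open $source_feed_date_file: $!\n";
    # Outbound file writer with UTF8 encoding ==>
    open(my $fh, '>:encoding(UTF-8)', $dest_feed_date_file)
        or die "Cannot open $dest_feed_date_file: $!\n";
    my $rx = 0;
    while (my $row = $csv_in->getline($CSV_FILE)) {
        # Text::CSV has already joined quoted fields that span several lines;
        # strip LF, CR, NUL and every non-ASCII byte from each field.
        s/\n|\r|\0|[^\x00-\x7F]//g for @$row;
        $csv_out->print($fh, $row);

        # Progress report every 1000 records.
        if ($rx % 1000 == 0) {
            print "$rx \n";
        }
        $rx += 1;
    }
    close($CSV_FILE);
    close($fh) or die "Cannot close $dest_feed_date_file: $!\n";
    print "Total Number Of Records processed: $rx\n";
    my $e1 = time();
    printf("\n\nTime elapsed for %s : %.2f\n", $source_feed_date_file, $e1 - $s1);
} catch {
    my $e = shift;
    print $e;
    logger('LOG-4', 'PROCESSING FAILED FOR FILE :' . "\n" . $source_feed_date_file);
    exit 1;
};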

http://www.riveriq.com/blogs/2020/02/how-to-remove-new-lines-within-double-quotes

Ashish