1

I'm processing some CSV files. The last field is sometimes missing data, like so: first,last John,Smith Francis,

This is causing ingestion errors when trying to load into a database. I'd like to add a space or some other 'filler' after the last comma, before the crlf but I'm having trouble filling this with a search/replace. I've tried the following for example: $data =~ s/,\x0d\x0a/,\x20\x0d\0xa/gi but no space gets added before crlf. Of course I need to keep the end of line so that rows are properly marked.

I've tried reading in with Path::Tiny as well as generic open but having no success so far.

sqldoug
  • 429
  • 1
  • 3
  • 10

1 Answers1

1

First: use a module to work with csv files, and a good one is Text::CSV

Now, for a job like this, this one time, a simple filter is OK:

my $line = q(first,last John,Smith Francis,);

$line =~ s/.*,\K$/filled_last_field/; 

The greedy .* matches up to the last instance of the following pattern (comma here) on the line. The \K form of the positive lookbehind drops all previous matches so they are not consumed; so it only replaces the pattern after it (adds the phrase in this case, as needed).

If you'd like to first replace the file then you read it line by line and write the changed lines to a new file, which then gets moved over the old one.

open my $fh, '<', $file or die "Can't open $file: $!";
open my $fh_out, '>', $new_file or die "Can't open $new_file: $!";

while (<$fh>) {
    print $fh_out s/.*,\K$/filled_last_field/r;
}
# Move $new_file to $file

where I've used the /r modifier, which returns the modified string, just right for the print here. See this post and this post (for example) for full programs of this kind with relevant details.

Or, with Path::Tiny

path($file)->edit_lines( sub { s/.*,\K$/ADDED/ } )

Methods to edit a file directly were added in the version 0.077. Many of the module's methods have their _utf8 counterparts, and there is edit_lines_utf8 as well.


In a one-liner

perl -wpe's/.*,\K$/ADDED/' file.csv > new_file.csv

or, to change the input file in-place

perl -i -wpe's/.*,\K$/ADDED/' file.csv

or, to change it and also keep a backup

perl -i.bak -wpe's/.*,\K$/ADDED/' file.csv
zdim
  • 64,580
  • 5
  • 52
  • 81
  • Thanks -- I was not aware of the `\K` form at all. Whoever downvoted this care to explain? – sqldoug Jun 25 '18 at 22:50
  • Thanks @zdim; your reply was deleted for a time, so I thought it was deliberately erased. Glad to see it back. Also I'm working with UTF-16LE encoded data so I think Path::Tiny won't work as well here. – sqldoug Jun 25 '18 at 23:31
  • @sqldoug Don't know `-16LE` but many `Path::Tiny`'s methods have `_utf8` counterparts, `edit_lines` included. Does that help? – zdim Jun 25 '18 at 23:34
  • Understood. I have not looked into `edit_lines`; I will check that out, thanks. – sqldoug Jun 25 '18 at 23:41
  • @sqldoug Thank you for the attribution. Added a few bits – zdim Jun 25 '18 at 23:56
  • Thanks again @zdim. The \K form with the $ anchor solved it. – sqldoug Jun 26 '18 at 00:00