regex replace in a file

Question

I am trying to replace a specific set of characters in a file using Perl, but it does not seem to work. Here is my code:

my $file = shift;
open(FILE, "$file") or die "File not found";
while (<FILE>){
   $data .=$_
}
$data =~ s/[^A-CEGHJ-PR-TW-Z]{1}[A-CEGHJ-NPR-TW-Z]{1}\s?[0-9]{2}\s?[0-9]{2}\s[0-9]{2}\s?[A-DEM]{0,1}$/XX012345X/g;

I know that my pattern works for finding the set of characters; I am not entirely sure the replacement works. However, my main concern is the Perl code: the file remains untouched after I run it.

Sample file:

AB123456C Ab12345678 DG657465 GH123456FG

1. You don't write to that file; you only read data from it, so why would the file change? 2. Your regex uses anchors to match the start and the end of the string, but you read multiple lines into one string, so you probably need the `m` modifier to change that behaviour. — stema, Jan 29 '13 at 10:51
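To illustrate the second point of that comment, here is a minimal Perl sketch. It assumes two hypothetical lines joined into one string, the way the question's `while` loop builds `$data`, and uses a simplified pattern rather than the asker's exact regex. Without `/m`, `$` matches only at the end of the string (or just before a final newline). With `/m`, it matches at the end of every line.

```perl
use strict;
use warnings;

# Two hypothetical lines joined into one string, the way the question's
# while loop appends every line to $data.
my $data = "AB123456C\nGH654321A\n";

# Without /m, `$` matches only at the end of the string (or just before a
# final newline), so only the last line is replaced.
(my $without_m = $data) =~ s/[A-Z]{2}[0-9]{6}[A-Z]$/XX012345X/g;

# With /m, `$` also matches just before every newline inside the string,
# so both lines are replaced.
(my $with_m = $data) =~ s/[A-Z]{2}[0-9]{6}[A-Z]$/XX012345X/gm;

print "without /m:\n$without_m";
print "with /m:\n$with_m";
```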
Perhaps you should mention what it is that you hope your code will do. This code doesn't do anything unless you print `$data`. Also, in the first part of your regex, you have `Z{1}`, which looks like a typo. — TLP, Jan 29 '13 at 10:51
Oh, and also "it does not work" is a horribly *bad* way to describe your problem. It doesn't really say anything, does it? — TLP, Jan 29 '13 at 10:53
"The file remains untouched after I run it." answers that. I have made edits. My intentions are clear from the first line, but for clarity: I am trying to open a file and do a regex replace on the entire file. Thanks — James Mclaren, Jan 29 '13 at 10:54

score 2 · Accepted Answer · answered Jan 29 '13 at 11:00


The code does not alter the file because you don't tell it to. You open the file for reading, not writing, and you never print anything.
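A minimal sketch of what the question's script is missing, assuming the whole file fits in memory: slurp it, substitute, then reopen the same file for writing. `replace_in_file` is a hypothetical helper name, and no backup is kept, so try it on a copy first.

```perl
use strict;
use warnings;

# Read the whole file, apply the substitution, and write the result back
# to the same file. Sketch only: assumes the file fits in memory and
# makes no backup.
sub replace_in_file {
    my ($file, $pattern, $replacement) = @_;

    open my $in, '<', $file or die "Cannot open $file for reading: $!";
    my $data = do { local $/; <$in> } // '';   # undef $/ slurps the file
    close $in;

    # s///g returns the number of substitutions, or false if none.
    my $count = ($data =~ s/$pattern/$replacement/g) || 0;

    open my $out, '>', $file or die "Cannot open $file for writing: $!";
    print {$out} $data;
    close $out or die "Cannot write $file: $!";

    return $count;
}

# Hypothetical usage: perl fix_file.pl inputfile.txt
if (@ARGV) {
    my $file = shift @ARGV;
    my $n = replace_in_file($file,
        qr/[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z][0-9]{6}[A-DFM]?/, 'XX012345X');
    print "$n substitution(s) made in $file\n";
}
```

The replacement is inserted as literal text, which suits `XX012345X`; capture references such as `$1` stored in `$replacement` would not be expanded this way.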

If you want a quick way to handle this, just put your regex substitution in a file and pass that file to `perl -p` as the program, like this:

Content of regex.pl:

s/[^A-CEGHJ-PR-TW-Z]{1}[A-CEGHJ-NPR-TW-Z]{1}\s?[0-9]{2}\s?[0-9]{2}\s[0-9]{2}\s?[A-DEM]{0,1}$/XX012345X/g;

One-liner:

perl -p regex.pl inputfile.txt > output.txt

This way you can quickly check the output. You can also pipe it to a pager such as `less`, or leave out the redirection to print straight to the terminal.
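For reference, perlrun describes `-p` as wrapping the program in a `while (<>) { ... } continue { print ... }` loop. The sketch below spells out an equivalent loop as a subroutine with explicit input and output handles; the name `substitute_lines` is hypothetical, and the usage section assumes the unanchored pattern from the comments. Because the substitution runs once per line, `$` matches at the end of each line rather than only at the end of the file.

```perl
use strict;
use warnings;

# Roughly the loop that `perl -p regex.pl inputfile.txt` runs: read each
# line, apply the substitution, print the (possibly changed) line.
sub substitute_lines {
    my ($in, $out, $pattern, $replacement) = @_;
    while (my $line = <$in>) {
        # One line at a time, so `$` matches just before this line's newline.
        $line =~ s/$pattern/$replacement/g;
        print {$out} $line or die "Cannot write output: $!";
    }
}

# Hypothetical usage: perl filter.pl inputfile.txt > output.txt
if (@ARGV) {
    for my $file (@ARGV) {
        open my $in, '<', $file or die "Cannot open $file: $!";
        substitute_lines($in, \*STDOUT,
            qr/[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z][0-9]{6}[A-DFM]?/, 'XX012345X');
        close $in;
    }
}
```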

answered by TLP

Okay, thanks for this. I like the idea of doing it in one line. I have slightly modified my regex. It is now `/^[A-CEGHJ-PR-TW-Z]{1}[A-CEGHJ-NPR-TW-Z]{1}[0-9]{6}[A-DFM]{0,1}$/`, which works perfectly for `AB123456C` when matching on regex tester websites. When I change it to a replacement by adding `/XX01234X/g`, it does not seem to work. Any ideas? – James Mclaren Jan 29 '13 at 11:29
This now seems to work after removing the `^` and `$`. Is there any way I can avoid piping into another file and just modify the original? – James Mclaren Jan 29 '13 at 12:33
Yes, you can use the `-i` switch, which edits the file in place. It is recommended to keep backups, e.g. `-i.bak` (the backup of `file.txt` is saved as `file.txt.bak`). So: `perl -pi.bak regex.pl input.txt` – TLP Jan 29 '13 at 12:37
I usually do not recommend the `-i` switch to beginners because it is somewhat dangerous. The changes are irreversible, and even if you use backups, you can lose your original by running the script twice, because `file.txt.bak` gets overwritten. – TLP Jan 29 '13 at 12:39
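The same in-place edit can be done from inside a script by setting `$^I`, the variable behind `-i` (see perlvar), while reading the file through `<>`; perlfaq5 uses this idiom. Below is a sketch with a hypothetical `edit_in_place` helper. As TLP warns, running it twice overwrites the `.bak` backup with already-edited text.

```perl
use strict;
use warnings;

# Sketch of `perl -pi.bak regex.pl input.txt` from inside a script.
# Setting $^I turns on in-place editing for <>: each line printed inside
# the loop is written back to the file currently being read.
sub edit_in_place {
    my ($file, $pattern, $replacement, $backup_ext) = @_;
    # An undef $^I would disable in-place editing and print to STDOUT,
    # so default to a backup extension.
    $backup_ext //= '.bak';

    # local restores $^I, @ARGV and $_ when the sub returns.
    local ($^I, @ARGV) = ($backup_ext, $file);
    local $_;
    while (<>) {
        s/$pattern/$replacement/g;
        print;    # goes to the file being edited, not to STDOUT
    }
}

# Hypothetical usage: perl edit.pl input.txt   (keeps input.txt.bak)
if (@ARGV) {
    edit_in_place(shift @ARGV,
        qr/[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z][0-9]{6}[A-DFM]?/, 'XX012345X');
}
```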

score 0 · Answer 2 · edited May 23 '17 at 12:14

You open the file for reading only. So you need to open a second, temporary file (File::Temp), write the `$data` variable to it, close it, remove the original file (`unlink`) and rename the temporary file to the original name.

This SO question may be helpful.
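A sketch of the temporary-file approach described above, assuming the core modules File::Temp and File::Basename. `replace_via_temp_file` is a hypothetical name. Copying the original file's permissions is an addition of this sketch, since `tempfile()` creates files readable only by their owner.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Write the modified lines to a temporary file, then replace the original.
sub replace_via_temp_file {
    my ($file, $pattern, $replacement) = @_;

    open my $in, '<', $file or die "Cannot open $file: $!";
    # Same directory as the original, so rename() does not cross filesystems.
    my ($tmp_fh, $tmp_name) = tempfile(DIR => dirname($file));

    my $count = 0;
    while (my $line = <$in>) {
        $count += ($line =~ s/$pattern/$replacement/g) || 0;
        print {$tmp_fh} $line or die "Cannot write $tmp_name: $!";
    }
    close $in;
    close $tmp_fh or die "Cannot close $tmp_name: $!";

    # tempfile() uses mode 0600; copy the original's permission bits.
    my $mode = (stat $file)[2] & 07777;
    chmod $mode, $tmp_name or die "Cannot chmod $tmp_name: $!";

    # Remove the original, then move the temporary file into its place.
    unlink $file or die "Cannot remove $file: $!";
    rename $tmp_name, $file or die "Cannot rename $tmp_name to $file: $!";
    return $count;
}

# Hypothetical usage: perl replace.pl inputfile.txt
if (@ARGV) {
    my $file = shift @ARGV;
    my $n = replace_via_temp_file($file,
        qr/[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z][0-9]{6}[A-DFM]?/, 'XX012345X');
    print "$n substitution(s) made in $file\n";
}
```

On POSIX systems `rename` replaces an existing target in a single step, so the separate `unlink` is not strictly required there.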

Off-topic note: please use the modern Perl approach to handling files. For example:

open my $fh, "<", $filename or die "Cannot open file $filename: $!";

See also this SO question. Avoid package-global bareword (typeglob) filehandles such as `FILE`; use lexical filehandles like `$fh` instead.
