I have a file with multiple columns, and some of their names are partial duplicates, following this pattern:

$ head -n1 file.csv

AGE_030 AGE_031 AGE_032 AMET_022 ... MEND_009 MOLI_032 ... OIA_013 SH-1 SH-10 SH-107
SH-108 SH-4 SH-1_new SH-4_new AREN_1 AREN_28 AREN_3 AREN_4 AGE_032_new AMET_022_new
MEND_009_new MOLI_032_new OIA_013_new... 

Every column name with a _new suffix has a matching column name without it.

For every such pair, I want to rename both matches like this:

  • AGE_032 to AGE_032_old
  • AGE_032_new to AGE_032

I want to do it in a way that allows me to edit the column names INSIDE the file, as perl -i does.

Any ideas?

ALG

1 Answer

perl -pale'
   next if $. != 1;

   my %map = map { /^(.*)_new\z/s ? ( $1 => "$1_old", $_ => $1 ) : () } @F;
   @F = map { $map{$_} // $_ } @F;
   $_ = "@F";
'

If the line is AGE_030 AGE_030_new AGE_031, @F will contain "AGE_030", "AGE_030_new", "AGE_031". From this, the following mapping is created: "AGE_030" => "AGE_030_old", "AGE_030_new" => "AGE_030". Finally, we apply the mapping and rebuild the line.
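To get the in-place behaviour the question asks for, the same one-liner can be combined with -i and a file name. The following is a minimal sketch, not part of the original answer: the temporary directory, the sample.csv name and its contents are made up for illustration, and it assumes the header is whitespace-separated and Perl is 5.10 or newer (for the // operator).

```shell
# Hypothetical sample file in a scratch directory (names are illustrative only).
tmp=$(mktemp -d)
printf '%s\n' 'AGE_030 AGE_032 SH-1 SH-1_new AGE_032_new' '1 2 3 4 5' > "$tmp/sample.csv"

# -i rewrites the file in place; -i.bak would keep a backup copy instead.
# Only line 1 is modified; "next" still reaches the implicit print added by -p,
# so every other line is written back unchanged.
perl -i -pale'
   next if $. != 1;

   my %map = map { /^(.*)_new\z/s ? ( $1 => "$1_old", $_ => $1 ) : () } @F;
   @F = map { $map{$_} // $_ } @F;
   $_ = "@F";
' "$tmp/sample.csv"

head -n1 "$tmp/sample.csv"
# prints: AGE_030 AGE_032_old SH-1_old SH-1 AGE_032
```

Note that "@F" rejoins the fields with single spaces, so if the real file is tab-separated, the header would come out space-separated unless the split and join are adjusted.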

See also: Specifying file to process to Perl one-liner

ikegami