
I have some data (basically bounding box annotations) in space-separated .txt files.

I would like to replace multiple occurrences of specific values with other values. For example, given:

0 0.649489 0.666668 0.0625 0.260877
1 0.89485 0.445085 0.0428084 0.084259
1 0.80625 0.508509 0.0469892 0.005556
2 0.529068 0.0906668 0.0582908 0.0954804
2 0.565625 0.0268509 0.0040625 0.0546296 

I might have to change it to something like

2 0.649489 0.666668 0.0625 0.260877
4 0.89485 0.445085 0.0428084 0.084259
4 0.80625 0.508509 0.0469892 0.005556
7 0.529068 0.0906668 0.0582908 0.0954804
7 0.565625 0.0268509 0.0040625 0.0546296  

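This should happen simultaneously for all elements, and only in the first column. Applying the replacements one after the other would not work, because a value that has already been replaced could be matched again by a later replacement and end up with the wrong index.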

I'll basically have a mapping {old_class_1:new_class_1,old_class_2:new_class_2,old_class_3:new_class_3} and so on...

I looked into the post here, but it does not work for my case since the method described in those answers would change all the values to the last replacement.

I looked into this post as well, but am not sure if the answer here can be applied to my case since I'll have around 25 classes, so the indexes (the values of the first column) can range from 0-24

I know this can probably be done in Python by reading each file line by line and making the replacement; I was just wondering if there is a quicker way.

Any help would be appreciated. Thanks!

tripleee
Jitesh Malipeddi

1 Answer


Here's a simple example of how to map the labels in the first column to different ones.

This specifies the mapping as a variable; you could equally well specify it in a file, or something else entirely. The main consideration is that you need to have unambiguous separator characters, and use a format which isn't unnecessarily hard for Awk to parse.

awk 'BEGIN { n = split("0:2 1:4 2:7", m);
    # each m[i] is an old:new pair, so split it on the colon
    for(i=1; i<=n; ++i) { split(m[i], p, ":"); map[p[1]] = p[2] } }
$1 in map { $1 = map[$1] }1' file

The BEGIN block could be simplified, but I wanted to make it easy to update; to specify a different mapping, all you have to do is change the string which is the first argument to the first split. A few temporary variables are spent on parsing the pairs into an associative array, map, which the main script then uses. Because each line's first field is looked up exactly once against its original value, a replaced value is never replaced again.
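If the mapping is kept in a file instead, something along these lines might work. This is only a sketch: the function name remap_labels and the two-column "old new" layout of the mapping file are assumptions for illustration, not anything from the question. It relies on the common NR == FNR idiom to tell the first file (the mapping) apart from the data files, which also means it misbehaves if the mapping file is empty.

```shell
# remap_labels MAP_FILE DATA_FILE...
# MAP_FILE is a hypothetical two-column file with one "old new" pair per line.
# The remapped lines are written to standard output.
remap_labels() {
    map_path=$1
    shift
    awk 'NR == FNR { map[$1] = $2; next }   # first file: load the mapping
         $1 in map { $1 = map[$1] }          # data files: one lookup per line
         1' "$map_path" "$@"
}
```

For example, with a mapping file containing the lines "0 2", "1 4" and "2 7", running remap_labels map.txt labels.txt would print the contents of labels.txt with the first column remapped.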

The final 1 is not a typo; it is a standard Awk idiom to say "print every line unconditionally".
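Since the question mentions several annotation files, here is a rough sketch of how the same script could be applied to each file in place. The function name remap_in_place is made up for this example, and the approach (write to a temporary file, then copy the result back) is just one option; it is used here because Awk as specified by POSIX has no in-place editing. The mapping string uses the same old:new format as above.

```shell
# remap_in_place MAPPING FILE...
# MAPPING is a string of old:new pairs such as "0:2 1:4 2:7".
# Each FILE is overwritten with its remapped contents.
remap_in_place() {
    mapping=$1
    shift
    for f in "$@"; do
        tmp_out=$(mktemp) || return 1
        awk -v mapping="$mapping" '
            BEGIN {
                n = split(mapping, m, " ")
                for (i = 1; i <= n; ++i) { split(m[i], p, ":"); map[p[1]] = p[2] }
            }
            $1 in map { $1 = map[$1] }
            { print }    # same effect as the bare 1 pattern
        ' "$f" > "$tmp_out" && cat "$tmp_out" > "$f"
        status=$?
        rm -f "$tmp_out"
        [ "$status" -eq 0 ] || return 1
    done
}
```

A call such as remap_in_place "0:2 1:4 2:7" *.txt would then rewrite every matching file; it is probably wise to try it on copies first, since the originals are overwritten.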

tripleee