How to identify a number in a column and add a specific value to it

Question

I have a file, CASE.dat:

    #         X           Y           Z       TARGET      MY DIST   MY DATA
 --------------------------------------------------------------------------------
    1   16.136051   19.214215   26.195842    0.935901     0.528294 10305.052469
    2   19.296614   20.459830   20.711839    4.033354     1.152114   258.468669
    3   21.757247   20.010601   21.609096    4.008830     1.117961   208.482335
    4   23.340579   20.230572   20.299311    0.962172     0.567720  1648.046276
    5   22.232850   19.276643   24.105109    4.028086     1.105535   116.818198
    6   20.177439   18.995924   25.744873    4.020979     1.119227   259.240957
    7   20.507640   18.422719   27.698151    0.973875     0.578381  4433.058006
    8   17.718280   19.441795   24.896309    4.052598     1.117063   399.224573
    9   17.274647   20.170761   22.411821    4.049756     1.067280   369.719958
   10   15.344147   20.532170   21.791338    0.942252     0.522218  2903.487129
   11   16.747362   21.490591   16.828061    4.119692     1.052854   640.628897
   12   18.942734   21.191117   18.059497    4.016967     1.013168   370.875172
   13   16.713317   22.043861   14.846116    0.952206     0.572128 15824.211118
   14   14.917097   21.194983   17.726730    0.996560     0.573948  8439.378683
   15   20.697846   21.496657   17.007974    0.931434     0.494488  4811.530560
   16   24.891192   18.784856   25.017254    4.004345     1.086042    87.628933
   17   24.849590   17.270757   26.442292    0.986123     0.548764  2084.437203
   18   26.020588   18.043376   23.429171    0.962405     0.489209  5797.201598
   19   29.699839   22.572565   28.810307    4.025628     1.079363   339.526719
   20   31.243469   22.179022   30.120360    0.974974     0.569833  5998.952157
   21   29.172195   25.093904   28.162412    3.991001     1.124966   301.999963

My aim is to do some processing on column number 5.

I extract it using the script below:

    cat CASE.dat | awk '{print $5}' | awk NF | awk 'NR>1'

This gives me the `TARGET` values (column 5), one per line.
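For comparison, the same extraction can be sketched in Python. This is a minimal sketch that assumes the two-line header shown above; `SAMPLE` holds three rows copied from the question so the snippet runs without the file, and `target_column` is an illustrative name. It drops the first two lines (header and dashed rule) and any blank line, which is what the `awk NF | awk 'NR>1'` stages achieve together.

```python
import io

# Three rows copied from CASE.dat in the question, so the sketch runs
# without the real file.
SAMPLE = """\
    #         X           Y           Z       TARGET      MY DIST   MY DATA
 --------------------------------------------------------------------------------
    1   16.136051   19.214215   26.195842    0.935901     0.528294 10305.052469
    2   19.296614   20.459830   20.711839    4.033354     1.152114   258.468669
   21   29.172195   25.093904   28.162412    3.991001     1.124966   301.999963
"""


def target_column(lines):
    """Return the 5th whitespace-separated field of every data row as a float."""
    values = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if lineno <= 2 or len(fields) < 5:
            continue  # skip the header, the dashed rule, and blank lines
        values.append(float(fields[4]))
    return values


values = target_column(io.StringIO(SAMPLE))
print(values)  # [0.935901, 4.033354, 3.991001]
```

With the real file, pass an open file object instead: `with open("CASE.dat") as f: values = target_column(f)`.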

Now I need advice on how to improve the script above. Further, there are two types of numbers here: one is ~4 and the other is ~1. I want to add 2.0 to all the numbers that are ~4 and 1.0 to all the numbers that are ~1. Please suggest a simple solution.
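No fixed tolerance is needed if each value is assigned to whichever of the two centres it is nearer to; with centres 1 and 4 that is the same as splitting at 2.5. A minimal Python sketch, assuming the data really contains only those two clusters (`OFFSETS`, `nearest_centre` and `add_offset` are illustrative names, not from the question):

```python
# Assumption from the question: every TARGET value belongs to one of two
# clusters, centred near 1.0 and 4.0. The dict maps centre -> amount to add.
OFFSETS = {1.0: 1.0, 4.0: 2.0}


def nearest_centre(value, centres=OFFSETS):
    """Return whichever cluster centre `value` is closest to."""
    return min(centres, key=lambda c: abs(value - c))


def add_offset(value):
    """Add 1.0 to values nearest 1.0 and 2.0 to values nearest 4.0."""
    return value + OFFSETS[nearest_centre(value)]


# A few TARGET values from CASE.dat, including the borderline rows 14 and 21.
targets = [0.935901, 4.033354, 0.996560, 3.991001]
shifted = [round(add_offset(v), 6) for v in targets]
print(shifted)  # [1.935901, 6.033354, 1.99656, 5.991001]
```

Rounding to six decimals only hides floating-point noise in the printed result; the classification itself does not depend on it.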

Up to this point, the result should be:

Finally, I want to subtract the numbers that are now ~6 from 6, and the numbers that are now ~2 (originally ~1) from 2. Both 6 and 2 may vary in another file. The final data should be:
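Both steps can be combined in one function. This is a sketch, not a definitive implementation: it reuses the nearest-centre idea, and because the question says 6 and 2 may change between files, the offsets and the numbers to subtract from are passed as hypothetical parameters (`process`, `DEFAULT_OFFSETS`, `DEFAULT_TARGETS` are illustrative names). For rows 1, 2, 14 and 21 the results agree with the accepted answer's output further down.

```python
# Hypothetical parameters: the question says the numbers 6 and 2 may differ
# in other files, so both tables are arguments rather than constants.
DEFAULT_OFFSETS = {1.0: 1.0, 4.0: 2.0}  # cluster centre -> amount to add
DEFAULT_TARGETS = {1.0: 2.0, 4.0: 6.0}  # cluster centre -> number to subtract from


def process(value, offsets=DEFAULT_OFFSETS, targets=DEFAULT_TARGETS):
    """Shift `value` by its cluster's offset, then subtract it from that cluster's target."""
    centre = min(offsets, key=lambda c: abs(value - c))
    return targets[centre] - (value + offsets[centre])


column5 = [0.935901, 4.033354, 0.996560, 3.991001]
final = [round(process(v), 6) for v in column5]
print(final)  # [0.064099, -0.033354, 0.00344, 0.008999]

# Another file might use different numbers to subtract from:
print(round(process(4.033354, targets={1.0: 2.5, 4.0: 6.5}), 6))  # 0.466646
```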

Your entire awk call can be simplified as `awk 'NR>2{print $5}' CASE.dat` — Daemon Painter, Jan 11 '21 at 08:34
Perfect. So from now on, my expectations are: add `2` to the numbers that are close to 4 and then subtract them from 6; add `1` to the numbers that are close to `1` and then subtract them from 2. — astha, Jan 11 '21 at 08:39
Then your code is pretty useless, since you need to have `$6` and `$2`. Try to think about _how_ to get there, and be more specific. To me 3 is close to 4. — Daemon Painter, Jan 11 '21 at 08:42
True, that is what I am saying. I need to add `2` to all the numbers in `$5` that are close to 4 and `1` to those that are close to `1`. Then I subtract those that were close to 4 (after adding 2 they will be ~6) from 6, and those that were close to 1 (after adding 1 they will be ~2) from 2. — astha, Jan 11 '21 at 08:58
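Spelled out, the two steps described in these comments reduce to a single subtraction. Here x stands for a column-5 value, and a (the amount added) and T (the number subtracted from) are symbols introduced only for this illustration:

```latex
% x: a column-5 (TARGET) value; a: the amount added; T: the number subtracted from
T - (x + a) = (T - a) - x
\qquad\Longrightarrow\qquad
\begin{cases}
6 - (x + 2) = 4 - x, & x \approx 4,\\
2 - (x + 1) = 1 - x, & x \approx 1.
\end{cases}
```

With the numbers in the question, each final value is therefore the cluster centre minus the original value. This shortcut holds only while T - a equals the centre, which may not be the case if 6 and 2 change in another file.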

anubhava · Answer 1 · 2021-01-11T07:37:20.013


You may use this awk:

    awk -v d='0.009' 'NR <= 2 {next} {n = int($5+d)} n == 4 {$5 += 2} n == 1 {$5 += 1} {n = int($5+d)} n==6 || n==1 {$5 = n - $5} {print $5}' CASE.dat

    0.935901
    -0.033354
    -0.00883
    0.962172
    -0.028086
    -0.020979
    0.973875
    -0.052598
    -0.049756
    0.942252
    -0.119692
    -0.016967
    0.952206
    1.99656
    0.931434
    -0.004345
    0.986123
    0.962405
    -0.025628
    0.974974
    0.008999

A more readable format:

    awk -v d='0.009' 'NR <= 2 { next }
    {n = int($5+d)}
    n == 4 {$5 += 2}
    n == 1 {$5 += 1}
    {n = int($5+d)}
    n == 6 || n == 1 {
       $5 = n - $5
    }
    {print $5}' CASE.dat

edited Jan 11 '21 at 07:37

answered Jan 11 '21 at 07:05

anubhava


Thank you @anubhava, but you only handled the number 4. You can still see that the value 3.991001 is not processed, even though it is close to 4. – astha Jan 11 '21 at 07:13
Please clarify: what is the definition of closeness here? How large should the tolerance be? Is `0.005` good enough? – anubhava Jan 11 '21 at 07:16
I have only two types of numbers: one is ~4 and the other is ~1. It is difficult to give a precise definition here. – astha Jan 11 '21 at 07:17
There is a mistake in my post. Let me update it. – astha Jan 11 '21 at 07:24
I have updated it. There was a minor mistake in the subtraction; now you can see that some numbers are negative. – astha Jan 11 '21 at 07:28
@astha: btw, the 5th field in your record #14 is `0.996560`. After adding `1` it becomes `1.996560`, which is actually `~2`, not `~1`, so that output should be updated in your question. – anubhava Jan 11 '21 at 07:45
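The `int($5+d)` test in the awk answer explains why most ~1 values were left untouched: awk's `int()` truncates toward zero, so `int(x + 0.009)` is 1 only for x in [0.991, 1.991), and a value such as 0.935901 falls to 0. The sketch below uses Python's `math.trunc` as a stand-in for awk's `int()` (an assumption about equivalence for positive inputs) and the column-5 values copied from the question, and counts how many values each rule fails to classify as 1 or 4:

```python
import math

# Column 5 (TARGET) of CASE.dat, rows 1-21, copied from the question.
TARGET = [0.935901, 4.033354, 4.008830, 0.962172, 4.028086, 4.020979,
          0.973875, 4.052598, 4.049756, 0.942252, 4.119692, 4.016967,
          0.952206, 0.996560, 0.931434, 4.004345, 0.986123, 0.962405,
          4.025628, 0.974974, 3.991001]

d = 0.009  # the tolerance used in the awk answer

# awk's int() truncates toward zero; math.trunc does the same in Python.
missed_by_trunc = [x for x in TARGET if math.trunc(x + d) not in (1, 4)]

# Rounding to the nearest integer instead, as the accepted Python answer does.
missed_by_round = [x for x in TARGET if round(x) not in (1, 4)]

print(len(missed_by_trunc), len(missed_by_round))  # 9 0
```

Each of the nine missed values appears unchanged in the awk output above, while rounding classifies all 21 rows of this particular file.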

Synthase · Accepted Answer · 2021-01-11T08:35:51.750

Here you go:

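    # The first two lines of CASE.dat are the column header and the dashed rule.
    with open("CASE.dat", "r") as msg:
        data = msg.readlines()

    for i, line in enumerate(data[2:]):
        if not line.strip():
            continue  # skip blank lines, e.g. an empty line at the end of the file
        row = list(map(float, line.split()))

        # Classify by the nearest integer: ~1 gets +1, ~4 gets +2, anything else +0.
        if round(row[4]) == 1:
            val = 1
        elif round(row[4]) == 4:
            val = 2
        else:
            val = 0

        row[4] = row[4] + val

        # Subtract from the nearest target: ~6 from 6, ~2 from 2.
        if round(row[4]) == 6:
            row[4] = 6 - row[4]
        elif round(row[4]) == 2:
            row[4] = abs(row[4] - 2)

        data[i + 2] = " ".join(map(str, row))

    for row in data:
        print(row)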

You get:

        #         X           Y           Z       TARGET      MY DIST   MY DATA

     --------------------------------------------------------------------------------

    1.0 16.136051 19.214215 26.195842 0.06409900000000013 0.528294 10305.052469
    2.0 19.296614 20.45983 20.711839 -0.033354000000000106 1.152114 258.468669
    3.0 21.757247 20.010601 21.609096 -0.008829999999999671 1.117961 208.482335
    4.0 23.340579 20.230572 20.299311 0.03782799999999997 0.56772 1648.046276
    5.0 22.23285 19.276643 24.105109 -0.028086000000000055 1.105535 116.818198
    6.0 20.177439 18.995924 25.744873 -0.020978999999999637 1.119227 259.240957
    7.0 20.50764 18.422719 27.698151 0.026124999999999954 0.578381 4433.058006
    8.0 17.71828 19.441795 24.896309 -0.0525979999999997 1.117063 399.224573
    9.0 17.274647 20.170761 22.411821 -0.049756000000000355 1.06728 369.719958
    10.0 15.344147 20.53217 21.791338 0.05774800000000013 0.522218 2903.487129
    11.0 16.747362 21.490591 16.828061 -0.11969199999999969 1.052854 640.628897
    12.0 18.942734 21.191117 18.059497 -0.016967000000000176 1.013168 370.875172
    13.0 16.713317 22.043861 14.846116 0.047794000000000114 0.572128 15824.211118
    14.0 14.917097 21.194983 17.72673 0.0034399999999998876 0.573948 8439.378683
    15.0 20.697846 21.496657 17.007974 0.06856600000000013 0.494488 4811.53056
    16.0 24.891192 18.784856 25.017254 -0.004344999999999821 1.086042 87.628933
    17.0 24.84959 17.270757 26.442292 0.013876999999999917 0.548764 2084.437203
    18.0 26.020588 18.043376 23.429171 0.037595000000000045 0.489209 5797.201598
    19.0 29.699839 22.572565 28.810307 -0.025628000000000206 1.079363 339.526719
    20.0 31.243469 22.179022 30.12036 0.025025999999999993 0.569833 5998.952157
    21.0 29.172195 25.093904 28.162412 0.008999000000000201 1.124966 301.999963
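The long tails in this output (for example `0.06409900000000013`) are ordinary binary floating-point error made visible by `str()`; the arithmetic itself is fine. A minimal sketch of fixed-precision formatting, assuming six decimals to match CASE.dat. `format_row` is a hypothetical helper, and printing the first field as an integer row index is an assumption about the desired layout:

```python
# Hypothetical helper: format one processed row with a fixed precision
# instead of str(), which exposes binary floating-point noise.
def format_row(row, decimals=6):
    index, *values = row
    return " ".join([str(int(index))] + [f"{v:.{decimals}f}" for v in values])


# Row 1 as the accepted answer computes it: abs((0.935901 + 1) - 2).
row = [1.0, 16.136051, 19.214215, 26.195842, abs(0.935901 + 1 - 2), 0.528294, 10305.052469]
print(format_row(row))     # 1 16.136051 19.214215 26.195842 0.064099 0.528294 10305.052469
print(format_row(row, 4))  # 1 16.1361 19.2142 26.1958 0.0641 0.5283 10305.0525
```

Passing `decimals=4` gives the four-decimal layout asked about in the follow-up question linked in the comments.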

Your results are closer to my expectations. Can you modify your script slightly so that I do not need to add `1` to the numbers that are ~1? — astha, Jan 11 '21 at 08:11
I created a case.py script with header #!/usr/bin/python but it did not give any output. — astha, Jan 11 '21 at 08:12
I am getting this error with `python case.py`: `Traceback (most recent call last): File "case.py", line 6, in <module> row = list(map(float, line.strip().split())) ValueError: could not convert string to float: --` — astha, Jan 11 '21 at 08:13
Which Python version do you use? Don't set any header; just run `python3 case.py`. Let me know. — Synthase, Jan 11 '21 at 08:14
OK, so your header must point to Python 2. Remove it and try `python3 case.py` in your terminal. — Synthase, Jan 11 '21 at 08:16
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/227137/discussion-between-astha-and-synthase). — astha, Jan 11 '21 at 08:17
Here: https://stackoverflow.com/questions/65663933/how-to-modify-the-python-script-so-that-it-prints-upto-only-four-decimals/65663969#65663969 — astha, Jan 11 '21 at 10:17
