
My logic:

  1. Divide column 3 by column 2 and show the result (this calculation uses the byte values read from fileB).
  2. Show the actual values, untouched, as read from fileA.
  3. If the result is zero (including a zero in column 2, i.e. division by zero), write it to zero_file; if it is between 1 and 6, write it to between_file; if it is more than 8, write it to great_file; otherwise, write it to neither_file.
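The three routing rules in step 3 leave gaps: a rate such as 0.5 or 7 matches none of them and falls through to neither_file. Below is a minimal sketch of the rules as a small shell function wrapping awk. The name `route` is hypothetical. It treats "between 1 and 6" as inclusive and reads "more than 8" as 8 or more, which is how the answer's script (`r >= 8`) compares.

```shell
# Hypothetical helper: print the output file that a given rate is routed to.
# Assumes "between 1 and 6" is inclusive and "more than 8" means 8 or more.
route() {
    awk -v r="$1" 'BEGIN {
        r += 0                                # force a numeric comparison
        if (r == 0)                 print "zero_file"
        else if (r >= 1 && r <= 6)  print "between_file"
        else if (r >= 8)            print "great_file"
        else                        print "neither_file"   # e.g. 0.5 or 7
    }'
}

for r in 0 0.5 1 6 7 8 20000; do
    printf '%s -> %s\n' "$r" "$(route "$r")"
done
```

With these rules, rates strictly between 6 and 8 (such as name4's 7) and strictly between 0 and 1 end up in neither_file.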

fileA:

col1 col2 col3 col4
1    1K   2K   name1
2    0    3K   name2
3    1K   20M  name3
4    2K   14K  name4

fileB:

col1 col2 col3     col4
1    1000 2000     name1
2    0    3000     name2
3    1000 20000000 name3
4    2000 14000    name4

Why two files?

Both files are generated elsewhere and their values are not fixed (so let's skip how they are made).

fileB is derived from fileA, with every K and M value converted to bytes.
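The question skips how fileB is produced, but the conversion itself is short. Below is a minimal sketch that assumes decimal units (1K = 1000, 1M = 1000000), as the sample fileB shows. The function name `make_fileB` is hypothetical, and the columns come out single-space separated rather than aligned.

```shell
# Hypothetical helper (not from the question): rebuild fileB from fileA by
# expanding K and M suffixes to bytes, assuming 1K = 1000 and 1M = 1000000
# as in the sample fileB.
make_fileB() {
    awk '
    function tobytes(v,    mult) {
        mult = 1
        if (v ~ /K$/)      mult = 1000
        else if (v ~ /M$/) mult = 1000000
        return v * mult            # "2K" * 1000: awk uses the leading number, 2
    }
    FNR == 1 { print; next }       # copy the header line unchanged
    { $2 = tobytes($2); $3 = tobytes($3); print }
    ' "$@"
}

printf '%s\n' 'col1 col2 col3 col4' '1 1K 2K name1' '2 0 3K name2' \
              '3 1K 20M name3' '4 2K 14K name4' | make_fileB
```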

Expected output

I want to divide fileB's col3 by its col2, route the result by condition (zero, greater than 8, or between 1 and 4), and then show the original values from fileA that the rate is calculated from (col3 by col2):

The name1's rate is 2 (it is calculated from 2K by 1K).
The name2's rate is 0 (it is calculated from 3K by 0).
The name3's rate is 20000 (it is calculated from 20M by 1K).
The name4's rate is 7 (it is calculated from 14K by 2K).

My attempts

Here is one attempt (with double quotes):

awk "
(NR == FNR) {
   rate = (!\$2 ? 0 : \$3/\$2)
   rx = (\$2)
   tx = (\$3)
   file_name = \"$PWD/file\"
   next
}
rate >= 1 && rate <= 4 {file_name=\"$PWD/file\"}
rate >= 8 {file_name=\"$PWD/file\"}
!rate {file_name=\"$PWD/file\"}
{
   first = (\$2)
   second = (\$3)
}
END {
   print \"ratio for\", \$NF, \"is\", rate, \"(result original receive:\", first \", original transfer:\", second \")\" > file_name
}" col3 col2

Here is another attempt (with single quotes and no shell variables):

awk '
(NR == FNR) {
   rate = (!$2 ? 0 : $3/$2)
   rx = ($2)
   tx = ($3)
   file_name = "/path/file"
   next
}
rate >= 1 && rate <= 4 {file_name="/path/file"}
rate >= 8 {file_name="/path/file"}
!rate {file_name="/path/file"}
{
   first = ($2)
   second = ($3)
}
END {
   print "ratio for", $NF, "is", rate, "(result original receive:", first ", original transfer:", second ")" > file_name
}' col3 col2

The problem with my attempts

ratio for name4 is 7 (result original receive: 2K, original transfer: 14K)

It only shows the last line, not all of them.

This was my latest but unsuccessful attempt.
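Both attempts print only in the END block. END runs once, after the last record has been read, so only the values left over from the final rows (name4) are printed. The fix is to print once per record.

Below is a minimal sketch of the two-file idea, with these assumptions:

- fileB is passed first.
- Both files have a header line.
- Rows are matched by the name in the last column.

`route_rates` is a hypothetical name. It prints the target file name in front of each line instead of writing to that file, so the routing is easy to check.

```shell
# Hypothetical helper: rates come from the byte values in the first file
# (fileB); the original K/M values are printed from the second file (fileA).
route_rates() {
    awk '
    FNR == 1 { next }                          # skip the header of each file
    NR == FNR {                                # first file: fileB, in bytes
        rate[$NF] = ($2 ? $3 / $2 : 0)         # a zero in col2 (division by zero) gives 0
        next
    }
    {                                          # second file: fileA, original units
        r = rate[$NF]
        fn = "neither_file"
        if (r >= 8)                 fn = "great_file"
        else if (r >= 1 && r <= 6)  fn = "between_file"
        else if (r == 0)            fn = "zero_file"
        # Printing here, once per record, instead of in END is the actual fix.
        printf "%s: The %s\047s rate is %s (it is calculated from %s by %s).\n", fn, $NF, r, $3, $2
    }
    ' "$1" "$2"
}

tmp=$(mktemp -d)    # demo only: scratch copies of the sample files
printf '%s\n' 'col1 col2 col3 col4' '1 1K 2K name1' '2 0 3K name2' \
              '3 1K 20M name3' '4 2K 14K name4' > "$tmp/fileA"
printf '%s\n' 'col1 col2 col3 col4' '1 1000 2000 name1' '2 0 3000 name2' \
              '3 1000 20000000 name3' '4 2000 14000 name4' > "$tmp/fileB"
route_rates "$tmp/fileB" "$tmp/fileA"
```

To write real files instead, replace the `fn ": "` prefix with a redirection such as `> (dir fn)`, passing `dir` in with `-v`.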

Saeed
  • Edit your question to clarify what you need to do with fileA vs fileB, as they appear to both contain the same values in different formats, so it's not obvious why you'd need both of them, nor what you want to do with the values from each. Also, if 20M = 20000000 and 1K = 1000, then explain where 20480 comes from in the output line `The name3's rate is 20480 (it is calculated from 20M by 1K).`. Finally, add your attempt to solve your problem. – Ed Morton Jun 04 '23 at 12:31
  • Related: https://stackoverflow.com/questions/76398334/how-to-read-two-files-in-awk-and-set-variable-for-both-files – James Brown Jun 04 '23 at 12:41
  • @EdMorton I edited and hope it's clear. I need both to later check the rate of each `name` and see its original contents. Regarding the M and K, thanks, I edited that. I added my latest unsuccessful attempt too. – Saeed Jun 04 '23 at 12:47
  • @JamesBrown yes, but I wasn't sure that, if I edited that question, it would have a chance to be seen among the latest questions at the top of the page. If that's not true, please help me avoid this in the future (I mean, should I edit questions, even completely, instead of deleting the old one and asking a new one?). – Saeed Jun 04 '23 at 12:49
  • So `fileB` is an **output** file, not an input file? Please make that clear. Do not use double quotes around your awk script - that's what's causing you to escape all the `$`s and `"`s. Use single quotes as shown in the answer you referenced. – Ed Morton Jun 04 '23 at 12:50
  • @EdMorton but I have a variable for saving files which is `dir=/path/` and I save them all in `/path/`. Yes, `fileB` is an output of another file which is not mentioned and not needed to mention. – Saeed Jun 04 '23 at 12:51
  • Do not let shell variables like `$PWD` (or `dir` if that's a shell variable) expand to become part of your awk script, see [how-do-i-use-shell-variables-in-an-awk-script](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script). Don't tell us about fileB in a comment - add all information to your question. Or if you don't need to mention it then simply don't mention it - remove all references to it. – Ed Morton Jun 04 '23 at 12:53
  • Do you already have both file `fileA` and `fileB` or want `fileB` to be generated from `fileA` using `awk` solution? – anubhava Jun 04 '23 at 12:55
  • Given your current code (`rate = (!\$3 ? 0 : \$3/\$2)`) and your code from your previous question (`$6==0?"ZeroDivision":($6/$5>1)...`) it's clear you misunderstand what "division by zero" means. Given `A / B`, division by zero occurs when `B` is zero, not when `A` is zero. It's division **BY** zero, not division **OF** zero. – Ed Morton Jun 04 '23 at 12:57
  • @EdMorton I added a single quote version with no variable. I'll read the given link, thanks. There are no references to absent files. There are only two files: `fileA` and `fileB`. – Saeed Jun 04 '23 at 12:58
  • @anubhava yes I have these two files. I am trying to create a `file` from these two files with `awk`. – Saeed Jun 04 '23 at 12:59
  • You do not need 2 input files. You need fileA or fileB but not both. Unless there's something you aren't telling us about how the data from those 2 files is used. At the end of your script you have `col3 col2` as the input files - I think you mean `fileA fileB` but that doesn't make sense either as you don't need 2 input files. What are `varA` and `varB` in your question title `set varA from fileA and varB from fileB`? – Ed Morton Jun 04 '23 at 13:01
  • @EdMorton sir, `fileB` is converted from `fileA`. `fileB` is in bytes, and `fileA` is in kilobytes, megabytes, etc. So I need to divide two numbers to see the rate of send and receive. I need that rate (e.g. which name is 10, which is 2, and so on). But I also need the original tx/rx in MB and KB later. I have the rate at the moment, but I'm trying to expand the script to include those too. – Saeed Jun 04 '23 at 13:06
  • Please edit your question to explain everything. Don't add information in comments where it can't be formatted and could be missed. It's still not clear why you need `fileB`. I understand you want to convert `fileA`s values to bytes before performing your calculation, but why create a separate file of those converted values (`fileB`) in addition to just using them internally for the calculations to produce your final output like `ratio for name4 is 7 (result original receive: 2K, original transfer: 14K)`? – Ed Morton Jun 04 '23 at 13:09
  • In your code you're setting `file_name` to the same output file name in every leg of your code and you're sometimes using `fn` and sometimes `file_name` - that can't be what you intended to do. – Ed Morton Jun 04 '23 at 13:13
  • @EdMorton I edited the question and tried to make it simple and add necessary details only. I also updated the `file_name` everywhere. – Saeed Jun 04 '23 at 13:24
  • Regarding `It only shows the last column, but not all columns.` - I bet you have DOS line endings in your input, see [why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it). – Ed Morton Jun 04 '23 at 13:30
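The comments above advise against expanding `$PWD` or `dir` inside a double-quoted awk script. Below is a minimal sketch of the two usual alternatives. The directory value is hypothetical and stands in for the asker's `dir=/path/`.

```shell
dir=/path/to/output    # hypothetical value standing in for the real dir

# 1) Pass the shell variable with -v; the awk program stays in single quotes,
#    so no $ or " needs escaping.
awk -v dir="$dir" 'BEGIN { print dir "/zero_file" }'

# 2) Export it and read it from the ENVIRON array instead.
export dir
awk 'BEGIN { print ENVIRON["dir"] "/zero_file" }'
```

Both commands print `/path/to/output/zero_file`. The difference is that `-v` processes backslash escape sequences in the value, while `ENVIRON` passes it through literally.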

2 Answers


As discussed in comments above, this awk solution produces output using fileA only:

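awk -v dir='/tmp' '
BEGIN {
   # output directory for the *_file results; default to the current one
   dir = (dir ? dir : ".") "/"
}
# convert a value such as 20M or 1K to bytes (1K = 1000, 1M = 1000000)
function tobytes(v) {
   return v * (v ~ /M$/ ? 1000000 : (v ~ /K$/ ? 1000 : 1))
}
# skip the header line
NR == 1 {
   next
}
{
   denom = tobytes($2)
   numer = tobytes($3)
   # a zero denominator (division by zero) gives a rate of 0
   r = (denom ? numer / denom : 0)
   fn = "neither_file"
}
r >= 8 {
   fn = "great_file"
}
r >= 1 && r <= 6 {
   fn = "between_file"
}
!r {
   fn = "zero_file"
}
{
   print "The", $NF "\047s rate is", r > (dir fn)
}' fileA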

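Output, listed with this command (it prints each file's name, as given on the command line, before its contents):

awk 'FNR == 1 {print "::", FILENAME, "::"} 1' /tmp/*_file

:: /tmp/between_file ::
The name1's rate is 2
:: /tmp/great_file ::
The name3's rate is 20000
:: /tmp/neither_file ::
The name4's rate is 7
:: /tmp/zero_file ::
The name2's rate is 0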

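To also see the original $2 and $3 values (e.g. 20M and 1K) in the output, change the print to:

printf "The %s\047s rate is %s (it is calculated from %s by %s).\n", $NF, r, $3, $2 > (dir fn)

Keep the > (dir fn) redirection, so the lines still go to the *_file outputs.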
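Below is a self-contained run of the answer's script with that printf change applied. Details of the demo:

- It keeps the `> (dir fn)` redirection, so the lines still land in the *_file outputs.
- It writes into a scratch directory from `mktemp -d` rather than `/tmp`. This is an assumption made only for the demo.
- It recreates the sample fileA inline.

```shell
out_dir=$(mktemp -d)    # demo only: a scratch directory instead of /tmp
printf '%s\n' 'col1 col2 col3 col4' '1 1K 2K name1' '2 0 3K name2' \
              '3 1K 20M name3' '4 2K 14K name4' > "$out_dir/fileA"

awk -v dir="$out_dir" '
BEGIN { dir = (dir ? dir : ".") "/" }
function tobytes(v) {
    return v * (v ~ /M$/ ? 1000000 : (v ~ /K$/ ? 1000 : 1))
}
NR == 1 { next }                     # skip the header line
{
    denom = tobytes($2)
    numer = tobytes($3)
    r = (denom ? numer / denom : 0)
    fn = "neither_file"
}
r >= 8           { fn = "great_file" }
r >= 1 && r <= 6 { fn = "between_file" }
!r               { fn = "zero_file" }
{
    # the printf variant still needs the redirection, otherwise it goes to stdout
    printf "The %s\047s rate is %s (it is calculated from %s by %s).\n", $NF, r, $3, $2 > (dir fn)
}' "$out_dir/fileA"

cat "$out_dir/great_file" "$out_dir/zero_file"
```

The final `cat` prints the name3 line from great_file followed by the name2 line from zero_file.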

Ed Morton
anubhava
  • Thanks a lot. My own attempt gives me what I want, except that it writes only the last line. – Saeed Jun 04 '23 at 13:14
  • Unfortunately the code shown and expected output are not matching. – anubhava Jun 04 '23 at 13:19
  • Then how can I make them match? My code only shows the last line. Also, regarding MB to B, the `numfmt` command does that conversion for me, so there's no need to manually define `1000=1K` or `1024=1K`. – Saeed Jun 04 '23 at 13:22
  • @Saeed there's no need to call `numfmt` (presumably within a slow, fragile shell loop) when you can just do the conversion within your awk script as shown. – Ed Morton Jun 04 '23 at 13:24
  • @EdMorton that makes sense to avoid two files (one with MB and other with B). I'll try this code soon. – Saeed Jun 04 '23 at 13:26
  • Thanks, I tried that, but how can I also show the original value, e.g. 20M? Now I see only the bytes. – Saeed Jun 04 '23 at 13:34
  • @Saeed try the updated version now. – Ed Morton Jun 04 '23 at 13:36
  • Thanks a lot. I just edited your answer for myself: `r = (numer ? denom / numer : 0)` because `name` will have `2e-20` number or like this. – Saeed Jun 04 '23 at 13:42
  • What is `name` and how is its value related to the statement `r = (numer ? denom / numer : 0)`? – Ed Morton Jun 04 '23 at 13:43
  • Sorry, typo: I meant `r`. – Saeed Jun 04 '23 at 13:44
  • Also can I read a shell var from `awk`? Like a var called `dir` which is `dir=/path` as an example. – Saeed Jun 04 '23 at 13:45
  • And flipping those names makes no sense - "numer" stands for "numerator" (the part above the line in a fraction) while "denom" is "denominator" (the part below it). – Ed Morton Jun 04 '23 at 13:45
  • Regarding "Can I read a shell variable..." - the answer specifically shows you how to do that and I provided a reference for it in [an earlier comment](https://stackoverflow.com/questions/76400092/awk-how-to-set-vara-from-filea-and-varb-from-fileb/76400588#comment134720185_76400092). – Ed Morton Jun 04 '23 at 13:47
  • @EdMorton thanks, I know. In my actual script I divide $3 by $2 (in fact $6 by $5), so I needed to change them. Please ignore that; I shouldn't have mentioned it because it's not related to the question. – Saeed Jun 04 '23 at 13:51
  • The fix was to change how those variables are set, not how they are used. I fixed the code in the answer. – Ed Morton Jun 04 '23 at 13:56
  • @Saeed: if you want to convert those `KB MB GB TB…` to bytes, see my extended post below. – RARE Kpop Manifesto Jun 04 '23 at 20:35

@Saeed: to conveniently convert those units to bytes, you can try this.

Since all the units are just powers of 1,024 (which is 32 ^ 2), one can craft a lookup reference string that translates all of them to bytes (binary units, whereas the question's fileB uses 1K = 1000), accepting:

- either the 1st letter only, or the 1st 2 letters, ignoring the rest,
- any casing,
- either sign, and
- any extra spaces in between.

# gawk profile, created Sun Jun  4 16:34:59 2023

# BEGIN rule(s)

BEGIN {
    CONVFMT = "%.16g"
    FS = RS
    ORS = "\n\n"
    OFS = "\f\t"
}

# Rule(s)

($++NF = any2bytes($1)) ^ _ {
    print
}

  # Functions, listed alphabetically

function any2bytes(__,_) {
    return \
       int(sub(/[^.[:alnum:]+-]+/, _ = "",
             __)^_ > match(__ = toupper(__), /[E-KMP-Z](B|$)/) \
       ? +__ \
       : __ * ((_ += ++_) * _^_^_) ^ index("_KBMBGBTBPBEBZBYBRBQB",
                                    substr(__, RSTART, RLENGTH ) ) )
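    }

Sample output (each record is the input value followed by its byte count; the layout comes from OFS, a form feed plus a tab, and ORS, two newlines):

8362322323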
                 8362322323

1003569bytes
                    1003569

173.77KB
                     177940

 +23.9517 mb
                   25115177

444.19GB
               476945380802

 -12.73 T
            -13996783021588

The Bs were kept in the code above for clarity. You can streamline it further by discarding all the Bs and changing the exponent's base from 32 to 1,024, or simply 4 ^ 5:

function any2bytes(__, _) {

    # keep ".", letters, digits and signs; the class "+--" would drop the "."
    return int(sub("[^.[:alnum:]+-]+",_ = "", __)^_++ \
           > match(__ = toupper(__), "[E-KMP-Z](B|$)") \
        ? __ \
        : __ * ((_+= ++_)^++_)^index("KMGTPEZYRQ", 
                               substr(__,  RSTART, !!_)))
}
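For readers who find the golfed version hard to follow, here is a minimal readable sketch of the same lookup-string idea (not the answerer's code). The position of the unit letter in "KMGTPEZYRQ" is the power of 1024 to multiply by. How it compares with the answer:

- Like the answer, it assumes binary units.
- Unlike the answer, it only strips blanks.
- It only recognizes a unit letter, optionally followed by B, at the end of the value.

The shell function name `bytes_of` is hypothetical.

```shell
# Hypothetical helper: convert values such as 173.77KB or "-12.73 T" to bytes,
# using binary units (K = 1024, M = 1024^2, ... Q = 1024^10).
bytes_of() {
    awk '
    function to_bytes(s,    p) {
        gsub(/[ \t]+/, "", s)                 # "+23.9517 mb" -> "+23.9517mb"
        s = toupper(s)
        p = 0                                 # power of 1024; 0 means plain bytes
        if (match(s, /[KMGTPEZYRQ]B?$/))      # unit letter, optional B, at the end
            p = index("KMGTPEZYRQ", substr(s, RSTART, 1))
        return int(s * 1024 ^ p)              # s * ... uses the leading number in s
    }
    { printf "%.0f\n", to_bytes($0) }
    ' "$@"
}

printf '%s\n' 8362322323 1003569bytes 173.77KB '+23.9517 mb' 444.19GB '-12.73 T' | bytes_of
```

This reproduces the sample results shown above. The question's own data uses 1K = 1000; for that, replace 1024 with 1000.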
RARE Kpop Manifesto
  • First of all, I appreciate the time you spent answering me; it's very kind of you. But I don't know whether your code or anubhava's is faster (I can't tell, because I don't know enough about either one). I think anubhava's is faster because his function has only two lines and his whole script (what I'm trying to have) is shorter than yours. If I'm wrong, please correct me. – Saeed Jun 05 '23 at 09:47
  • @Saeed: I didn't say mine is faster, only that it can handle all the unit conversions, so you wouldn't have to manually code it for KB and MB and GB separately. – RARE Kpop Manifesto Jun 06 '23 at 06:19