1

I have a function that looks at every number in a file, checks if it is a perfect square, and if it is, increments a counter by 1. The goal of the function is to count the total number of perfect squares.

awk 'function root(x)  
{if (sqrt(x) == int(sqrt(x))) count+=1 } 
{print root($1)}
END{print count}' numbers_1k.list

This code prints a blank line each time it checks the condition on a line of the file. So if the file has 1000 lines, the output is 1000 blank lines followed by the value of count.

Is there any way to avoid this? I have checked previous similar questions.

jww
ZakS
  • 2
    What is the expected output? You don't `return` anything from the function but yet you `print`. If you just want the count, remove the `print`. – James Brown Oct 16 '18 at 07:34
  • Thank you. which `print`? I only want the count variable at the end. – ZakS Oct 16 '18 at 07:40
  • 1
    Remove the `print` in `{print root($1)}`. That prints an empty line as the function `root` doesn't return anything and if it did return something, that would be printed. – James Brown Oct 16 '18 at 07:41
  • 1
    it worked, thank you @JamesBrown – ZakS Oct 16 '18 at 07:42
  • 1
    OK, I think I get this now. So the `print` causes a line to be printed and isn't needed to actually run `root`. Great explanation. – ZakS Oct 16 '18 at 07:52
  • You might be interested in this: https://stackoverflow.com/questions/295579/fastest-way-to-determine-if-an-integers-square-root-is-an-integer – kvantour Oct 16 '18 at 08:26
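To make the point from the comments concrete, here is a minimal sketch comparing the two forms. The three-line sample input is made up for illustration; the awk programs are the one from the question and the same program with the `print` removed.

```shell
# Hypothetical sample input: three numbers, two of them perfect squares.
input=$(printf '4\n5\n9\n')

# root() has no return statement, so print root($1) prints an empty
# string for every record: one blank line per input line.
with_print=$(printf '%s\n' "$input" | awk 'function root(x) { if (sqrt(x) == int(sqrt(x))) count += 1 }
{ print root($1) }
END { print count }')

# Calling root($1) as a plain statement still runs it, but prints nothing.
without_print=$(printf '%s\n' "$input" | awk 'function root(x) { if (sqrt(x) == int(sqrt(x))) count += 1 }
{ root($1) }
END { print count }')

printf 'with print:\n%s\n' "$with_print"
printf 'without print:\n%s\n' "$without_print"
```

With this input, the first version should show three blank lines before the count, and the second only the count.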

2 Answers

3

The problem is that you use { print root() } where root() doesn't return anything. It should be:

awk 'function root(x) { return sqrt(x) == int(sqrt(x)) }
     root($1) {count++}
     END {print count}' file

Btw, you don't need a function for that:

awk 'sqrt($1) == int(sqrt($1)) {count++} END {print count}' file
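One edge case that may be worth handling, shown here as a sketch with made-up input: if no line is a perfect square, `count` is never assigned, so `print count` prints an empty line rather than 0. Adding 0 forces a numeric value.

```shell
# Hypothetical input containing no perfect squares.
# Without "+ 0" the unset variable prints as an empty string.
empty=$(printf '2\n3\n5\n' | awk 'sqrt($1) == int(sqrt($1)) { count++ } END { print count }')

# With "+ 0" the unset variable is treated as the number 0.
zero=$(printf '2\n3\n5\n' | awk 'sqrt($1) == int(sqrt($1)) { count++ } END { print count + 0 }')

printf 'without +0: [%s]\n' "$empty"
printf 'with +0:    [%s]\n' "$zero"
```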
hek2mgl
  • 1
    It's not out of scope. `count` needs to be defined like `function root(x, count)` for that. – James Brown Oct 16 '18 at 07:45
  • 1
    Was just about to check this. Yes, `awk 'function a() { count ++ } BEGIN{a(); print count}'` really prints `1`. I'm surprised. Thanks for the hint – hek2mgl Oct 16 '18 at 07:48
  • 1
    @hek2mgl, `sqrt(x)` should be changed to `sqrt($1)` in 2nd solution sir. – RavinderSingh13 Oct 16 '18 at 08:22
  • 1
    @hek2mgl wrt your comment about being surprised. All awk variables are global except function arguments so if you want a local variable in a function then you just add it to the function arg list after the actual function args, by convention separated from the leading real args by a tab or multiple blank chars. `awk 'function foo(realarg, imlocal){imlocal=imglobal=realarg} BEGIN{foo(7); print imlocal, imglobal}'` would output ` 7` since within `foo()` the variable `imlocal` is a local variable to `foo()` while outside of it `imlocal` is a different, global variable of the same name. – Ed Morton Oct 16 '18 at 13:03
  • @EdMorton Thank you for explaining that! Do you know the reasoning behind this behaviour? Was it the simplicity of implementation? I personally feel that this is a quite dangerous/cumbersome language behaviour. – hek2mgl Oct 16 '18 at 16:24
  • I don't know but I could guess and my guess is it provides for brief, simple code. IMHO it's far less dangerous than a typo creating a new variable (e.g. `foo=7; print fooo`), which I trip over far more often than misunderstanding the scope of a variable, and the alternative to all variables being global unless declared in a function arg list would presumably be having to declare the scope of every global variable which is more cumbersome since they're far more common than local variables. You could solve all scope/typo mistakes by declaring all variables but that is C, not awk. – Ed Morton Oct 16 '18 at 17:23
  • I guess it was easier to implement then (and likely shows better performance). I just recently stumbled upon a bash application which makes heavy use of _dynamic scoping_, which is more or less the same behaviour in the shell language. I found that very hard to understand and explosive to refactor. – hek2mgl Oct 16 '18 at 17:29
  • 1
    In other languages you'd want a different decision but IMHO having all variables global unless declared local was the right decision for a small, focused text-processing language like awk. I personally would have preferred an explicit way of doing that rather than adding them to the function arg list but nbd and I understand why they did that since they had to create that local scope for the "real" function args anyway so I expect the implementation was practically free. – Ed Morton Oct 16 '18 at 17:33
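As a small sketch of the scoping rule discussed in these comments: the function name `bump` and the variables `tmp` and `total` below are hypothetical, chosen only for illustration. `tmp` is listed as an extra parameter, so it is local to the function; `total` is not, so it is global.

```shell
# tmp appears in the parameter list (after extra spaces, by convention),
# which makes it local to bump(). total is not listed, so it is global.
scope=$(awk 'function bump(n,    tmp) { tmp = n * 2; total += tmp }
BEGIN { bump(3); bump(4); print "tmp=[" tmp "]", "total=" total }')

echo "$scope"
```

Outside the function, `tmp` is a separate, unset global, so it prints as empty, while `total` keeps the sum accumulated across both calls.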
1

Could you please try the following too.

awk 'function root(x)  
{if (sqrt(x) == int(sqrt(x)))
 {print x;count+=1
 } 
}
{root($1)}
END{print "count=",count}'  Input_file

The code above prints each perfect square and increments the variable count inside the function whenever the condition is true. The final value of count is then printed in the END block.
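For reference, a quick run of the same program on a made-up three-line input, in place of Input_file. Note that the comma in `print "count=", count` inserts the output field separator (a space by default) between the two values.

```shell
# Hypothetical sample input piped in instead of reading Input_file.
out=$(printf '4\n5\n9\n' | awk 'function root(x) {
  if (sqrt(x) == int(sqrt(x))) { print x; count += 1 }
}
{ root($1) }
END { print "count=", count }')

echo "$out"
```

To print `count=2` without the space, the two values could be concatenated instead: `print "count=" count`.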

RavinderSingh13