1

I am learning the Julia language and would like to know which implementation is likely to have better performance.

The last answer to the question "How to check if a string is numeric Julia" gives:

isintstring(str) = all(isdigit(c) for c in str)

This code works well, but it could be rewritten

isintstring = str -> mapreduce(isnumeric, &, collect(str))

Is the rewrite better, worse, or just different? The Julia Style Guide does not seem to provide guidance.

Edit: As originally expressed, this question was blocked for seeking opinion not facts. I have rephrased it in gratitude for the three outstanding and helpful answers the original question received.

Nigel Davies
  • *but to find resources* ... somehow renders your question off topic. – GhostCat Dec 10 '21 at 07:41
  • I know about the off topic risk, but Julia is complex and finding good resources is difficult. The documentation is thorough, but difficult to gain insight from. – Nigel Davies Dec 10 '21 at 13:21
  • Even without the request for resources and examples, it's off-topic. Questions asking for idiomatic code (read: better Julia, more Pythonic, et al.) are opinion-based, and thus off-topic. If there's a published style guide for a given language you'd like to adhere to (e.g. PEP8 for Python), you could ask about complying with that, but just "how do I write this like a Julia developer" would be off-topic. – TylerH Dec 10 '21 at 17:01

4 Answers

4

If you are using collect there is probably something wrong with your code, especially if it's a reduction. So your second method needlessly allocates, and, furthermore, it does not bail out early, so it will keep going to the end of the string, even if the first character fails the test.

If you benchmark the performance, you will also find that mapreduce(isnumeric, &, collect(str)) is an order of magnitude slower, and that is without considering early bailout.

In general: Don't use collect(!), and bail out early if you can.

The idiomatic solution in this case is

all(isdigit, str)
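
To make the early-bailout point concrete, here is a small sketch that counts how many characters each approach actually tests. The names `calls`, `counting_isdigit`, `calls_all` and `calls_mapreduce` are hypothetical helpers introduced only for this illustration; `collect` is left out so that only the bailout behaviour differs.

```julia
# Hypothetical helper: wraps isdigit and counts how often it is called.
const calls = Ref(0)
counting_isdigit(c) = (calls[] += 1; isdigit(c))

str = "x" * "1"^99   # 100 characters; the very first one is not a digit

calls[] = 0
all(counting_isdigit, str)            # false
calls_all = calls[]                   # 1: `all` stops at the first failure

calls[] = 0
mapreduce(counting_isdigit, &, str)   # false
calls_mapreduce = calls[]             # 100: every character is tested
```

`all` is documented as short-circuiting, whereas `&` is an ordinary function, so a reduction with it has to visit every element.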

Edit: Here are some benchmarks:

jl> using BenchmarkTools, Random

jl> str1 = randstring('0':'9', 100)
"7588022864395669501639871395935665186290847293742941917566300220720077480508740079115268449616551087"

jl> str2 = randstring('a':'z', 100)
"xnmiemobkalwiiamiiynzxxosqoavwgqbnxhzaazouzbfgfbiodsmhxonwkeyhxmysyfojpdjtepbzqngmfarhqzasppdmvatjsz"

jl> @btime mapreduce(isnumeric, &, collect($str1))
  702.797 ns (1 allocation: 496 bytes)
true

jl> @btime all(isdigit, $str1)
  82.035 ns (0 allocations: 0 bytes)
true

jl> @btime mapreduce(isnumeric, &, collect($str2))
  702.222 ns (1 allocation: 496 bytes)  # processes the whole string
false

jl> @btime all(isdigit, $str2)  
  3.500 ns (0 allocations: 0 bytes)  # bails out early
false

The rewrite is definitely worse. Slower, less elegant and more verbose.

Another edit: I only noticed now that you are using isnumeric with mapreduce, but isdigit with all. isnumeric is more general and much slower than isdigit so that also makes a big difference. If you use isdigit instead, and remove collect, the speed difference isn't so big for numeric strings, but it still does not bail out early for non-numeric strings, so the best solution is still clearly all(isdigit, str).
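
Besides speed, the two predicates also accept different characters, so the two versions in the question are not quite equivalent. A minimal sketch, using '½' as an arbitrary example of a character that is numeric but not a decimal digit:

```julia
isdigit('7')            # true
isnumeric('7')          # true

isdigit('½')            # false: not one of 0-9
isnumeric('½')          # true: belongs to a Unicode numeric category

all(isdigit, "12½")     # false
all(isnumeric, "12½")   # true

all(isdigit, "")        # true: vacuously, so check isempty(str) separately if that matters
```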

DNF
1

Part of your question is about named vs anonymous functions. In the first case, you created a function via its first method and assigned it to an implicitly const variable isintstring. The function object itself also gets the name isintstring, as you can see in its type. You can't reassign the variable isintstring:

julia> isintstring(str) = all(isdigit(c) for c in str)
isintstring (generic function with 1 method)

julia> typeof(isintstring)
typeof(isintstring)

julia> isintstring = str -> mapreduce(isnumeric, &, collect(str))
ERROR: invalid redefinition of constant isintstring

julia> isintstring = 1
ERROR: invalid redefinition of constant isintstring

Now let's restart the REPL and begin with the second case instead. The second case creates an anonymous function, then assigns it to a variable isintstring. The anonymous function gets a generated name that can't be a variable. You can reassign isintstring as long as you're not trying to declare it const, which includes method definitions.

julia> isintstring = str -> mapreduce(isnumeric, &, collect(str))
#5 (generic function with 1 method)

julia> typeof(isintstring)
var"#5#6"

julia> isintstring(str) = all(isdigit(c) for c in str)
ERROR: cannot define function isintstring; it already has a value

julia> isintstring = 1
1

It's far more readable to add methods to a named function: all you have to do is define another method using the const name, like isintstring(int, str) = blahblah().

It's actually possible to add methods to an anonymous function, but you have to do something like this: (::typeof(isintstring))(int, str) = blahblah(). The variable isintstring may not always exist, and the anonymous function can have other references such as func_array[3], in which case you'll have to write (::typeof(func_array[3]))(int, str) = blahblah(). I think you'll agree that a const name is far clearer.
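
A runnable sketch of both ways of adding a method, meant to be run at top level in a fresh session. The two-argument methods (which also check the length `n`) and the variable name `check` are hypothetical, chosen only to have something concrete in place of blahblah().

```julia
# Named function: a second method is just another definition under the same name.
isintstring(str) = all(isdigit, str)
isintstring(str, n) = length(str) == n && isintstring(str)

# Anonymous function bound to an ordinary (non-const) variable.
check = str -> all(isdigit, str)

# Adding a method means dispatching on the anonymous function's generated type.
(f::typeof(check))(str, n) = length(str) == n && f(str)

isintstring("123", 3)   # true
check("123", 3)         # true
check("123", 4)         # false
```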

Anonymous functions tend to be written as arguments in method calls like filter(x -> x%3==0, A) where the anonymous function only needs 1 method. In such a case, creating a const-named function would only bloat the function namespace and force a reader to jump around the code. In fact, do-blocks exist to allow people to write a multiple-line anonymous function as a first argument without bloating the method call.
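
For comparison, here is the same filter call written with an inline anonymous function and with a do-block; the range `A = 1:10` is just sample data.

```julia
A = 1:10

# Short predicate: an inline anonymous function reads fine.
inline = filter(x -> x % 3 == 0, A)

# Same call with a do-block: the block becomes the first argument of filter.
with_do = filter(A) do x
    x % 3 == 0
end

inline == with_do   # true
```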

BatWannaBe
1

Just like Gandhi said "My life is my message", Julia says "My code is my guide". Julia makes it very easy to inspect and explore standard and external library code, with @less, @edit, methods, etc. Guides for semantic style are rather hard to pin down (as opposed to those for syntactic style), and Python is the exception rather than the rule when it comes to the amount of documentation and emphasis surrounding this. However, reading through existing, widely used code is a good way to get a feel for what the common style is.
Also, the Julialang Discourse is a more useful resource than search engines seem to give it credit for.
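
As a sketch of the code-inspection tools mentioned above, applied to the call from the question (the variable names `m` and `ms` are only for illustration):

```julia
using InteractiveUtils   # loaded automatically in the REPL; needed in scripts

m = @which all(isdigit, "123")   # the method this particular call dispatches to
ms = methods(isdigit)            # the list of methods defined for isdigit

# Interactive only: these open a pager or your editor at the method's source.
# @less all(isdigit, "123")
# @edit all(isdigit, "123")
```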

Now, for the question in your title, "using functional idioms" is a broad and vague descriptor. For example, Julian style doesn't generally place much emphasis on avoiding mutation (except for performance reasons), and side effects aren't something rare and smelly. Higher-order functions are pretty common, though explicit map/reduce is only one part of an arsenal of tools that includes broadcasting, generators, comprehensions, functions that implicitly do the mapping for you (sum(f, A::AbstractArray; dims): "Sum the results of calling function f on each element of an array over the given dimensions"), etc.
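
A short sketch of a few of those tools side by side, computing squares and their sum over a small sample matrix `A`:

```julia
A = [1 2; 3 4]

squares_bcast = abs2.(A)                    # broadcasting: [1 4; 9 16]
squares_comp  = [abs2(x) for x in A]        # comprehension, same shape as A

total_map  = sum(abs2, A)                   # sum does the mapping itself: 30
total_gen  = sum(abs2(x) for x in A)        # generator, no temporary array: 30
total_mr   = mapreduce(abs2, +, A)          # explicit map/reduce: 30

col_totals = sum(abs2, A; dims=1)           # per-column sums: [10 20]
```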

There's also another factor to consider: performance trumps (almost) anything else. As the other answers have hinted, which style you go for can be a matter of optimizing for performance. Code that starts out reading as functional can have parts of it start mutating its inputs, parts of it become for loops, etc., as and when necessary for performance. So it's not uncommon to see a mixture of these different styles in the same package or even the same file.

Sundar R
0

They are the same. And if you just look at them, the first one is clearly cleaner and even shorter.

jling