10

I've been scouring the internet trying to figure this out. I tried `isnumeric`, but that only works for an `AbstractChar`. I'd prefer not to use `tryparse` if possible, but if that's the only solution, so be it. If it is, why hasn't a function for checking whether a string is numeric been implemented yet?

Masterfoxify
  • 5
    Why not `tryparse`? – DNF Jun 12 '19 at 09:19
  • `tryparse` is correct julia. There is no possible implementation of this that doesn't involve attempting to parse the string as a number. – Michael K. Borregaard Jun 12 '19 at 10:41
  • 4
    What do you mean by "numeric"? Is `1.0e2` numeric? Is `6.2f22` numeric? What about `0x02`? Or `0b100101`? Or `0xdead_beef.cap23`? – mbauman Jun 12 '19 at 14:47
  • @DNF Due to the fact it feels like a bit of a hack to me personally. – Masterfoxify Jun 13 '19 at 02:01
  • @MattB. By this I just mean integers, but in theory looking for a way to check any numerical value – Masterfoxify Jun 13 '19 at 02:01
  • @Masterfoxify Well, you could try a regex: `isintstring(str) = !isnothing(match(r"^\d*$", str))`, or something like that. But it's slower. My guess is that you find `tryparse` hacky because you hadn't really realized how this is done, conceptually. (**Edit:** Oops, totally missed the actual replies below.) – DNF Jun 13 '19 at 07:05

4 Answers

10

The fastest solution I've found is using `tryparse`, as recommended.

function check_str2(a)
    return tryparse(Float64, a) !== nothing
end

It averages about 20 ns per call, compared to about 40 ns for the regex.

The main reason there is no way to check whether a string is a valid int without converting it is that there aren't many compelling use cases for doing so in places where performance matters. In most places, you want to know whether something can be parsed as a number so that you can use it as a number, and in the rare case where you don't, the extra couple of nanoseconds probably don't matter.
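
To make the "what counts as numeric" question explicit, the same idea can be parameterized by the target type. This is only a sketch: `isparsable` is a hypothetical helper name, not something in `Base`, and it assumes `Base` defines `tryparse` for the type you pass in.

```julia
# Hypothetical helper (not part of Base): true if `s` parses completely as a `T`.
# Assumes Base provides a `tryparse(T, s)` method for the requested type.
isparsable(::Type{T}, s::AbstractString) where {T} = tryparse(T, s) !== nothing

isparsable(Int, "42")           # true
isparsable(Int, "4.2")          # false: not an integer literal
isparsable(Int, "0x02")         # true: Base recognizes the 0x prefix for integers
isparsable(Float64, "1.0e2")    # true: scientific notation
isparsable(Float64, "60.0xxx")  # false: trailing garbage is rejected
```

Choosing `Int` or `Float64` at the call site is what decides whether strings like `1.0e2` or `0x02` are accepted.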

kmsquire
Oscar Smith
  • I used `@benchmark` from benchmarktools. I'm on julia 1.1.1 – Oscar Smith Jun 12 '19 at 22:34
  • 1
    However, your solution DOES NOT WORK: `check_str2("60.0xxx")` returns `true`, and `60.0xxx` is not numeric. The problem with `Parsers.tryparse` is probably that it tries to extract a number from the `String` and is satisfied when one appears only at the beginning. – Przemyslaw Szufel Jun 12 '19 at 22:54
  • ^ As far as my tests could tell, the above is false – Masterfoxify Jun 16 '19 at 02:55
  • On any recent Julia (i.e., after version 1.0), `check_str2("60.0xxx")` returns `false`. (I'm surprised that it ever returned `true`.) – kmsquire Jan 25 '20 at 22:31
  • Actually `Parsers.tryparse` (from the `Parsers.jl` package) does have the behavior that @PrzemyslawSzufel indicated. However, `tryparse` in Julia `Base` does not have the same behavior (i.e., it works), and is actually faster than `check_str` below. I edited this solution to use the `Base` version. – kmsquire Jan 25 '20 at 22:37
  • `Base.tryparse` still benchmarks 2x slower than my answer. Hence the regular expression is the way to go here. – Przemyslaw Szufel Jan 26 '20 at 02:58
5

You normally use a regular expression to check if a string is a number:

julia> re = r"^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$";

julia> occursin(re,"123.")
true

julia> occursin(re,"123.0")
true

julia> occursin(re,"123.012")
true

julia> occursin(re,"123")
true 

julia> occursin(re,"ab")
false

julia> occursin(re,"ab123.1")
false

julia> occursin(re,"123.1e")
false

Note: I used the regular expression from "Regular expression for floating point numbers". If you only want to match integers, or also want to accept an exponent, ready-made regular expressions for those cases are easy to find.
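
As an illustration of those two variants, here is a sketch. The names `int_re` and `float_exp_re` are made up for this example, and the patterns are one plausible way to write them rather than a canonical form.

```julia
# Integers only: an optional sign followed by one or more ASCII digits.
int_re = r"^[+-]?[0-9]+$"

# The float pattern above, extended with an optional exponent such as e10 or E-3.
float_exp_re = r"^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)([eE][+-]?[0-9]+)?$"

occursin(int_re, "-42")           # true
occursin(int_re, "42.0")          # false
occursin(float_exp_re, "1.5e-3")  # true
occursin(float_exp_re, "123.1e")  # false: an exponent marker needs digits after it
```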

EDIT: Benchmark test.

Let us consider the following function to check whether a String is a number:

function check_str(a)
    try
        parse(Float64,a)
        true
    catch
        false
    end
end

Here are the benchmark tests. Note that for a string that fails to parse, the regular expression is roughly 200x faster (the gain would be smaller if we also matched the exponent part) and does not allocate, whereas the `try`/`catch` version pays for throwing and catching an exception.

julia> using BenchmarkTools

julia> @btime check_str("60.0a")
  15.359 μs (18 allocations: 816 bytes)
false

julia> @btime occursin($re,"60.0a")
  67.023 ns (0 allocations: 0 bytes)
false

When the String is successfully parsed the speed gap is much smaller:

julia> @btime check_str("60.0")
  298.833 ns (0 allocations: 0 bytes)
true

julia> @btime occursin($re,"60.0")
  58.865 ns (0 allocations: 0 bytes)
true
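
Since the timings quoted in this thread disagree, it may be worth measuring on your own machine and Julia version. The following is a rough sketch that uses only `Base` (`@elapsed` in a loop); `FLOAT_RE`, `via_tryparse`, `via_regex`, and `avg_time` are names invented for this example. `@btime` from BenchmarkTools, as used above, gives more reliable numbers.

```julia
# Same pattern as `re` above, stored in a constant so the functions below can use it.
const FLOAT_RE = r"^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$"

via_tryparse(s) = tryparse(Float64, s) !== nothing
via_regex(s) = occursin(FLOAT_RE, s)

# Crude timing helper (illustrative only): average seconds per call of f(s).
function avg_time(f, s; n = 1_000_000)
    f(s)  # warm-up call so compilation is not included in the timing
    t = @elapsed for _ in 1:n
        f(s)
    end
    return t / n
end

for s in ("60.0", "60.0a")
    println(s, ": tryparse ", avg_time(via_tryparse, s), " s, regex ", avg_time(via_regex, s), " s")
end
```

The absolute numbers will vary by machine; the point is to compare the two approaches on both a parsable and an unparsable input.
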
Przemyslaw Szufel
  • Is there any advantage of this over `tryparse`? – Oscar Smith Jun 12 '19 at 21:29
  • It is roughly 200x faster, code is shorter and does not allocate. I will update my answer with an example. – Przemyslaw Szufel Jun 12 '19 at 21:35
  • If you use tryparse instead of catching an exception from parse, it is only 20ns on average (compared to 30 for regex) – Oscar Smith Jun 12 '19 at 22:05
  • Amazing work @PrzemyslawSzufel! Though, I will keep in mind Oscar Smith's comment about time. Currently my issue does not require super hardcore optimizations, so either way it should be fine. Though this answer is absolutely amazing and has a lot of hard work put into it, Oscar Smith's answer is more functional since it also accounts for hexadecimals and scientific notation, as pointed out by Matt B. – Masterfoxify Jun 13 '19 at 02:08
  • Oscar's answer does not work (see my comment below it): it can return a false positive for `String`s such as `"60xxx"` or `"60.0xxx"`. You should either use `Base.parse` (slower than regex) or regex. – Przemyslaw Szufel Jun 13 '19 at 07:43
  • @PrzemyslawSzufel I tested what you said in my own Julia console and did not get the same result. In fact, it worked perfectly – Masterfoxify Jun 14 '19 at 01:28
2

This works for me:

isa(tryparse(Float64,"StringNumber"), Number)    # true | false
Paul Roub
2

Since OP says in the comments that they only need to check integers, you can still use `isnumeric` (or, probably better, `isdigit`):

isintstring(str) = all(isdigit, str)

This seems to benchmark faster than the other answers here. On my machine, this benchmarks around 2× faster than the tryparse solution.
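
Two edge cases are worth knowing about with this one-liner: it accepts the empty string and rejects signed integers. The sketch below shows both, along with a stricter variant; `isintstring_strict` is a hypothetical name, and whether a leading sign should be allowed is an assumption about what you need.

```julia
isintstring(str) = all(isdigit, str)

isintstring("123")  # true
isintstring("")     # true: `all` over an empty string is vacuously true
isintstring("-5")   # false: the sign is not a digit

# Hypothetical stricter variant: allows one leading sign and requires at least one digit.
function isintstring_strict(str::AbstractString)
    # The sign is a single ASCII byte, so index 2 is a valid start for the remainder.
    s = (startswith(str, "-") || startswith(str, "+")) ? str[2:end] : str
    return !isempty(s) && all(isdigit, s)
end

isintstring_strict("-5")  # true
isintstring_strict("")    # false
isintstring_strict("-")   # false
```

`isdigit` only accepts the ASCII characters `0` to `9`, whereas `isnumeric` also accepts other Unicode numeric characters, which is one reason to prefer `isdigit` here.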

Jake Ireland