How to repeat individual characters in strings in Julia

Question

This question shows how to repeat individual characters in strings in Python.

>>> s = '123abc'
>>> n = 3
>>> ''.join([c*n for c in s])
'111222333aaabbbccc'

How would you do that in Julia?

EDIT

As a newcomer to Julia I am amazed at what the language has to offer.

For example, I would have thought that the Python code above is about as simple as code could get in any language. However, as shown by my answer below, the equivalent Julia code, `join([c^n for c in s])`, is arguably simpler, and may be about as simple as any language can get.

On the other hand, @niczky12 has shown that by splatting the generator into the `string` function with the `...` (ellipsis, or splat) operator, the speed can be substantially increased over what the somewhat simpler `join` function achieves.

In one case Julia shines for simplicity. In the other case, Julia shines for speed.

To a Python programmer the first case should be almost immediately readable once they notice that `c^n` in Julia is just `c*n` in Python. When they see the speed increase from the `...` ellipsis operator, the extra complexity might not deter them from learning Julia. Readers may be starting to suspect that I hope many Python programmers will take Julia seriously. They would not be wrong.
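The `c^n` idiom follows from Julia using `*` for string concatenation, so `^` (repeated `*`) means repetition. A few illustrative one-liners (the variable names are arbitrary, chosen only for this sketch):

```julia
# In Julia, * concatenates strings ...
concatenated = "12" * "ab"      # "12ab"

# ... so ^, i.e. repeated *, repeats a string ...
repeated = "ab"^3               # "ababab"

# ... and it also accepts a single Char, returning a String.
from_char = 'a'^3               # "aaa"

# repeat(c, n) is the named-function equivalent of c^n.
from_repeat = repeat('a', 3)    # "aaa"
```

This is why `c^n` inside the comprehension or generator yields one `String` per character, which `join` then glues together.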

Thanks to @rickhg12hs for suggesting benchmarking. I have learned a lot.

@phg Yes, I must have had Julia on my mind! I also corrected the `c^n` mistake (Julia syntax) to `c*n` (Python syntax). Thanks much. — Julia Learner, Sep 30 '18 at 03:09

Julia Learner · Answer 1 · 2018-09-27T04:50:42.047

You can do it with either a Julia comprehension or a generator.

julia> VERSION
v"1.0.0"

julia> s = "123abc"
"123abc"

# n is number of times to repeat each character.
julia> n = 3
3

# Using a Julia comprehension with [...]
julia> join([c^n for c in s])
"111222333aaabbbccc"

# Using a Julia generator without the [...]
julia> join(c^n for c in s)
"111222333aaabbbccc"

For small strings there should be little practical difference in speed.
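To make the difference between the two forms concrete, here is a small sketch (the names `pieces` and `gen` are illustrative only). The comprehension materializes a `Vector{String}` before `join` sees it, while the generator hands `join` one piece at a time, which can save the intermediate array allocation:

```julia
s = "123abc"
n = 3  # number of times to repeat each character

# The comprehension builds a complete Vector{String} before join sees it.
pieces = [c^n for c in s]

# The generator produces each piece lazily, only when join asks for it.
gen = (c^n for c in s)

result_comprehension = join(pieces)   # "111222333aaabbbccc"
result_generator     = join(gen)      # "111222333aaabbbccc"
```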

Edit

TL;DR: In general, the generator is somewhat faster than the comprehension; the exception is case 3, where the comprehension's median time was marginally lower. The memory estimates were very similar.

@rickhg12hs has suggested it would be nice to have benchmarks.

Using the great BenchmarkTools package, the results are below.

n = the number of times to repeat each character

s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" in each case

In each case, the comprehension median time, C, is listed first, vs the generator median time, G, second. The times were rounded as seemed appropriate and the original figures are below the numbered summaries. Smaller, of course, is better.

The memory estimates were not very different.

1. n = 26, C=3.8 vs. G=2.8 μs, G faster

julia> using BenchmarkTools

julia> n = 26;

julia> @benchmark join([c^n for c in s])
BenchmarkTools.Trial:
  memory estimate:  3.55 KiB
  allocs estimate:  39
  --------------
  minimum time:     3.688 μs (0.00% GC)
  median time:      3.849 μs (0.00% GC)
  mean time:        4.956 μs (16.27% GC)
  maximum time:     5.211 ms (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     8

julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
  memory estimate:  3.19 KiB
  allocs estimate:  36
  --------------
  minimum time:     2.661 μs (0.00% GC)
  median time:      2.756 μs (0.00% GC)
  mean time:        3.622 μs (19.94% GC)
  maximum time:     4.638 ms (99.89% GC)
  --------------
  samples:          10000
  evals/sample:     9

2. n = 260, C=10.7 vs. G=8.1 μs, G faster

julia> n = 260;

julia> @benchmark join([c^n for c in s])
BenchmarkTools.Trial:
  memory estimate:  19.23 KiB
  allocs estimate:  39
  --------------
  minimum time:     8.125 μs (0.00% GC)
  median time:      10.691 μs (0.00% GC)
  mean time:        18.559 μs (35.36% GC)
  maximum time:     43.930 ms (99.92% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
  memory estimate:  18.88 KiB
  allocs estimate:  36
  --------------
  minimum time:     7.270 μs (0.00% GC)
  median time:      8.126 μs (0.00% GC)
  mean time:        10.872 μs (18.04% GC)
  maximum time:     10.592 ms (99.87% GC)
  --------------
  samples:          10000
  evals/sample:     4

3. n = 2,600, C=62.3 vs. G=63.7 μs, C faster

julia> n = 2600; 

julia> @benchmark join([c^n for c in s])
BenchmarkTools.Trial:
  memory estimate:  150.16 KiB
  allocs estimate:  39
  --------------
  minimum time:     51.746 μs (0.00% GC)
  median time:      63.293 μs (0.00% GC)
  mean time:        77.315 μs (2.79% GC)
  maximum time:     3.721 ms (96.85% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
  memory estimate:  149.80 KiB
  allocs estimate:  36
  --------------
  minimum time:     47.897 μs (0.00% GC)
  median time:      63.720 μs (0.00% GC)
  mean time:        88.716 μs (17.58% GC)
  maximum time:     42.457 ms (99.83% GC)
  --------------
  samples:          10000
  evals/sample:     1

4. n = 26,000, C=667 vs. G=516 μs, G faster

julia> n = 26000; 

julia> @benchmark join([c^n for c in s])
BenchmarkTools.Trial:
  memory estimate:  1.44 MiB
  allocs estimate:  39
  --------------
  minimum time:     457.589 μs (0.00% GC)
  median time:      666.710 μs (0.00% GC)
  mean time:        729.592 μs (10.91% GC)
  maximum time:     42.673 ms (98.76% GC)
  --------------
  samples:          6659
  evals/sample:     1

julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
  memory estimate:  1.44 MiB
  allocs estimate:  36
  --------------
  minimum time:     475.977 μs (0.00% GC)
  median time:      516.176 μs (0.00% GC)
  mean time:        659.001 μs (10.36% GC)
  maximum time:     42.268 ms (98.41% GC)
  --------------
  samples:          7548
  evals/sample:     1

It would be nice to have time/memory performance comparisons between different methods. Maybe a user has only small strings, but has lots and lots of small strings. — rickhg12hs, Sep 27 '18 at 03:54
The `@benchmark` expressions could use some `$`'s before the variables, yes? — rickhg12hs, Sep 27 '18 at 04:52
@rickhg12hs Interesting thought. I had not thought of that, and interpolating the variables with `$` would probably make a difference. — Julia Learner, Sep 27 '18 at 04:55
I found `string((c^n for c in s)...)` to be a good 4 times faster than the `join` solution on my machine. — niczky12, Sep 27 '18 at 11:03
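Following the suggestion to interpolate with `$`, one Base-only alternative (not what the answers used) is to wrap each variant in a function so that `s` and `n` arrive as local arguments instead of untyped globals, which serves roughly the same purpose. The function names and `time_per_call` are hypothetical, and the crude `@elapsed` loop is only a rough stand-in for BenchmarkTools; its numbers are not comparable with the benchmark output above.

```julia
# Three variants from this thread, wrapped as functions so `s` and `n`
# are local arguments rather than untyped globals.
join_comprehension(s, n) = join([c^n for c in s])
join_generator(s, n)     = join(c^n for c in s)
string_splat(s, n)       = string((c^n for c in s)...)

# Crude per-call timing using only Base; BenchmarkTools is more reliable.
function time_per_call(f, s, n; reps = 1_000)
    f(s, n)                      # call once so compilation is not timed
    t = @elapsed for _ in 1:reps
        f(s, n)
    end
    return t / reps              # seconds per call
end

s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
n = 26
for f in (join_comprehension, join_generator, string_splat)
    println(rpad(string(f), 20), round(time_per_call(f, s, n) * 1e6, digits = 2), " μs")
end
```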

score 2 · Accepted Answer · answered Sep 27 '18 at 11:20

In addition to the answers above, I found that the `string` function, with the generator splatted into it, runs even faster. Here are my benchmarks:

julia> n = 2;

julia> s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

julia> string((c^n for c in s)...) # proof that it works
"AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ"

julia> n = 26000;

julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
  memory estimate:  1.44 MiB
  allocs estimate:  36
  --------------
  minimum time:     390.616 μs (0.00% GC)
  median time:      425.861 μs (0.00% GC)
  mean time:        484.638 μs (6.54% GC)
  maximum time:     45.006 ms (98.99% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark string((c^n for c in s)...)
BenchmarkTools.Trial:
  memory estimate:  1.29 MiB
  allocs estimate:  31
  --------------
  minimum time:     77.480 μs (0.00% GC)
  median time:      101.667 μs (0.00% GC)
  mean time:        126.455 μs (0.00% GC)
  maximum time:     832.524 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

As you can see, it's about 4 times faster than the `join` solution proposed by @Julia Learner (median 101.7 μs vs. 425.9 μs). I tested the above on 0.7 and got no deprecation warnings, so I'm assuming it works fine on 1.0 too. TIO (Try It Online) agrees.
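The `...` is what makes `string` concatenate rather than print a collection. The difference is easiest to see on a tiny vector; `pieces` here is just illustrative data:

```julia
pieces = ["111", "222", "333"]

# Passing the vector itself gives string() a single argument, so the result
# is the vector's printed form, brackets, quotes and commas included.
one_arg = string(pieces)

# Splatting with ... passes each element as its own argument, i.e.
# string("111", "222", "333"), which concatenates them.
splatted = string(pieces...)                        # "111222333"

# The answer splats a generator in the same way.
from_generator = string((c^3 for c in "123")...)    # "111222333"
```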

niczky12


How does `@benchmark foldl((x,y)->x*string(y)^$n,"",$s)` run on your machine? – rickhg12hs Sep 27 '18 at 13:23
It runs in 2.242 ms (minimum time); the median is 4.1 ms. I also get a deprecation warning. – niczky12 Sep 27 '18 at 13:47
Ahh, I'm still on Julia v0.6.4 (no deprecation warnings) and it seems to be the fastest of the methods presented here, so far. When I install v0.7/v1.0 I'll play some more. – rickhg12hs Sep 27 '18 at 13:57
@niczky12 Love it! Your solution is cool. As a newbie to Julia I did not even think of using string with the `...` ellipsis operator. Actually, I think I tried string first but the result using `string([c^n for c in s])` is a mess. I am not used to the ellipsis operator. How do you conceptually view it? Why does it work so much better than `string([c^n for c in s])` for output? – Julia Learner Sep 27 '18 at 15:14
If you look at the definition of `string`, I'm using the last method there, `string(xs...)`, so it expects all inputs as separate arguments. Julia calls the ellipsis operator the splat operator. Here it splits the generated strings into separate arguments. Check out the docs: https://docs.julialang.org/en/v1/manual/faq/#The-two-uses-of-the-...-operator:-slurping-and-splatting-1. – niczky12 Sep 28 '18 at 08:25
In general, it is better to avoid splatting (using the ellipsis) when the number of elements (here, characters) varies at runtime, as the function will have to be recompiled for each particular number of elements. So it's fast, but only when you don't take compilation time into account. That's especially true for very short operations like this, not so much for long-running functions. – Milan Bouchet-Valat Sep 28 '18 at 17:34
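Two splat-free sketches, neither taken from the answers. `repeat_each_char` (a hypothetical name) writes every character `n` times into an `IOBuffer`, so there is no splatting and no intermediate vector of strings. `repeat_each_char_foldl` (also hypothetical) is the `foldl` idea from the comments rewritten with the `init` keyword, which Julia 0.7/1.0 expects in place of a positional initial value; that change is presumably the source of the deprecation warning mentioned above.

```julia
# Hypothetical helper: append each character n times to an in-memory buffer.
function repeat_each_char(s::AbstractString, n::Integer)
    io = IOBuffer()
    for c in s
        for _ in 1:n
            write(io, c)         # appends the UTF-8 encoding of c
        end
    end
    return String(take!(io))
end

# Hypothetical helper: the foldl variant with the `init` keyword.
# Each `*` copies the whole accumulated string, so it slows down as s grows.
repeat_each_char_foldl(s::AbstractString, n::Integer) =
    foldl((acc, c) -> acc * c^n, s; init = "")

repeat_each_char("123abc", 3)         # "111222333aaabbbccc"
repeat_each_char_foldl("123abc", 3)   # "111222333aaabbbccc"
```

The buffer version compiles once per argument type rather than once per argument count, which is the concern raised about splatting.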

woclass · Answer 3 · 2018-09-27T03:30:03.890

Code tested in Version 1.0.0 (2018-08-08).

When I tried to write `map(x -> x^3, "123abc")`, I got an error:

julia> map(x -> x^3, "123abc")
ERROR: ArgumentError: map(f, s::AbstractString) requires f to return AbstractChar; try map(f, collect(s)) or a comprehension instead
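The error message states the rule: `map` over a string must return characters, because the result is assembled back into a string. A short sketch of both sides of that rule (variable names are illustrative):

```julia
s = "123abc"

# map over a String works when the function returns a Char;
# the result is then a String again.
upper = map(uppercase, s)   # "123ABC"

# Returning a String (such as c^3) instead raises the ArgumentError above.
err = try
    map(c -> c^3, s)
    nothing
catch e
    e
end
```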

So here is another way to do it: collect the string into an array of characters first.

julia> map(x -> x^3, collect("123abc"))
6-element Array{String,1}:
 "111"
 "222"
 "333"
 "aaa"
 "bbb"
 "ccc"

julia> join(map(x -> x^3, collect("123abc")))
"111222333aaabbbccc"

Alternatively, `repeat` with the `inner` keyword may be more convenient:

julia> repeat(collect("123abc"), inner=3)
18-element Array{Char,1}:
 '1'
 '1'
 '1'
 '2'
 '2'
 '2'
 '3'
 '3'
 '3'
 'a'
 'a'
 'a'
 'b'
 'b'
 'b'
 'c'
 'c'
 'c'

julia> join(repeat(collect("123abc"), inner=3))
"111222333aaabbbccc"

You could also use `join(c^3 for c in "123abc")`, which produces `"111222333aaabbbccc"` when `n = 3` is hardcoded. — Julia Learner, Sep 27 '18 at 03:36
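The `repeat(collect(s), inner=3)` approach in this answer generalizes to any `n`. A minimal sketch wrapping it in a function (`repeat_inner` and `repeat_outer` are hypothetical names), with `outer` shown for contrast, since it repeats the whole sequence rather than each element:

```julia
# Hypothetical helper: repeat each character n times via `inner`,
# then join the Vector{Char} back into a String.
repeat_inner(s::AbstractString, n::Integer) = join(repeat(collect(s), inner = n))

# For contrast, `outer` repeats the whole sequence instead of each element.
repeat_outer(s::AbstractString, n::Integer) = join(repeat(collect(s), outer = n))

repeat_inner("123abc", 3)   # "111222333aaabbbccc"
repeat_outer("ab", 2)       # "abab"
```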
