0

I have a simple task of expanding the string FX according to the following rules:

X -> X+YF+
Y-> -FX-Y

In OpenCL, string manipulation is not supported but the use of an array of characters is. How would a kernel program that expands this string in parallel look like in openCL?

More details: Consider the expansion of 'FX' in the python code below.

axiom = "FX"
def expand(s):
    switch = {
        "X": "X+YF+",
        "Y": "-FX-Y",
    }
    return switch.get(s, s)


def expand_once(string):
    return [expand(c) for c in string]


def expand_n(s, n):
    for i in range(n):
        s = ''.join(expand_once(s))
    return s


expanded = expand_n(axiom, 200)

The result expanded will be a result of expanding the axiom 'FX' 200 times. This is a rather slow process thus the need to do it on openCL for parallelization. This process results in an array of strings which I will then use to draw a dragon curve.

below is an example of how I would come up with such a dragon curve: This part is not of much importance. The expansion on OpenCL is the crucial part.

 import turtles
    from PIL import Image
turtles.setposition(5000, 5000)
turtles.left(90)  # Go up to start.


for c in expanded:
    if c == "F":
        turtles.forward(10)
    elif c == "-":
        turtles.left(90)
    elif c == "+":
        turtles.right(90)

# Write out the image.
im = Image.fromarray(turtles.canvas)
im.save("dragon_curve.jpg")
Gakuo
  • 845
  • 6
  • 26
  • The question is not 100% clear - you might want to give some (slightly more advanced) examples. One important aspect you will need to decide is how the length of the string will change as a result of your transformation, and how predictable this is. Each work-item will presumably operate on some substring, so the biggest question you will need to answer is, where in the result does each work-item put its result? The actual mechanics of string manipulation are simple - depending on the string encoding, you'll need to work on individual bytes, or 16-bit words, etc. – pmdj Jun 05 '18 at 12:24
  • I am idealess. At this point, I want something to work with. Your comment, for example, gives me a hint. Anything that works for string expansion on OpenCL is welcome. – Gakuo Jun 05 '18 at 14:10
  • Explain your problem better and we can give you some ideas. At the moment, I have no idea of a possible input and output. Also good to know: how long are your strings? – pmdj Jun 05 '18 at 14:11
  • please see my edit above – Gakuo Jun 05 '18 at 14:31
  • Is 200 times all? This is probably too small a task to be worth it on a gpu. If this is slow in Python, you can probably make it near-instant in C. – pmdj Jun 06 '18 at 06:06
  • Not just 200 times, could be 200,000. I already have a c program now. See it at https://stackoverflow.com/questions/50709618/optimization-of-opencl-programs – Gakuo Jun 06 '18 at 09:13

1 Answers1

1

Recursive algorithms like this don't especially lend themselves to GPU acceleration, especially as the data set changes its size on each iteration.

If you do really need to do this iteratively, the challenge is for each work-item to know where in the output string to place its result. One way to do this would be to assign work groups a specific substring of the input, and on every iteration, keep count of the total number of Xs and Ys in each workgroup-sized substring of the output. From this you can calculate how much that substring will expand in one iteration, and if you accumulate those values, you'll know the offset of the output of each substring expansion. Whether this is efficient is another question. :-)

However, your algorithm is actually fairly predictable: you can calculate precisely how large the final string will be given the initial string and number of iterations. The best way to generate this string with OpenCL would be to come up with a non-recursive function which analytically calculates the character at position N given M iterations, and then call that function once per work-item, with the (known!) final length of the string as the work size. I don't know if it's possible to come up with such a function, but it seems like it might be, and if it is possible, this is probably the most efficient way to do it on a GPU.

It seems like this might be possible: as far as I can tell, the result will be highly periodic:

FX
FX+YF+
FX+YF++-FX-YF+
FX+YF++-FX-YF++-FX+YF+--FX-YF+
FX+YF++-FX-YF++-FX+YF+--FX-YF++-FX+YF++-FX-YF+--FX+YF+--FX-YF+
^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^
  A*      B       A       B       A       B       A       B

As far as I can see, those A blocks are all identical, and so are the Bs. (apart from the first A which is effectively at position -1) You can therefore determine the characters at 14 positions out of every 16 completely deterministically. I strongly suspect it's possible to work out the pattern of +s and -s that connects them too. If you figure that out, the solution becomes pretty easy.

Note though that when you have that function, you probably don't even need to put the result in a giant string: you can just feed your drawing algorithm with that function directly.

pmdj
  • 22,018
  • 3
  • 52
  • 103
  • Hey, I came up with an algorithm for this: There's a pattern A+B and A-B. In between the two is either +- or --. Each Odd position takes A+B and even positions take A-B. The +- and -- take a pattern determined by powers of 2. That is just the general idea. My worry is that the in computation of a position's value my algorithm requires the value of a previously calculated value. Communicating previously computed value according to my understanding of OpenCL is costly. Is there a way around this? – Gakuo Jul 23 '18 at 02:04
  • That's why I'm saying don't make this an iterative algorithm; figure out the +/- pattern analytically so you can generate it directly from the position. You probably need to do a few more expansions by hand before the pattern becomes obvious. – pmdj Jul 23 '18 at 08:11
  • I just figured out how to do it. Was staring at it all along. This is quite clever. Thanks – Gakuo Jul 23 '18 at 12:30
  • I got stuck with OpenCL string processing and manipulation so I posted a further question. Please see if you can give me some insights at https://stackoverflow.com/questions/51506971/string-operations-on-opencl – Gakuo Jul 24 '18 at 20:30