1

Given a string of up to 256 characters, which IBM Mainframe Assembler instruction would you use to detect and point to the first occurrence of a specific one-byte delimiter character within that string?

Peter Cordes
dstaudacher
  • 1
    Have you attempted anything? – Margaret Bloom Mar 08 '17 at 13:11
  • I can't imagine how the question could possibly be any more precise. The specifications are quite clear: Which "IBM Mainframe Assembler instruction" will "detect and point to the first occurrence of a specific one-byte delimiter character within a string of 256 characters". I presume people here understand those terms. Should I include definitions for "Assembler instruction", "string" and "delimiter character"? – dstaudacher Mar 08 '17 at 15:39
  • 1
    *"The specifications are quite clear"* and then you use "one-byte delimiter" and "256 characters" "string"; what is the connection between those? How many bytes is one character? What is a string? ... (I mean, for example, I can show you examples of platforms/targets where a char is not one byte, and where a string starts with a length byte.) – Ped7g Mar 08 '17 at 17:16
  • @Ped7g - "...I can show you examples of platforms/targets, where char is not one byte..." The question is quite clear, pertaining *specifically* to IBM Mainframes, and further specifies a "specific one-byte delimiter character within that string". So, your "I can show ... where char is not one byte..." criticism is absolutely irrelevant. Next time you feel compelled to jump down someone's throat over a post, first make sure you have a case. Here, you do not. – dstaudacher Jun 18 '20 at 00:46
  • @dstaudacher IBM Mainframes can still process strings in an encoding where a particular delimiter byte can also appear in a different context (where it is preceded by prefix bytes). The UTF family of encodings is designed not to allow such a clash, but some other encodings could contain it. I'm probably way too harsh in tone, and for 99.9% of practical cases with IBM Mainframes you are right, so I guess I should reconsider it next time, but I often operate (WRT asm programming) with true/false logic, and even 0.01% of cases may then be a problem if not reasoned about. – Ped7g Jun 18 '20 at 20:08

2 Answers

2

If you are running on a z13 or later, the VECTOR FIND ELEMENT EQUAL (VFEE) instruction can also be very useful. It searches 16 bytes at a time for a specific character and returns the position of the character within those 16 bytes. You would need a loop to handle 256 characters, but the performance will be much better than TRT. On older machines, SRST (SEARCH STRING) would be the best instruction to use.
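
A minimal sketch of the SRST approach, not a definitive implementation. It assumes the usual R0-R15 register equates, a field labelled STRING, and a hypothetical NOTFOUND label; none of these come from the answer itself.

```hlasm
         LA    R0,C' '            R0 = delimiter; bits 32-55 must be zero
         LA    R2,STRING          R2 -> first byte to scan
         LA    R1,STRING+L'STRING R1 -> first byte past the string
SCAN     SRST  R1,R2              Search for the byte held in R0
         BC    1,SCAN             CC3: partial scan, resume from R2
         BC    2,NOTFOUND         CC2: end reached, delimiter absent
*        CC1: R1 -> first occurrence of the delimiter
```

SRST may stop after a CPU-determined number of bytes (condition code 3) with R2 advanced, so it has to be reissued until it reports either found or not found.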

zArchJon
0

The instruction is TRT, "Translate and Test". For example, to find the first space (X'40') in a string:

[...] 
  TRT STRING,HEXTBL 
[...] 
STRING DC C'WHERE IS THE FIRST SPACE?' 
HEXTBL EQU * 
  DC X'00000000000000000000000000000000' VALUES X'00' - X'0F'
  DC X'00000000000000000000000000000000' VALUES X'10' - X'1F'
  DC X'00000000000000000000000000000000' VALUES X'20' - X'2F'
  DC X'00000000000000000000000000000000' VALUES X'30' - X'3F'
  DC X'FF000000000000000000000000000000' VALUES X'40' - X'4F'
  DC X'00000000000000000000000000000000' VALUES X'50' - X'5F'
  DC X'00000000000000000000000000000000' VALUES X'60' - X'6F'
  DC X'00000000000000000000000000000000' VALUES X'70' - X'7F'
  DC X'00000000000000000000000000000000' VALUES X'80' - X'8F'
  DC X'00000000000000000000000000000000' VALUES X'90' - X'9F'
  DC X'00000000000000000000000000000000' VALUES X'A0' - X'AF'
  DC X'00000000000000000000000000000000' VALUES X'B0' - X'BF'
  DC X'00000000000000000000000000000000' VALUES X'C0' - X'CF'
  DC X'00000000000000000000000000000000' VALUES X'D0' - X'DF'
  DC X'00000000000000000000000000000000' VALUES X'E0' - X'EF'
  DC X'00000000000000000000000000000000' VALUES X'F0' - X'FF'
  [...]     
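
What follows the TRT might look like the sketch below. It assumes the usual R0-R15 register equates, 24- or 31-bit addressing, and a hypothetical NOSPACE label; STRING and HEXTBL are the fields defined above.

```hlasm
         LA    R1,STRING+L'STRING Default: R1 -> just past STRING
         SR    R2,R2              Clear R2; TRT sets only its low byte
         TRT   STRING,HEXTBL      Stop at first byte with nonzero entry
         BZ    NOSPACE            CC0: no space found, R1 still default
*        CC1 or CC2: R1 -> first space, low byte of R2 = X'FF'
         LA    R3,STRING          R3 -> start of STRING
         SR    R1,R3              R1 = zero-based offset of the space
```

Preloading R1 and clearing R2 matters because TRT leaves both registers untouched when nothing is found and replaces only part of each when something is.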
dstaudacher
  • In this example, "HEXTBL" is a 256-byte table of hex values, all binary zeros except for the X'FF' at offset +X'40'. That offset corresponds to the EBCDIC value for a space character (X'40'), thus making X'40' (space) the delimiter to be detected. After the TRT instruction executes, R1 will point to the first space character found (if any) and the corresponding table value (X'FF' in this case) will be loaded into the low-order byte of R2 (i.e. R2 = X'??????FF'). The two-bit condition code will also have a value of 01, indicating a delimiter was found *before* the last byte of 'STRING'. – dstaudacher Mar 08 '17 at 15:24
  • 1
    You can edit your answer by using the [edit](https://stackoverflow.com/posts/42674817/edit) button on the lower left corner. Another thing: I'm not an expert of IBM mainframes but shouldn't `TRT` be given the length of the field? Also isn't `C' '` a less obscure way to denote a space char? See [http://faculty.cs.niu.edu/~byrnes/csci360/notes/360trt.htm](http://faculty.cs.niu.edu/~byrnes/csci360/notes/360trt.htm) – Margaret Bloom Mar 08 '17 at 15:36
  • The IBM Assembler calculates the length of 'STRING' automatically, and if an explicit length isn't specified, the calculated length is assumed. The length can be overridden by an explicit specification (e.g. "TRT STRING(5),HEXTBL" would *not* find the space delimiter since it's in position 6). The length can also be specified at runtime, either by inserting it directly into the instruction (not recommended) or by way of an EXecute instruction targeting the TRT. – dstaudacher Mar 08 '17 at 15:53
  • 1
    Cool! I believe it is all very good information to add to the answer so that it will attract more upvotes :) – Margaret Bloom Mar 08 '17 at 15:57
  • In this example, I use X'40' rather than C' ' primarily because it makes the reason for the X'FF' in the table at offset X'40' more obvious. Having established that, I'll post a couple more examples using other delimiters. – dstaudacher Mar 08 '17 at 15:59
  • 2
    For somebody like me, coming from ordinary desktop CPUs (x86, Z80, ...), it would also be interesting if you explained why exactly `TRT` is good for this (isn't a simple loop testing each character against the delimiter value better?) ... and "translate and test" sounds like the thing can do much more than just search for a byte ... hm, I did read [this](http://faculty.cs.niu.edu/~byrnes/csci360/notes/360trt.htm) and actually it's just a messed-up instruction name based on `TR`, not translating anything. For some weird reason all the IBM mainframe assembler examples make me go "ick". – Ped7g Mar 08 '17 at 17:31
  • 4
    Why is it much quicker? Just because it's one instruction doesn't make it faster. The TRT instruction needs to read two areas of memory, the string and the translation table, while the simple loop Ped7g suggests would only have to read the string. On the face of it the simple loop should be faster because it needs to do less work. One document I found on the web confirms this saying "Avoid using TRT to search for a single value. Instead, code the equivalent CLI loop or use the SRST instruction." http://www.tachyonsoft.com/s8192db.pdf – Ross Ridge Mar 08 '17 at 19:17
  • I'll grant that TRT is poorly named. It doesn't actually "translate" anything. It's like TR only in that it uses a similar lookup table. A loop that increments a pointer to test each byte in turn, with a limit on the number of bytes checked, would also work, but TRT can do the same thing with just one instruction, as long as the delimiter is within the first 256 bytes. Checking 256 bytes one at a time could take more than 512 instructions including setup. I'll post another sample question and answer pair using the one-byte-at-a-time method so you can compare the two. – dstaudacher Mar 08 '17 at 22:59
  • 2
    But `TRT` is not "just one instruction" ... it's one instruction plus 256B table. If you would tell me it's "just one instruction and 256B table" in the 8bit era, I would slap you before the "table" word would finish (especially on 16kiB machines)... :D – Ped7g Mar 09 '17 at 05:01
  • 2
    To the point of TRT being faster than a compare loop, consider that properly done, TRT lets you write much more compact code. TRT followed by BZ is all that's needed for the "not found" case. And in the example above, X'FF' is used...more common would be a multiple of four so that you can have any number of characters detected, using a simple branch table and "B *+4(R2)" to jump to the correct spot depending on the value picked up from your TRT. In that one TRT instruction, you can pretty easily edit out invalid characters, and branch to routines to handle any input string value. – Valerie R May 22 '17 at 23:47
  • For completeness, many "old" /360 assembler instructions, such as TRT, now have "E" (extended) variants that allow larger fields to be processed. – TonyR May 17 '20 at 11:47
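
The run-time length case mentioned in the comments (an EXecute instruction targeting the TRT) could be sketched roughly as follows. The register assignments, the SCANTRT and NOSPACE labels, and the R0-R15 equates are assumptions made for illustration; HEXTBL is the table from the answer.

```hlasm
*        On entry: R5 -> string, R4 = its length in bytes (1 to 256)
         LA    R1,0(R4,R5)        Default: R1 -> just past the string
         SR    R2,R2              Clear R2; TRT sets only its low byte
         BCTR  R4,0               Instruction length code is length-1
         EX    R4,SCANTRT         Execute the TRT with length from R4
         BZ    NOSPACE            CC0: no delimiter in the string
*        CC1 or CC2: R1 -> first delimiter
*
*        Keep the EX target out of the instruction path, for example
*        among the constants:
SCANTRT  TRT   0(0,R5),HEXTBL     Length field is supplied by EX
```

EX ORs the low-order byte of R4 into the length field of the TRT for that one execution only, so the instruction in storage is never modified.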
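
For comparison, the one-byte-at-a-time CLI loop discussed in the comments might look like this. It is only a sketch: the FOUND and NOTFOUND labels and the R0-R15 equates are hypothetical, and STRING is the field from the answer.

```hlasm
         LA    R5,STRING          R5 -> current byte
         LA    R4,L'STRING        R4 = number of bytes left to test
LOOP     CLI   0(R5),C' '         Is this byte the delimiter?
         BE    FOUND              Yes: R5 -> first occurrence
         LA    R5,1(,R5)          No: step to the next byte
         BCT   R4,LOOP            Decrement count, loop while nonzero
         B     NOTFOUND           All bytes tested, no delimiter
```

The loop reads only the string and needs no 256-byte table, at the cost of four instructions for every byte that does not match.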