I'm new to MIPS, and I'm trying to figure out how to manipulate individual characters in a string without lb/sb and an offset. I already know how to do this by loading the address of the string and looping through it while incrementing an offset, but what if I just had a single register full of characters? Let's say I have a register that holds a few characters. How could I access each character and make it uppercase? I know I have to subtract 32 from a lowercase character to make it uppercase, but I'm having trouble traversing the characters. If I shift, wouldn't I end up losing characters? Like this:

```mips
add $t0, $t0, 1
subi $t0, $t0, 32
add $t0, $t0, 1
```

and so on. What's the right way to go through each character?

  • If you **know beforehand** that every byte of each word is either an English letter (upper- or lower-case ASCII) or a null byte, you can AND the word with `0xDFDFDFDF` to make every lower-case letter upper case. (The mask does not fit `andi`'s 16-bit immediate, so it has to be in a register unless your assembler expands `andi` for you.) Just look at an ASCII table to see the pattern. – gusbro Jun 10 '21 at 05:22

1 Answer

There's no way to access memory without using load operations. If you want to use word-sized loads (`lw`) on a string, you'll be limited by the requirement that these instructions use aligned addresses (on MIPS, that is; some other processors, such as x86, handle unaligned accesses with only a minimal performance penalty).

Dealing with the alignment requirement is not so hard if we can rely on every string starting on a 4-byte boundary and having a length that is a multiple of 4 bytes. Dropping either assumption, the length restriction or the starting alignment, adds complexity. A general-purpose solution would have to handle both, which means distinguishing a number of cases in order to use word-sized operations.
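
As a rough illustration of the easy case, here is a minimal sketch that assumes the string is word-aligned, its length is a multiple of 4, and every byte is an ASCII letter (so a plain mask is enough, as suggested in the comments). The label `str`, the hard-coded word count, and the MARS/SPIM-style exit syscall are assumptions made for this example, and the code assumes a simulator without exposed branch delay slots:

```mips
        .data
        .align 2                   # assumption: string starts on a word boundary
str:    .ascii "helloWORLDab"      # 12 bytes: a multiple of 4, letters only

        .text
main:
        la    $t0, str             # aligned address of the first word
        li    $t1, 3               # number of words to process (12 / 4)
        li    $t2, 0xDFDFDFDF      # mask that clears bit 0x20 in every byte
loop:
        lw    $t3, 0($t0)          # load 4 characters at once
        and   $t3, $t3, $t2        # force all 4 letters to upper case
        sw    $t3, 0($t0)          # store them back
        addiu $t0, $t0, 4          # advance to the next word
        addiu $t1, $t1, -1         # one word fewer to go
        bnez  $t1, loop

        li    $v0, 10              # exit (MARS/SPIM syscall convention)
        syscall
```

After the loop, the 12 bytes at `str` read `HELLOWORLDAB`. Note that the mask would also corrupt digits, spaces, and punctuation, which is why the letters-only assumption matters.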


If you did have 4 characters in a single register and wanted to adjust (i.e. uppercase) each of its 4 bytes, you'd pretty much have to look at them individually. There's really no way to instantly compute the value to add that will uppercase each byte.

To be clear, for any given 4-byte value holding 4 characters, there is exactly one 32-bit adjustment value that could be added to uppercase all 4 bytes at once, but there are 16 possible such values (each byte either needs the adjustment or it doesn't), and no easy way to figure out which of the 16 is the right one for a given 4-byte value. So you'd have to extract each byte and consider it individually, which would be at best about as efficient as using lb/sb directly.
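
Looking at the bytes individually does not have to destroy the register: shift a copy, and leave the original intact. The following is a minimal sketch of that idea, assuming the four characters are already packed in `$t0` and that `$t1` to `$t3` are free scratch registers (both choices are arbitrary, made for illustration). It also assumes a simulator without exposed branch delay slots:

```mips
        # $t0 holds 4 packed characters; it is updated in place.
        li    $t1, 0               # bit offset of the current byte: 0, 8, 16, 24
next:
        srlv  $t2, $t0, $t1        # copy of $t0 with the current byte moved to the bottom
        andi  $t2, $t2, 0xFF       # isolate that byte; $t0 itself is untouched
        addiu $t2, $t2, -97        # c - 'a'
        sltiu $t2, $t2, 26         # 1 if 'a' <= c <= 'z', else 0 (unsigned compare)
        sll   $t2, $t2, 5          # 0x20 for a lowercase letter, else 0
        sllv  $t2, $t2, $t1        # move that bit back to the byte's position
        xor   $t0, $t0, $t2        # clear 0x20 in the byte if it was lowercase
        addiu $t1, $t1, 8          # step to the next byte
        slti  $t3, $t1, 32         # still inside the 32-bit word?
        bnez  $t3, next
```

For example, starting with `$t0 = 0x61625A31` (the bytes `'a'`, `'b'`, `'Z'`, `'1'`), the loop leaves `$t0 = 0x41425A31`: the two lowercase letters are converted and the other bytes are unchanged.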

Erik Eidt
  • If you know all your characters are alphabetic, it's actually very easy to force them all to upper case: AND with `~0x20202020` to clear the lower-case bit (`0x20`) in each byte. [What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?](https://stackoverflow.com/a/54585515) Of course you don't want to *add* or *subtract*, but you're talking about it as if there isn't anything else you could do. – Peter Cordes Jun 10 '21 at 11:40
  • Conditionally doing this only for alphabetic characters is harder, and maybe not worth it with only 4-byte registers. [SWAR (SIMD Within A Register)](https://en.wikipedia.org/wiki/SWAR) add/subtract while blocking carry between elements takes more masking, but it might be doable for 7-bit ASCII, including a packed `c -= 'a'` and then a `c -= 25` (max alphabet index) to set or clear that spare top bit. Maybe even worth it on MIPS64, where you can do 8 bytes at a time. Obviously it's way easier with proper SIMD, like SSE2 `pcmpgtb` or saturating subtract + compare; presumably MIPS SIMD has similar stuff. – Peter Cordes Jun 10 '21 at 11:44
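
For what it's worth, the carry-blocking idea from the last comment can be sketched for a 32-bit register roughly as follows. This is one possible SWAR formulation (it uses two packed additions on the low 7 bits of each byte rather than the subtractions mentioned in the comment), so treat it as an illustration and not as the commenter's exact method. The register choices are arbitrary, and `addu` is used because `add` could trap on signed overflow here:

```mips
        # $t0 holds 4 packed characters; lowercase ASCII letters become uppercase.
        li    $t1, 0x7F7F7F7F
        and   $t2, $t0, $t1        # low 7 bits of each byte, top bit left free for carries
        li    $t3, 0x05050505      # 0x7F - 'z'
        addu  $t3, $t2, $t3        # bit 7 of a byte is set where c > 'z'
        li    $t4, 0x1F1F1F1F      # 0x80 - 'a'
        addu  $t4, $t2, $t4        # bit 7 of a byte is set where c >= 'a'
        xor   $t3, $t3, $t4        # bit 7 set where 'a' <= c <= 'z'
        nor   $t4, $t0, $zero      # ~x: bit 7 set only for bytes below 0x80
        and   $t3, $t3, $t4        # ignore bytes whose own top bit was set (non-ASCII)
        li    $t4, 0x80808080
        and   $t3, $t3, $t4        # keep just bit 7 of each byte
        srl   $t3, $t3, 2          # turn 0x80 into 0x20 in each selected byte
        xor   $t0, $t0, $t3        # clear 0x20 in the lowercase bytes
```

Neither addition can carry out of its byte, because the largest per-byte sum is `0x7F + 0x1F = 0x9E`. As a check, `$t0 = 0x617A7B60` (the bytes `'a'`, `'z'`, `'{'`, `` '`' ``) becomes `0x415A7B60`: only the two letters change.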