
With Mathematica I always feel that strings are "second class citizens." Compared to a language such as Perl, one must juggle a lot of code to accomplish the same task.

The available functionality is not bad, but the syntax is uncomfortable. While there are a few shorthand forms such as <> for StringJoin and ~~ for StringExpression, most of the string functionality lacks such syntax, and uses clumsy names like: StringReplace, StringDrop, StringReverse, Characters, CharacterRange, FromCharacterCode, and RegularExpression.

In Mathematica strings are handled like mathematical objects, allowing 5 "a" + "b" where "a" and "b" act as symbols. This is a feature that I would not change, even if that would not break stacks of code. Nevertheless it precludes certain terse string syntax, wherein the expression 5 "a" + "b" would be rendered "aaaaab" for example.
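To make the trade-off concrete, here is a minimal sketch using only built-in functions. It shows that the arithmetic expression stays symbolic, so the repeated-string result has to be requested explicitly; the variable name `expr` is just for illustration.

```mathematica
(* Strings take part in arithmetic as inert atoms, so this stays symbolic *)
expr = 5 "a" + "b";
Head[expr]   (* Plus *)

(* The "terse" reading has to be spelled out with string functions *)
StringJoin[ConstantArray["a", 5], "b"]   (* "aaaaab" *)
```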


What is the best way to make string manipulation more convenient in Mathematica?

Ideas that come to mind, either alone or in combination, are:

  1. Overload existing functions to work on strings, e.g. Take, Replace, Reverse.

    • This was the original topic of my question to which Sasha replied. It was seen as inadvisable.

  2. Use shortened names for string functions, e.g. StringReplace >> StrRpl, Characters >> Chrs, RegularExpression >> "RegEx"

  3. Create new infix syntax for string functions, and possibly new string operations.

  4. Create a new container for strings, e.g. str["string"], and then definitions for various functions. (This was suggested by Leonid Shifrin.)

  5. A variant of (4), expand strings (automatically?) to characters, e.g. "string" >> str["s","t","r","i","n","g"] so that the characters can be seen by Part, Take, etc.

  6. Call another language such as Perl from within Mathematica to handle string processing.

  7. Create new string functions that conglomerate frequently used sequences of operations.
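As a rough illustration of ideas (4) and (5), the sketch below attaches definitions to a wrapper head through UpValues, so the built-in list functions themselves are left untouched. The heads `str` and `chars` are hypothetical names chosen for this example, not built-in symbols, and only a few functions are covered.

```mathematica
(* Idea 4: a hypothetical wrapper head "str" holding one string *)
str /: Take[str[s_String], spec__] := str[StringTake[s, spec]];
str /: Drop[str[s_String], spec__] := str[StringDrop[s, spec]];
str /: Reverse[str[s_String]] := str[StringReverse[s]];
str /: Length[str[s_String]] := StringLength[s];

Take[str["This is dangerous"], -9]   (* str["dangerous"] *)
Reverse[str["abc"]]                  (* str["cba"] *)

(* Idea 5: expand to characters under a hypothetical head "chars";
   Take and Part then work without any extra definitions *)
expanded = chars @@ Characters["string"];   (* chars["s","t","r","i","n","g"] *)
Take[expanded, 3]                           (* chars["s","t","r"] *)
StringJoin @@ Take[expanded, 3]             (* "str" *)
```

The sequence specifications accepted by Take and StringTake are similar but not identical, so a fuller version would need to check which forms it forwards.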

Mr.Wizard
  • 6
    There are a few shorter names for string manipulation in [dreeves](http://stackoverflow.com/users/4234/dreeves)'s contribution to the [Mma tool bag question](http://stackoverflow.com/questions/4198961/what-is-in-your-mathematica-tool-bag/4213366#4213366). I'm not sure if overloading is a good idea... – Simon Apr 16 '11 at 23:10
  • 2
    I think Simon is right. Having a list of lists of strings and applying Map[... ,Infinity] to those functions will bring up a debugging nightmare – Dr. belisarius Apr 17 '11 at 02:35
  • @Simon @belisarius off-topic, is there an apparent reason why I was down-voted for this: http://stackoverflow.com/questions/5683228 ? – Mr.Wizard Apr 17 '11 at 16:24
  • @Mr. fixed that :). It's a common behavior of X language bigots when they see an out-of-their-yogurt-flask answer – Dr. belisarius Apr 17 '11 at 16:41
  • 1
    I also think that overloading is not a good idea. You may not only break your own code (which you can debug), but also break some top-level built-in (or third-party, if you use other add-ons) code, maybe even without knowing it. If I really had to overload list functions, I'd construct a type (container) with a custom head (say `string`, so the string "abc" would be converted into `string["abc"]`), and overload the list functions of interest only on that type via UpValues (like `string/:Take[s_string,k_]:= Map[StringTake[#,k]&,s]`). This is much safer. – Leonid Shifrin Apr 18 '11 at 16:39

1 Answer


I think the reason these operations have String* names is that they have tiny differences compared to their list counterparts. Specifically compare Cases to StringCases.
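A small sketch of that difference, using only built-in functions: Cases tests the elements of an expression against a pattern, while StringCases searches inside a single string for matching substrings.

```mathematica
(* Cases tests each element of an expression against a pattern *)
Cases[{1, "a", 2, "b"}, _String]          (* {"a", "b"} *)

(* StringCases scans inside one string for matching substrings *)
StringCases["a1b22", DigitCharacter ..]   (* {"1", "22"} *)

(* A string is atomic, so Cases finds no elements inside it *)
Cases["a1b22", _String]                   (* {} *)
```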

Now, the way to achieve what you want is to do it like this:

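Begin["StringOverload`"];
(* These names resolve to the System` built-ins through $ContextPath. *)
{Drop, Cases, Take, Reverse};
Unprotect[String];
(* Map each list function to its string counterpart. *)
ToStringHead[Drop] = StringDrop;
ToStringHead[Take] = StringTake;
ToStringHead[Cases] = StringCases;
ToStringHead[Reverse] = StringReverse;
(* UpValue on String: redirect a list function whose first argument is a string.
   rest___ (zero or more arguments) lets the one-argument Reverse[s] match too. *)
String /: 
 HoldPattern[(h : Drop | Cases | Take | Reverse)[s_String, rest___]] :=
  With[{head = ToStringHead[h]}, head[s, rest]]
(* Delete every UpValue of String that mentions an overloaded function. *)
RemoveOverloading[] := 
 UpValues[String] = 
  DeleteCases[UpValues[String], 
   x_ /; ! FreeQ[Unevaluated[x], (Drop | Cases | Take | Reverse)]]
End[];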

You can load this with Get or Needs, and remove the overloading by calling RemoveOverloading[] with the correct context, i.e. StringOverload`RemoveOverloading[].

In[21]:= Cases["this is a sentence", RegularExpression["\\s\\w\\w\\s"]]

Out[21]= {" is "}

In[22]:= Take["This is dangerous", -9]

Out[22]= "dangerous"

In[23]:= Drop["This is dangerous", -9]

Out[23]= "This is "

I do not think doing this is the right way to go, though. You might instead consider introducing shorter symbols in some context which would automatically evaluate to the String* symbols.
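A minimal sketch of that alternative, assuming a context named StrShort` and the short names StrRpl, StrDrp, and Chrs (all hypothetical, chosen for illustration). Each short symbol simply evaluates to the corresponding built-in, so no built-in function is modified.

```mathematica
(* Hypothetical short names; none of these symbols are built in *)
Begin["StrShort`"];
StrRpl = StringReplace;
StrDrp = StringDrop;
Chrs = Characters;
End[];

(* The head evaluates to the built-in before the call is carried out *)
StrShort`StrRpl["banana", "an" -> "AN"]   (* "bANANa" *)
StrShort`Chrs["abc"]                      (* {"a", "b", "c"} *)
```

To use the short names without the context prefix, the context would have to be added to $ContextPath, for example by defining them in a package with BeginPackage instead of Begin.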

Sasha