I have C# strings that look like this:

"かたむく;かたぶく[ok]"
"そば[側,傍];そく[側];はた"
"くすり"
"おととい[gikun];おとつい[gikun];いっさくじつ"

How can I trim these down so that the output contains only the text up to the first occurrence of a ";" character (not a normal semicolon) or a "["? If neither is present, the new string should be the same as the existing one.

"かたむく"
"そば"
"くすり"
"おととい"

Would this be best done with a Regex, or should I use `IndexOf`-style code?

Alan2
  • What have you tried, what's not working? I'm sure we'll help and provide a solution when you show us your attempt. – Trevor Nov 06 '19 at 17:56
  • Code like `return "かたむく"` does exactly what you ask. If that's not correct you need to explain how you want to “trim” these strings. – Dour High Arch Nov 06 '19 at 17:58
  • `IndexOf` can be used to find the first index of the `[` character, and `Substring` can return the string up to that point. – Rufus L Nov 06 '19 at 18:07
  • Use string methods. Get indexof("[") then use substring(startindex) to return character after the bracket. – jdweng Nov 06 '19 at 18:08
  • Your input and expected output don't quite match. `かたむく;` doesn't have a space before the `;`. Also, are you talking about the regular 'half width' space or a 'full width' space? If you want both, you'd have to alter the code accordingly. – Sach Nov 06 '19 at 18:12
  • Hi, thanks for all the comments. I now realize it's not a space followed by a semicolon. It's a ";" character – Alan2 Nov 06 '19 at 18:16
  • Yes, if you're not familiar with Japanese, there are two types of character sets. Full-width and half-width. You might want to read up a little bit about `Shift JIS`. https://stackoverflow.com/questions/10209766/ansi-vs-shift-jis-vs-utf-8-in-c-sharp https://stackoverflow.com/questions/57142760/check-if-a-string-is-half-width-or-full-width-in-c-sharp – Sach Nov 06 '19 at 18:20
  • @Alan2 You do realize that `;` is still a character like the space and the `[`, right? It requires no special treatment. So, what exactly isn't clear about the answers you received so far? – 41686d6564 stands w. Palestine Nov 06 '19 at 18:22
  • I really don't understand the answers, to be honest. The OP hasn't shown us anything, but we flock to throw opinionated answers around. We don't know what the OP is working with, so how can we correctly determine whether any of these actually fit what the OP is working with? Just my opinion and thoughts. – Trevor Nov 06 '19 at 18:43

3 Answers

You don't need a Regex, just string.IndexOfAny. Something like:

 var inputs = new[]
 {
     "かたむく;かたぶく[ok]",
     "そば[側,傍];そく[側];はた",
     "くすり",
     "おととい[gikun];おとつい[gikun];いっさくじつ"
 };

 var separators = new[] {' ', '['};

 foreach (var input in inputs)
 {
     var separatorPosition = input.IndexOfAny(separators);
     if (separatorPosition >= 0)
     {
         Debug.WriteLine($"Split: {input.Substring(0, separatorPosition)}");
     }
     else
     {
         Debug.WriteLine($"No Split: {input}");
     }

 }

I get the following output from your inputs:

Split: かたむく;かたぶく
Split: そば
No Split: くすり
Split: おととい

It doesn't quite match what you show, but I think it's correct for the separators used here (a space and `[`), and what you show isn't.
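
For the delimiters the question settled on (the ";" character rather than a space), the same `IndexOfAny` approach might look like the sketch below. The `TrimAtDelimiter` helper and the `DelimiterDemo` class are hypothetical names, and including the full-width semicolon (U+FF1B) alongside the ASCII one is an assumption based on the question's remark that the character is "not a normal semicolon".

```csharp
using System;

public static class DelimiterDemo
{
    // Assumption: treat both the ASCII and the full-width semicolon as delimiters.
    private static readonly char[] Separators = { ';', '\uFF1B', '[' };

    // Hypothetical helper: returns the text before the first delimiter,
    // or the input unchanged when no delimiter is present.
    public static string TrimAtDelimiter(string input)
    {
        if (input == null) return null;
        int position = input.IndexOfAny(Separators);
        return position >= 0 ? input.Substring(0, position) : input;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "かたむく;かたぶく[ok]",
            "そば[側,傍];そく[側];はた",
            "くすり",
            "おととい[gikun];おとつい[gikun];いっさくじつ"
        };

        foreach (var input in inputs)
        {
            Console.WriteLine(TrimAtDelimiter(input));
        }
    }
}
```

With the four sample inputs this should yield かたむく, そば, くすり and おととい, which is the output the question asks for.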

Flydog57

Expanding on my comment, "IndexOf can be used to find the first index of the [ character, and Substring can return the string up to that point."

public static string GetSubstringToChar(string input, char delimiter = '[')
{
    if (input == null) return input;
    // IndexOf returns -1 when the delimiter is absent, so one scan is enough
    var index = input.IndexOf(delimiter);
    return index < 0 ? input : input.Substring(0, index);
}

To make this work with multiple delimiters, we can pass in an array of delimiter characters and use IndexOfAny:

public static string GetSubstringToChar(string input, char[] delimiters)
{
    if (input == null) return input;
    // IndexOfAny returns -1 when none of the delimiters occur
    var index = input.IndexOfAny(delimiters);
    return index < 0 ? input : input.Substring(0, index);
}

You could then call it like this:

var strings = new List<string>
{
    "かたむく;かたぶく[ok]",
    "そば[側,傍];そく[側];はた",
    "くすり",
    "おととい[gikun];おとつい[gikun];いっさくじつ",
};

var delimiters = new[] { ';', '[' };

foreach (var str in strings)
{
    Console.WriteLine(GetSubstringToChar(str, delimiters));
}
}
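
Since the question also asks about Regex, here is a minimal sketch of a regular-expression equivalent for comparison. It is not part of the answer above: the `RegexDemo` class and `FirstSegment` method are hypothetical names, and the pattern assumes the same two delimiters, `;` and `[`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Matches the longest run of characters from the start of the string
    // that contains neither ';' nor '['. The match may be empty.
    private static readonly Regex Prefix = new Regex(@"^[^;\[]*");

    // Hypothetical helper: returns the text before the first delimiter,
    // or the whole input when no delimiter is present.
    public static string FirstSegment(string input)
    {
        if (input == null) return null;
        return Prefix.Match(input).Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstSegment("そば[側,傍];そく[側];はた"));
        Console.WriteLine(FirstSegment("くすり"));
    }
}
```

Both approaches return the same prefix. For a fixed set of single-character delimiters, `IndexOfAny` is the simpler of the two, since it avoids building and escaping a pattern.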
Rufus L

An extension method with a little validation will do the job.

public static string GetUntil(this string input, char[] delimiters)
{
   if (input == null || input.IndexOfAny(delimiters) == -1)
      return input;
   else
      return input.Split(delimiters)[0];
}

Then call it like this:

var test = "かたむく;かたぶく[ok]".GetUntil(new char[] { ';', '[' });
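
An extension method has to be declared in a non-nested, non-generic static class, so a complete version might look like the sketch below. The `StringExtensions` and `ExtensionDemo` class names are hypothetical, the delimiters `;` and `[` are taken from the question as it now reads, and passing a count of 2 to `Split` is a small variation that stops splitting after the first delimiter.

```csharp
using System;

public static class StringExtensions
{
    // Returns the text before the first delimiter, or the input unchanged
    // when no delimiter is present.
    public static string GetUntil(this string input, char[] delimiters)
    {
        if (input == null) return null;
        // A count of 2 splits at most once; element 0 is the text
        // before the first delimiter (or the whole string).
        return input.Split(delimiters, 2)[0];
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        var delimiters = new[] { ';', '[' };
        Console.WriteLine("かたむく;かたぶく[ok]".GetUntil(delimiters));
        Console.WriteLine("くすり".GetUntil(delimiters));
    }
}
```

Because `Split` returns the whole string as its only element when no delimiter is found, the separate `IndexOfAny` check is not strictly needed in this variant.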
Kevin