-2

I have a Java method in which I am trying to parse a string whose fields are delimited by the character ^A. A sample string is shown below.

HDR^A1^A20220106^ATYPE^AXXX^AJAPAN^AUNIFORM^AHELP^AEXAMPLE^A

I have tried to use Apache's ordinalIndexOf, but have not been successful with it so far.

Is there any other alternative approach available for this scenario?

public class HelloWorld {

    public static int ordinalIndexOf(String str, String substr, int n) {
        int pos = str.indexOf(substr);
        while (--n > 0 && pos != -1)
            pos = str.indexOf(substr, pos + 1);
        return pos;
    }

    public static void main(String[] args) {
        // Just kept it here as I need to use standards.. not using in below code
        String CTRL_A = Character.valueOf((char) 0x01).toString();
        String str = "HDR^ABYE^A20220103065014^Agoogle.com_29958^ABUDDY^A1.0^A123456789012^AHAI^ABYE";

        int position = ordinalIndexOf(str, "^A", 6);
        System.out.println(str.substring(0, position));
    }
}

Expected Output String:

EVENT_HDR^ABYE^A20220103065014^Agoogle.com_29958^ABUDDY^A1.0^A123456789012

Referred Link

E_net4
Programmer

2 Answers

0

You could try to use the .indexOf() method. So, in this case try:

String str = "HDR^A1^A20220106^ATYPE^AXXX^AJAPAN^AUNIFORM^AHELP^AEXAMPLE^A";
System.out.println(str.indexOf("^A"));

This will print the index of the first appearance of the given substring. Alternatively, you could do:

String str = "HDR^A1^A20220106^ATYPE^AXXX^AJAPAN^AUNIFORM^AHELP^AEXAMPLE^A";
System.out.println(str.indexOf("^A", 10)); // Start at index 10.

Please refer to the documentation here for more: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String)
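To get from a single indexOf call to the nth occurrence, the two-argument form can be called in a loop, each time starting just past the previous hit. The following is a minimal sketch of that idea, not a tested library routine; the class name NthDelimiter and the helpers nthIndexOf and prefixBeforeNth are hypothetical names chosen for illustration, and returning an empty string when the delimiter is missing is an assumption about the desired behaviour.

```java
final class NthDelimiter {

    // Index of the n-th (1-based) occurrence of delimiter in text,
    // or -1 if text contains fewer than n occurrences.
    static int nthIndexOf(String text, String delimiter, int n) {
        if (n < 1 || delimiter.isEmpty()) {
            return -1;
        }
        int from = 0;
        int pos = -1;
        for (int i = 0; i < n; i++) {
            pos = text.indexOf(delimiter, from);
            if (pos == -1) {
                return -1;
            }
            // Continue the search just past the delimiter that was found.
            from = pos + delimiter.length();
        }
        return pos;
    }

    // Everything before the n-th delimiter, or "" if it is not present
    // (assumed fallback; throw or return null if that suits your code better).
    static String prefixBeforeNth(String text, String delimiter, int n) {
        int pos = nthIndexOf(text, delimiter, n);
        return pos == -1 ? "" : text.substring(0, pos);
    }

    public static void main(String[] args) {
        String str = "HDR^ABYE^A20220103065014^Agoogle.com_29958^ABUDDY^A1.0^A123456789012^AHAI^ABYE";
        // Cutting at the 7th delimiter keeps the first seven fields.
        System.out.println(prefixBeforeNth(str, "^A", 7));
    }
}
```

Note that keeping seven fields means cutting at the 7th delimiter, not the 6th, which is an easy off-by-one to make when counting occurrences.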

  • Thanks for the prompt answer. I have updated the code and the expected output string. I am trying to find the index of the 6th occurrence; the plan is then to perform str.substring(0, value_from_nth_occurrence) – Programmer Jan 05 '22 at 06:54
  • Yes, I see that now. Something I can think of is looping through the input string and separating the substrings using the index. While doing this, update the index so you can parse the next substring. –  Jan 05 '22 at 06:58
0

This extracts the first n+1 terms, i.e. everything before the (n+1)th ^A, or a blank string if the input contains fewer than n+1 ^A delimiters:

int n = 6; // for example
str = str.replaceAll("(.*?(\\^A.*?){" + n + "})\\^A.*|.*", "$1");

See live demo.

If the "^A" in your input really means character 1, i.e. (char) 1, then use the octal escape \001 to write it in a String literal:

str = str.replaceAll("(.*?(\001.*?){" + n + "})\001.*|.*", "$1");
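If the delimiter already lives in a constant such as CTRL_A, the same regex can be assembled from that constant instead of hardcoding it. The sketch below wraps the delimiter in Pattern.quote so that delimiters containing regex metacharacters (such as the ^ in a literal "^A") are matched literally. The class name CtrlARegex and the method firstTerms are hypothetical names for illustration only.

```java
import java.util.regex.Pattern;

final class CtrlARegex {

    // Same value as Character.valueOf((char) 0x01).toString() in the question.
    static final String CTRL_A = String.valueOf((char) 0x01);

    // Applies the regex from this answer with an arbitrary delimiter.
    // Assumes n >= 0; returns "" when the pattern's main branch does not match.
    static String firstTerms(String input, String delimiter, int n) {
        String d = Pattern.quote(delimiter); // treat the delimiter as literal text
        String regex = "(.*?(" + d + ".*?){" + n + "})" + d + ".*|.*";
        return input.replaceAll(regex, "$1");
    }

    public static void main(String[] args) {
        String input = String.join(CTRL_A,
                "HDR", "BYE", "20220103065014", "google.com_29958",
                "BUDDY", "1.0", "123456789012", "HAI", "BYE");
        String result = firstTerms(input, CTRL_A, 6);
        // Show the control character as "^A" so the output is readable.
        System.out.println(result.replace(CTRL_A, "^A"));
    }
}
```

The same helper works for the two-character text "^A" by passing "^A" as the delimiter, since Pattern.quote takes care of the escaping.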

The result is blank when there are insufficient delimiters thanks to the alternation |.* after the main regex. Alternations are attempted left to right: if the main regex matches, group 1 contains your target; if it doesn't, the alternative .* matches instead and group 1 is empty. In both cases the entire String is matched and the replacement is group 1, so the result is either the first n+1 terms or blank. Note that the main regex requires a ^A after the captured terms, so the input needs at least n+1 delimiters to produce a non-blank result.
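To see the effect of the alternation, and to check up front whether the input has enough delimiters, the main branch can also be run on its own through a Matcher. This is a rough sketch for a literal "^A" delimiter; AlternationDemo, withAlternation and withoutAlternation are hypothetical names, and using Optional to signal "not enough delimiters" is an assumption about what the caller wants.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationDemo {

    // The pattern from this answer: falls back to "" via the |.* branch.
    static String withAlternation(String input, int n) {
        return input.replaceAll("(.*?(\\^A.*?){" + n + "})\\^A.*|.*", "$1");
    }

    // Main branch only: an empty Optional means the input did not contain
    // enough delimiters, so the caller can tell that apart from a blank field.
    static Optional<String> withoutAlternation(String input, int n) {
        Matcher m = Pattern.compile("(.*?(\\^A.*?){" + n + "})\\^A.*").matcher(input);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(withAlternation("a^Ab^Ac^Ad", 2));    // prints a^Ab^Ac
        System.out.println(withAlternation("a^Ab^Ac", 2));       // prints an empty line
        System.out.println(withoutAlternation("a^Ab^Ac", 2).isPresent()); // prints false
    }
}
```

With the Matcher variant, a caller can test isPresent() before using the value instead of comparing the result against an empty string.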

Bohemian
  • Thank you. Let me try it with a few examples and confirm. – Programmer Jan 05 '22 at 07:11
  • I just tried the above, but it returns the whole initial string. See the code I attempted here: https://www.online-java.com/oZfkYLcA6b – Programmer Jan 05 '22 at 07:44
  • @Keyan fixed. See demo. – Bohemian Jan 05 '22 at 08:50
  • Thank you. It looks like it yields correct results. Two questions. First, is there a way to use the char representation instead of a hardcoded ^A, like String CTRL_A = Character.valueOf((char) 0x01).toString();? Second, I have already added a null check and an empty-string check; is anything else recommended for this pattern matching? Also, a breakdown of the pattern would help me learn. Thanks again. – Programmer Jan 05 '22 at 14:53
  • @Keyan I don’t know what you mean about ctrl A, but can code as a Unicode character: `String CTRL_A = "\u0001";` or an octal character: `String CTRL_A = "\001";`. Does your input String actually contain these characters and not `"^A"`? – Bohemian Jan 05 '22 at 21:08
  • When I open the file that contains this string in the vi editor or Notepad, it is displayed as ^A, but wherever the existing code handles this string it uses the Unicode char, so I thought I would keep the same. The existing code has "String CTRL_A = Character.valueOf((char) 0x01).toString(); " and that is used throughout the code, not ^A – Programmer Jan 06 '22 at 05:01
  • @Keyan see edits to answer for dealing with ctrl-A characters simply – Bohemian Jan 06 '22 at 08:14
  • Thank you. It works! If I replace the hardcoded 6 with int POS = 6 and use POS, will that work? Also, is there a way to check whether the string has all delimiters up to the 6th before parsing, so that replaceAll does not fail or yield a null value? – Programmer Jan 13 '22 at 04:46
  • @Keyan answer modified to work with a variable number of terms *plus* produce the blank string `""` if the input has fewer than the specified number of terms. – Bohemian Jan 13 '22 at 18:51