
I need to extract a specific string from a text file whose lines contain multiple delimiters, which may be the same or different. For example, let's say I have a text file that contains the lines below. Let's consider each piece of text between two delimiters as a segment.

ABC#12#3#LINE1####1234678985$
DEF#XY#Z:1234:1234561230$
ABC#12#3#LINE TWO####1234678985$
DEF#XY#Z:1234:4564561230$
ABC#12#3#3RD LINE####1234678985$
DEF#XY#Z*1234:7894561230$

I need to write code that extracts the text after ABC#12#3# from every such line in the text file, based on two inputs:

1) The segment to find (e.g., ABC)

2) Position of the segment from which I need to extract the text. (e.g., 4)

So, an input of ABC and the 4th segment should give LINE1, and an input of DEF and the 5th segment should give 1234561230 (for the first DEF line). This is what I've got so far for the 1st input:

scanner = new Scanner(file);
while (scanner.hasNextLine()) {
    line = scanner.nextLine();
    if (line.contains(find)) {   // find is the 1st input - (e.g., ABC)
        System.out.println("Line to be replaced - " + line);
        int ind1 = line.indexOf(findlastchar + "*") + 1;
        int ind2 = line.indexOf("*");
        System.out.println("Ind1 is " + ind1 + " and Ind2 is " + ind2);
        System.out.println("findlastchar is " + findlastchar + "#");
        remove = line.substring(line.indexOf(findlastchar) + 1, line.indexOf("#"));
        System.out.println("String to be replaced " + remove);
        content = content.replaceAll(remove, replace);
    }
}

I have two problems with my code. First, I don't know how to use substring to separate text between identical delimiters. Second, I'm not sure how to make the code treat all of the special characters {#, $, :} as delimiters, so that any text between ANY of them counts as a segment.

The answer to this question uses a regex, which I want to avoid.

justcurious
  • You have many special characters there, note that `replaceAll` accepts a *regex*. – Maroun Oct 06 '15 at 12:55
  • Possible duplicate of [Java: use split() with multiple delimiters](http://stackoverflow.com/questions/5993779/java-use-split-with-multiple-delimiters) – hotzst Oct 06 '15 at 13:14

3 Answers


Simply split the line and use index:

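// delimiter is a regex, e.g. "[#$:]" to split on any of #, $ or :; index is zero-based
public String getItemFromLine(String s, String delimiter, String prefix, int index) {
   String[] items = s.split(delimiter);
   // compare contents with equals(); == only compares object references
   return prefix.equals(items[0]) && index < items.length ? items[index] : null;
}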

PS I have no experience with Java so please treat this example as pseudo-code.
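A self-contained sketch of the same split-and-index idea follows. The helper name `splitSegment` is hypothetical, and a few details are assumptions rather than part of the answer: the delimiters are given as the character class `[#$:*]` (the `*` is included only because the last sample line uses it), the position is 1-based as in the question, and `null` is returned when the prefix does not match or the line has too few segments.

```java
public class SegmentSplit {

    // Hypothetical helper: returns the segment at the 1-based position, or null
    // if the first segment is not the prefix or the line is too short.
    // Assumed delimiters: # $ : and *, written as a regex character class.
    static String splitSegment(String line, String prefix, int position) {
        // Inside [...] the characters $ and * are literal, so no escaping is needed.
        String[] items = line.split("[#$:*]");
        if (!items[0].equals(prefix) || position < 1 || position > items.length) {
            return null;
        }
        return items[position - 1];
    }

    public static void main(String[] args) {
        System.out.println(splitSegment("ABC#12#3#LINE1####1234678985$", "ABC", 4)); // LINE1
        System.out.println(splitSegment("DEF#XY#Z:1234:1234561230$", "DEF", 5));     // 1234561230
    }
}
```

Note that `split` keeps the empty strings between consecutive delimiters, so `####` produces three empty segments and the 5th ABC segment is an empty string rather than 1234678985.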

Alexander Trakhimenok

Either use a StringTokenizer, to which you can pass the delimiters as a String, and then loop over the tokens (see this example), or, even better, use String.split with a regex:

String[] words = line.split("#|\\$|:"); // $ must be escaped; unescaped it only matches the end of input
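Since the question asks to avoid regex, here is a rough sketch of the StringTokenizer route. The class and method names are hypothetical, the position is assumed to be 1-based, and `*` is added to the delimiter string only because the last sample line uses it.

```java
import java.util.StringTokenizer;

public class TokenizerSegments {

    // Hypothetical helper: every character in "#$:*" acts as a separate delimiter.
    // StringTokenizer skips empty tokens, so "####" counts as a single separator.
    static String segmentAt(String line, String prefix, int position) {
        StringTokenizer tokens = new StringTokenizer(line, "#$:*");
        int current = 0;
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            current++;
            // The first segment must match the requested prefix (e.g. ABC).
            if (current == 1 && !token.equals(prefix)) {
                return null;
            }
            if (current == position) {
                return token;
            }
        }
        return null; // the line has fewer segments than requested
    }

    public static void main(String[] args) {
        System.out.println(segmentAt("ABC#12#3#LINE1####1234678985$", "ABC", 4)); // LINE1
        System.out.println(segmentAt("DEF#XY#Z*1234:7894561230$", "DEF", 5));     // 7894561230
    }
}
```

Because empty tokens are skipped, the 5th ABC segment here is 1234678985, whereas with `split` it would be an empty string; which behaviour is right depends on how you want runs of delimiters to count.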
hotzst

It's probably not the most efficient way, but you can do it with a regex, for example:

(ABC[#:*$]+(?:\w+[#:*$]+){2}|DEF[#:*$]+(?:\w+[#:*$]+){3})([^#:*$]+)(.+)


The quantifiers {2} and {3} (the number of repetitions of the given pattern) decide which part of the string is replaced. In this case, only the fragment between delimiters changes. Example in Java:

public class Test{
    public static void main(String[] args) {
        String[] lines = {"ABC#12#3#LINE1####1234678985$",
                "DEF#XY#Z:1234:1234561230$",
                "ABC#12#3#LINE TWO####1234678985$",
                "DEF#XY#Z:1234:4564561230$",
                "ABC#12#3#3RD LINE####1234678985$",
                "DEF#XY#Z*1234:7894561230$"};
        for (String line : lines) {
            String result = line.replaceAll("(ABC[#:*$]+(?:\\w+[#:*$]+){2}|DEF[#:*$]+(?:\\w+[#:*$]+){3})([^#:*$]+)(.+)","$1" + " replacement " + "$3");
            System.out.println(result);
        }
    }
}

with output:

ABC#12#3# replacement ####1234678985$
DEF#XY#Z:1234: replacement $
ABC#12#3# replacement ####1234678985$
DEF#XY#Z:1234: replacement $
ABC#12#3# replacement ####1234678985$
DEF#XY#Z*1234: replacement $
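For comparison, the same replacement can be sketched without any regex by scanning characters and using `substring`, which is closer to what the question's own code attempts. This is only an illustrative sketch: `isDelimiter` and `replaceSegment` are hypothetical names, the delimiter set `# $ : *` and the 1-based position are assumptions, and empty segments between consecutive delimiters are counted (as `String.split` does).

```java
public class SegmentReplace {

    // Hypothetical helper: the assumed delimiter set.
    static boolean isDelimiter(char c) {
        return c == '#' || c == '$' || c == ':' || c == '*';
    }

    // Replaces the 1-based segment at `position` if the first segment equals
    // `prefix`; otherwise returns the line unchanged.
    static String replaceSegment(String line, String prefix, int position, String replacement) {
        int segment = 1;
        int start = 0; // index where the current segment begins
        for (int i = 0; i <= line.length(); i++) {
            // A segment ends at a delimiter or at the end of the line.
            if (i < line.length() && !isDelimiter(line.charAt(i))) {
                continue;
            }
            if (i == line.length() && start == i) {
                break; // line ended with a delimiter: no trailing segment
            }
            if (segment == 1 && !line.substring(start, i).equals(prefix)) {
                return line;
            }
            if (segment == position) {
                return line.substring(0, start) + replacement + line.substring(i);
            }
            segment++;
            start = i + 1;
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(replaceSegment("ABC#12#3#LINE1####1234678985$", "ABC", 4, "replacement"));
        System.out.println(replaceSegment("DEF#XY#Z:1234:1234561230$", "DEF", 5, "replacement"));
    }
}
```

The first call prints `ABC#12#3#replacement####1234678985$` and the second `DEF#XY#Z:1234:replacement$`; a line whose first segment does not match the prefix comes back unchanged.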
m.cekiera