1

I have a text file with two lines of comma-separated words:

CCCCC,WIKY PODAR,130000,15
DDDDD,XXXXX555,130110,30

The program reads the file, splits it into words, and stores them in an array.

Checking myStringArray.length returns: 7

However, I expect the output to be: 8

The issue is that the last word of the first line and the first word of the second line are concatenated. How can I separate them and store them in the array properly?

String fileName = "mac/text.txt";
byte[] buffer = new byte[1000];
FileInputStream inputStream = new FileInputStream(fileName);
while (inputStream.read(buffer) != -1) {
    String testString2 = new String(buffer);
    String delim2 = ",";
    String[] token2 = testString2.split(delim2);
    String[] myStringArray = new String[token2.length];
    for (int i = 0; i < token2.length; i++) {
        myStringArray[i] = token2[i];
        token2[i] = token2[i].replaceAll("\\s+", ", ");
    }
    System.out.println(myStringArray.length);
}
gqli

3 Answers

1

I think you should split your String on two delimiters, like this:

// Two delimiters: comma and space
String delim2 = ",|\\ ";
String testString2 = "CCCCC,WIKY PODAR,130000,15\n" +
                     "DDDDD,XXXXX555,130110,30";
String[] token2 = testString2.split(delim2);
System.out.println(token2.length);

This should print 8 instead of 7, because we split on two delimiters: the comma and the space.
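To see exactly which tokens that split produces, a minimal sketch can print them one by one. The class and method names (`TwoDelimiterDemo`, `splitOnCommaOrSpace`) are made up for illustration, and the text is hard-coded rather than read from the file.

```java
class TwoDelimiterDemo {
    // Splits on a comma or a space, using the same regex as the snippet above.
    static String[] splitOnCommaOrSpace(String text) {
        return text.split(",|\\ ");
    }

    public static void main(String[] args) {
        String text = "CCCCC,WIKY PODAR,130000,15\n"
                    + "DDDDD,XXXXX555,130110,30";
        String[] tokens = splitOnCommaOrSpace(text);
        System.out.println(tokens.length);
        // Print each token in brackets, with any newline made visible as \n
        for (String token : tokens) {
            System.out.println("[" + token.replace("\n", "\\n") + "]");
        }
    }
}
```

With this input the count is 8 because `WIKY PODAR` is split at the space, while `15` and `DDDDD` stay together in one token joined by the newline. If `WIKY PODAR` is meant to be a single field, the line break needs to be treated as a delimiter instead of the space.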

EDIT

Here is another way to read from your file and split its content:

// Requires java.io.IOException, java.nio.file.Files and java.nio.file.Paths
public static void main(String[] args) throws IOException {
    // Path of your file
    String fileName = "mac/text.txt";
    // Read the whole file into a String
    String input = new String(Files.readAllBytes(Paths.get(fileName)));
    System.out.println(input);

    // Split on a comma or a space
    String delim = ",|\\ ";
    String[] token = input.split(delim);
    System.out.println(token.length);
}

Hope this helps.

Youcef LAIDANI
  • Hi, thank you for answering, but the length is still 7. I read the words from a text file. – gqli Jan 28 '17 at 08:48
  • Can you print what you have already read? If you set this String directly, you will get 8 and not 7. Maybe your text is different, so print it after you read it. @Snailwalker – Youcef LAIDANI Jan 28 '17 at 08:52
  • @Snailwalker I edited my answer with another way to read your String from the file. I hope it works for you, good luck. – Youcef LAIDANI Jan 28 '17 at 09:04
1

The first line ends with a line delimiter (which differs per operating system); see also this answer for more details.

So if it should work for Windows, Linux, and Mac files, you may want to replace the line delimiter with a comma first and then split the result, like this:

testString = testString.replaceAll("\r\n", ",").replaceAll("\r", ",").replaceAll("\n", ",");
// now your string looks like this: CCCCC,WIKY PODAR,130000,15,DDDDD,XXXXX555,130110,30
String[] token2 = testString.split(",");
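As a variation on the same idea, the replace step can be folded into the split itself. The sketch below assumes Java 8 or later, where the regex `\R` matches any line break (`\r\n`, `\n` or `\r`). The class and method names are hypothetical, and the sample text is hard-coded with a trailing line break to mimic a file.

```java
class LineBreakSplitDemo {
    // Splits on a comma or on any line break (\R covers \r\n, \n and \r).
    static String[] splitFields(String text) {
        return text.split(",|\\R");
    }

    public static void main(String[] args) {
        String windows = "CCCCC,WIKY PODAR,130000,15\r\nDDDDD,XXXXX555,130110,30\r\n";
        String unix = "CCCCC,WIKY PODAR,130000,15\nDDDDD,XXXXX555,130110,30\n";
        // Both inputs yield the same 8 fields; "WIKY PODAR" keeps its space
        System.out.println(splitFields(windows).length);
        System.out.println(splitFields(unix).length);
    }
}
```

Because `split` drops trailing empty strings, the line break at the end of the file does not add an extra empty token.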
TmTron
  • Hey, this regex seems to work, but there is a comma between WIKY and PODAR. – gqli Jan 28 '17 at 09:04
  • What do you mean? There is a space between WIKY and PODAR. If you also want to replace spaces, just add a replaceAll call. Or, if you want to replace all whitespace (incl. tabs), you can replace the whole line with this: `testString = testString.replaceAll("\\s", ",");` – TmTron Jan 28 '17 at 09:07
  • I mean my program outputs an extra comma in that space. It shouldn't be there though. Thank you. – gqli Jan 28 '17 at 09:17
1

Why not load all the content into a String and replace the line separator in it with the "," character? Then you can easily split the String on the single separator ",".

You can try this:

String content = new String(Files.readAllBytes(Paths.get("mac/text.txt")));
content = content.replaceAll(System.lineSeparator(), ",");
String[] token2 = content.split(",");

Or, if you want to avoid the call to replaceAll() and split directly, you can specify in the regex either the , character OR the line separator string:

String content = new String(Files.readAllBytes(Paths.get("mac/text.txt")));
String[] token2 = content.split(",|"+System.lineSeparator());
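One caveat: `System.lineSeparator()` reflects the platform running the program, which is not necessarily the line ending stored in the file. A possible alternative, sketched below under that assumption, is to let `Files.readAllLines` handle the line breaks and split each line on `,`. The class and method names are hypothetical, and the sample file is written to a temporary location only to keep the sketch self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadLinesDemo {
    // Collects every comma-separated field from the given lines.
    static List<String> splitLines(List<String> lines) {
        List<String> fields = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            for (String field : line.split(",")) {
                fields.add(field);
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Temporary stand-in for "mac/text.txt", written with Windows line endings
        Path file = Files.createTempFile("text", ".txt");
        Files.write(file, "CCCCC,WIKY PODAR,130000,15\r\nDDDDD,XXXXX555,130110,30\r\n"
                .getBytes(StandardCharsets.UTF_8));
        // readAllLines strips \r\n, \n or \r from each line
        List<String> fields = splitLines(Files.readAllLines(file, StandardCharsets.UTF_8));
        System.out.println(fields.size());
        System.out.println(fields);
        Files.delete(file);
    }
}
```

Reading by lines also avoids building a regex from the line separator, at the cost of an explicit loop.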
davidxxx