
I have a file that is a permuted word-list, formatted like this. When I open it in a program like Notepad, it appears not to be spaced out at all, so, for example, to the human eye the first bit looks like this:

    ATHROCYTESDISHLIKEIRRECOVERABLENESSESEMBRITTLEMENTSYOUNGSOVER

but when I copy and paste it, it appears formatted like this:

    ATHROCYTES
    DISHLIKE
    IRRECOVERABLENESSES
    EMBRITTLEMENTS
    YOUNGS
    OVER

I am trying to load this file into an array so I can sort it, but I am struggling with how to break it up properly. I have found that this code:

    while (dis.available() != 0) {
            System.out.println(dis.readLine());
        }

prints out the document formatted correctly, just as if I had copied and pasted it. I am using this code to try to load it into an array:

    String[] store = sb.toString().split(",");

Since there are no commas, the words aren't separated correctly. Realizing this, I have also tried this code to try and split it at each new line:

    String[] store = sb.toString().split(scan.nextLine());

Both of these give me the same result: the words are printed on the same line. Does anyone know how I could get my results properly formatted into an array?

I've included the rest of my code since it is possible that the problem originates elsewhere:

    public class InsertionSort {

        public static String[] InsertSort(String[] args) {
            int i, j;
            String key;

            for (j = 1; j < args.length; j++) { //the condition has changed
                key = args[j];
                i = j - 1;
                while (i >= 0) {
                    if (key.compareTo(args[i]) > 0) {//here too
                        break;
                    }
                    args[i + 1] = args[i];
                    i--;
                }
                args[i + 1] = key;
                return args;
            }

            return args;
        }

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) throws FileNotFoundException, IOException {
            Scanner scan = new Scanner(System.in);
            System.out.println("Insertion Sort Test\n");

            int n;
            String name, line;

            System.out.println("Enter name of file to sort: ");
            name = scan.next();

            BufferedReader reader = new BufferedReader(new FileReader(new File(name)));
            //The StringBuffer will be used to create a string if your file has multiple lines
            StringBuffer sb = new StringBuffer();

            File file = new File(name);
            FileInputStream fis = null;
            BufferedInputStream bis = null;
            DataInputStream dis = null;

            try {
                fis = new FileInputStream(file);

                // Here BufferedInputStream is added for fast reading.
                bis = new BufferedInputStream(fis);
                dis = new DataInputStream(bis);

                // dis.available() returns 0 if the file does not have more lines.
                while (dis.available() != 0) {

                    // this statement reads the line from the file and print it to
                    // the console.
                    System.out.println(dis.readLine());
                }

                // dispose all the resources after using them.
                fis.close();
                bis.close();
                dis.close();

            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }

            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }

            //We now split the line on the "," to get a string array of the values
            String[] store = sb.toString().split("/n");
            System.out.println(Arrays.toString(store));
            /* Call method sort */
            InsertSort(store);

            n = store.length;
            FileWriter fw = new FileWriter("sorted.txt");

            for (int i = 0; i < store.length; i++) {
                fw.write(store[i] + "\n");
            }
            fw.close();
        }

    }
user3068177
  • Have you tried notepad++? It works a lot better than notepad. The lines are probably separated by a line break (\n). That should be your delimiter. I'm not familiar with Java, but this does seem to be your problem. – Steven Walton Sep 13 '15 at 00:09
  • I was just using notepad since it is a .txt file. I am doing all of my coding in NetBeans. With that said, I tried editing my code to make the split \n, giving me: String[] store = sb.toString().split("/n"); but I still get the same result of them all being on the same line. – user3068177 Sep 13 '15 at 00:24
  • Well you used the wrong slash. Also, notepad++ reads files better, that's just why I'm suggesting it. – Steven Walton Sep 13 '15 at 00:59

2 Answers


You have a premature return statement inside the for loop here, so the method exits after the first pass:

      args[i + 1] = key;
      return args; // the cause
    }

Remove it, and it should be fixed (the trace below comes from the debug printf in the complete code):

    [ATHROCYTES, DISHLIKE, IRRECOVERABLENESSES, EMBRITTLEMENTS, YOUNGS, OVER]

     DISHLIKE -> ATHROCYTES = 3
     IRRECOVERABLENESSES -> DISHLIKE = 5
     EMBRITTLEMENTS -> IRRECOVERABLENESSES = -4
     EMBRITTLEMENTS -> DISHLIKE = 1
     YOUNGS -> IRRECOVERABLENESSES = 16
     OVER -> YOUNGS = -10
     OVER -> IRRECOVERABLENESSES = 6

    [ATHROCYTES, DISHLIKE, EMBRITTLEMENTS, IRRECOVERABLENESSES, OVER, YOUNGS]

Complete code:

    public static String[] InsertSort(String[] args) {
      int i, j;
      String key;

      System.out.println(Arrays.toString(args));

      for (j = 1; j < args.length; j++) { //the condition has changed
        key = args[j];
        i = j - 1;
        while (i >= 0) {
          System.out.printf(" %s -> %s = %d\n", key, args[i], key.compareTo(args[i]));
          if (key.compareTo(args[i]) > 0)//here too
            break;
          args[i + 1] = args[i];
          i--;
        }
        args[i + 1] = key;
      }

      return args;
    }

    public static void main(String[] args) throws FileNotFoundException, IOException {
      Scanner scan = new Scanner(System.in);
      System.out.println("Insertion Sort Test\n");

      System.out.println("Enter name of file to sort: ");
      String name = scan.nextLine();

      File file = new File(name);
      // Using \\Z (end of input) as the delimiter reads the whole file as one token
      String sb = (new Scanner(file)).useDelimiter("\\Z").next();

      // Split the file contents on line breaks to get one word per element
      List<String> list = Arrays.asList(sb.split("\n\r?"));

      // trim() also removes any \r left over from CRLF line endings
      ArrayList<String> list2 = new ArrayList<>();
      list.stream().forEach((s) -> {
        list2.add(s.trim());
      });

      System.out.println(list2);
      /* Call method sort */
      String[] store = list2.toArray(new String[]{});

      InsertSort(store);

      System.out.println(Arrays.asList(store));

      int n = store.length;

      try (FileWriter fw = new FileWriter("sorted.txt")) {
        StringBuilder b = new StringBuilder();
        for (String s: store)
          b.append(s).append("\n");

        fw.write(b.toString());
      }
    }
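Note that in the question's loop, `sb.append(line)` throws away the line terminators that `readLine()` strips, so no split on newlines can recover the words afterwards. A minimal sketch that avoids `split()` altogether, assuming one word per line: `BufferedReader.readLine()` already treats `\n`, `\r\n` and a lone `\r` as line terminators, so each word can go straight into a list. The class and method names (`SortWords`, `readWords`, `insertionSort`) are hypothetical, and the sort is a standard insertion sort with the `return` outside the loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortWords {

    // Reads one word per line; readLine() strips \n, \r\n or \r terminators.
    static String[] readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // skip blank lines, e.g. after a trailing newline
                    words.add(word);
                }
            }
        }
        return words.toArray(new String[0]);
    }

    // Insertion sort; the return is no longer inside the for loop.
    static void insertionSort(String[] a) {
        for (int j = 1; j < a.length; j++) {
            String key = a[j];
            int i = j - 1;
            // Shift larger elements one slot right until key's position is found.
            while (i >= 0 && key.compareTo(a[i]) < 0) {
                a[i + 1] = a[i];
                i--;
            }
            a[i + 1] = key;
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "ATHROCYTES\nDISHLIKE\nIRRECOVERABLENESSES\nEMBRITTLEMENTS\nYOUNGS\nOVER\n";
        String[] words = readWords(new StringReader(text));
        insertionSort(words);
        // prints [ATHROCYTES, DISHLIKE, EMBRITTLEMENTS, IRRECOVERABLENESSES, OVER, YOUNGS]
        System.out.println(Arrays.toString(words));
    }
}
```

For a real file, pass `new FileReader(name)` instead of the `StringReader`.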
ankhzet

The reason your file appears as one line in Windows Notepad is likely that Notepad only recognizes CRLF (\r\n) as a line break, while most UNIX programs treat a bare LF (\n) as one. Your text file was likely generated by a UNIX program. Further explanation can be found here.
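To confirm which line endings a file actually uses, you could count them. This is a small sketch; `LineEndings` and `countLineEndings` are hypothetical names, and it counts a `\r\n` pair once as CRLF rather than as a CR plus an LF.

```java
public class LineEndings {

    // Returns {CRLF pairs, bare LFs, bare CRs} found in the text.
    static int[] countLineEndings(String text) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                crlf++;
                i++; // skip the \n that belongs to this CRLF pair
            } else if (c == '\n') {
                lf++;
            } else if (c == '\r') {
                cr++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    public static void main(String[] args) {
        int[] counts = countLineEndings("ATHROCYTES\nDISHLIKE\nOVER\n");
        System.out.printf("CRLF=%d LF=%d CR=%d%n", counts[0], counts[1], counts[2]); // CRLF=0 LF=3 CR=0
    }
}
```

A file written by a UNIX program would report only LFs; for an ASCII file, the text can be loaded with `new String(Files.readAllBytes(Paths.get(name)))`.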

Now, onto your code.

    String[] store = sb.toString().split(scan.nextLine());

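This line of code passes split() whatever scan.nextLine() returns, i.e. the next line read from standard input (since scan.next() has already consumed the file name, this may simply be the empty remainder of that line). split() then looks for matches of that string and partitions your text at those places, which has nothing to do with the line breaks in your file.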

What you want is

    String[] store = sb.toString().split("\n\r?");

String.split() accepts a Java Regular Expression. The regular expression

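    "\n\r?"

is equivalent to saying "split at a line feed, optionally followed by a carriage return". That handles UNIX (\n) files; in a Windows (\r\n) file the \r comes before the \n, so it stays attached to the end of each piece unless you trim() the pieces or use "\r?\n" instead.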
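A short sketch comparing the patterns on CRLF input; `SplitDemo` and `splitLines` are hypothetical names. The `\\R` line-break matcher shown at the end requires Java 8 or later.

```java
import java.util.Arrays;

public class SplitDemo {

    // Splits on an optional \r followed by \n, i.e. LF (UNIX) or CRLF (Windows).
    static String[] splitLines(String text) {
        return text.split("\r?\n");
    }

    public static void main(String[] args) {
        String windows = "ATHROCYTES\r\nDISHLIKE\r\nOVER";

        // "\n\r?" splits at each \n; the \r that comes BEFORE it stays on the word.
        String[] lfFirst = windows.split("\n\r?");
        System.out.println(lfFirst[0].length()); // 11: "ATHROCYTES" plus a trailing \r

        String[] lines = splitLines(windows);
        System.out.println(lines[0].length()); // 10
        System.out.println(Arrays.toString(lines)); // [ATHROCYTES, DISHLIKE, OVER]

        // Java 8+: \R matches any line-break sequence, including \r\n.
        System.out.println(Arrays.toString(windows.split("\\R"))); // [ATHROCYTES, DISHLIKE, OVER]
    }
}
```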

Furthermore, I would recommend parsing your string with a Scanner instead of trying to split it into an array:

    Scanner lineScanner = new Scanner(sb.toString());
    while (lineScanner.hasNextLine()) {
        //Do stuff with lineScanner.nextLine()
    }
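Filled in, that loop might look like the sketch below; `ScanLines` and `lines` are hypothetical names. `Scanner.nextLine()` recognizes `\r\n` as well as `\n`, so the pieces come back without a stray `\r`, and the resulting list can be turned into the `String[]` that `InsertSort` expects with `toArray(new String[0])`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScanLines {

    // Collects every non-blank line of the text using Scanner.nextLine().
    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner lineScanner = new Scanner(text)) {
            while (lineScanner.hasNextLine()) {
                String line = lineScanner.nextLine().trim();
                if (!line.isEmpty()) {
                    result.add(line);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lines("ATHROCYTES\r\nDISHLIKE\nOVER\n")); // [ATHROCYTES, DISHLIKE, OVER]
    }
}
```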

Edit: Remember that escaped characters use a backslash, not a forward slash. For example, \n or \r.
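A tiny illustration of the difference (the class name `SlashDemo` is hypothetical):

```java
public class SlashDemo {
    public static void main(String[] args) {
        String text = "YOUNGS\nOVER";
        // "/n" is a slash followed by the letter n, which never occurs here,
        // so split returns the whole string as a single element.
        System.out.println(text.split("/n").length); // 1
        // "\n" is the line-feed character, so this yields two words.
        System.out.println(text.split("\n").length); // 2
    }
}
```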

  • `"\n\r|[\n\r]"` can be simplified to `"\n\r?"`, afaik – ankhzet Sep 13 '15 at 00:42
  • `"\n\r|[\n\r]"` works with both UNIX and Windows line endings. `"\n\r"` will work in this case, but it's best practice to take the approach that will always work. [Java Scanners](http://stackoverflow.com/questions/5918896/java-scanner-newline-recognition) use `"\r\n|[\n\r\u2028\u2029\u0085]"` for their default regex. – Dylan Culfogienis Sep 13 '15 at 00:48
  • eh, regex `"\n\r?"` is _equal_ to `"\n\r|[\n\r]"`, both of them will capture the same sequences (`\n`, `\n\r`). Or did you overlook the `?` modifier on the `\r` char? – ankhzet Sep 13 '15 at 00:50
  • My apologies, I did indeed. Changed my answer to fit. `"\n\r?"` does do something different, but it doesn't matter in this case. – Dylan Culfogienis Sep 13 '15 at 01:00
  • After trying all the different solutions posted, it looks like I have other problems since I am still at the end of execution receiving the same unsorted word-list. – user3068177 Sep 13 '15 at 01:07