Read & compare text files and print words in alphabetical order

Question

First of all, I'm sorry if similar questions have been asked before, but I couldn't find a solution to what I was looking for. I have this small Java program that compares two text files (text1.txt & text2.txt) and prints all the words of text1.txt that don't exist in text2.txt. The code below does the job:

text1.txt : This is text file 1. some @ random - text

text2.txt : this is text file 2.

import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.*;

public class Read {

   public static void main(String[] args) {
      Set<String> textFile1 = readFiles("text1.txt");
      Set<String> textFile2 = readFiles("text2.txt");

      for (String t : textFile1) {
         if (!textFile2.contains(t)) {
            System.out.println(t);
         }
      }
   }

   public static Set<String> readFiles(String filename) 
   {
      Set<String> words = new HashSet<String>();

      try {
         for (String line : Files.readAllLines(new File(filename).toPath(), Charset.defaultCharset())) {
            String[] split = line.split("\\s+");
            for (String word : split) {
               words.add(word.toLowerCase());
            }
         }
      } catch (IOException e) {
         System.out.println(e);
      }
      return words;
   }
}

(Each word is printed on a new line.)

Output: @, some, random, 1.

I'm trying to print all the words in alphabetical order. Also, if possible, it shouldn't print any special characters (@, -) or numbers. I've been trying to figure it out, but with no luck. I'd appreciate it if someone could help me out with this.

Also, I took the following line of code from the internet, and I'm not really familiar with it. Is there an easier way to write this line of code?

for (String line : Files.readAllLines(new File(filename).toPath(), Charset.defaultCharset()))

Edit: HashSet is a must for this piece of work. Sorry I forgot to mention that.

Let's try to divide this question into smaller subquestions, and maybe you'll find answers on your own. – lexicore, Apr 11 '18 at 18:46
To print words in alphabetical order you need to sort them, but first you need to collect them in some collection. The easiest would be to calculate the difference between the sets `textFile1` and `textFile2` as a set. Do you have an idea how to do this? – lexicore, Apr 11 '18 at 18:48
Now, assume you have a `Set` which is the difference mentioned above. How would you sort a set? – lexicore, Apr 11 '18 at 18:50
Next, you don't want certain "words" like the special characters `@` and `-`, or numbers. How would you check if a "word" matches your "desirability" pattern? – lexicore, Apr 11 '18 at 18:53
Is it homework? Then please read [How do I ask and answer homework questions?](https://meta.stackoverflow.com/q/334822) – lexicore, Apr 11 '18 at 18:58
Thank you for taking the time to break it down for me. Your questions do make sense, but they confuse me as I'm still learning. This is not really homework; it's a piece of work we did in class that I did not understand, so I'm trying to learn and make it work. Thanks – brownKid, Apr 11 '18 at 19:09
I just think you'll understand it better if you solve it yourself. – lexicore, Apr 11 '18 at 19:12
@lexicore I agree with you, which is why I tried my best, but it put me off because I couldn't figure it out. If you could help me, I'd appreciate it. Thanks – brownKid, Apr 11 '18 at 19:22

score 0 · Answer 1 · answered Apr 11 '18 at 18:50

Have you looked at any other Set implementations? I think if you use a SortedSet such as a TreeSet, instead of a HashSet, the words will automatically sort into alphabetical order.
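A minimal sketch of that idea, assuming the words are already in a set. The class name `TreeSetDemo`, the helper `sorted`, and the sample words are made up for illustration. Copying the words into a `TreeSet` makes iteration follow the natural (lexicographic) order of the strings:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class TreeSetDemo {

   // Copy the words into a TreeSet, which iterates in natural String order.
   static SortedSet<String> sorted(Set<String> words) {
      return new TreeSet<>(words);
   }

   public static void main(String[] args) {
      // Sample words, invented for this illustration.
      Set<String> words = new HashSet<>(Arrays.asList("some", "random", "text"));
      for (String word : sorted(words)) {
         System.out.println(word);
      }
   }
}
```

Note that the natural order compares character codes, so upper-case letters sort before lower-case ones; the question's code lower-cases every word first, which sidesteps that.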

Stack Overflow works better if you ask one question at a time.

ᴇʟᴇvᴀтᴇ

The problem with `TreeSet` is that `contains` or `remove` operations which are needed to calculate the difference between sets are `O(log n)`. So it's better to use `HashSet` to calculate the difference and only then sort the resulting set. – lexicore Apr 11 '18 at 18:55
`Set difference = new TreeSet(textFile1); difference.removeAll(textFile2);` is even simpler than the current code. I don't see how we're trading efficiency for complexity here. Quite the opposite. (FYI I did not downvote.) – lexicore Apr 11 '18 at 19:01
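The two approaches discussed in these comments can be sketched side by side. This is only an illustration: the names `DifferenceDemo`, `viaTreeSet`, and `viaHashSet` are hypothetical, and the sample words are the lower-cased tokens of the question's two files. Both variants should yield the same sorted difference; they differ only in where the sorting cost is paid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DifferenceDemo {

   // Variant 1: copy into a TreeSet (sorted), then remove the second set's words.
   static List<String> viaTreeSet(Set<String> first, Set<String> second) {
      Set<String> difference = new TreeSet<>(first);
      difference.removeAll(second);
      return new ArrayList<>(difference);
   }

   // Variant 2: compute the difference with a HashSet, then sort only the result.
   static List<String> viaHashSet(Set<String> first, Set<String> second) {
      Set<String> difference = new HashSet<>(first);
      difference.removeAll(second);
      List<String> sorted = new ArrayList<>(difference);
      Collections.sort(sorted);
      return sorted;
   }

   public static void main(String[] args) {
      Set<String> textFile1 = new HashSet<>(Arrays.asList(
            "this", "is", "text", "file", "1.", "some", "@", "random", "-"));
      Set<String> textFile2 = new HashSet<>(Arrays.asList(
            "this", "is", "text", "file", "2."));
      System.out.println(viaTreeSet(textFile1, textFile2));
      System.out.println(viaHashSet(textFile1, textFile2));
   }
}
```

For inputs as small as these the difference in running time is unlikely to matter; the second variant is the one that fits the "HashSet is a must" constraint from the question.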

score 0 · Accepted Answer · 2018-04-11T20:21:08.520

Since you are not allowed to use a TreeSet and must use a HashSet, you can do it this way:

import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.*;

public class Read {

   public static void main(String[] args) {
      Set<String> textFile1 = readFiles("text1.txt");
      Set<String> textFile2 = readFiles("text2.txt");

      Set<String> difference = new HashSet<String>();

      // collect strings by dropping out every string that's not only letters
      // using the regex "[a-zA-Z]+"
      for (String t : textFile1) {
         if (!textFile2.contains(t) && t.matches("[a-zA-Z]+")) {
            difference.add(t);
         }
      }

      // sort
      List<String> dList = new ArrayList<String>(difference);
      Collections.sort(dList);

      // show
      for (String s : dList) {
         System.out.println(s);
      }
   }

   public static Set<String> readFiles(String filename) 
   {
      Set<String> words = new HashSet<String>();

      try {
         for (String line : Files.readAllLines(new File(filename).toPath(), Charset.defaultCharset())) {
            String[] split = line.split("\\s+");
            for (String word : split) {
               words.add(word.toLowerCase());
            }
         }
      } catch (IOException e) {
         System.out.println(e);
      }
      return words;
   }
}

edited Apr 11 '18 at 20:21

answered Apr 11 '18 at 18:53

Better `Set difference = new TreeSet(textFile1); difference.removeAll(textFile2);`. (I didn't downvote.) – lexicore Apr 11 '18 at 18:57
Fine, but my solution also works, even if it is not the high-end solution. I can't understand why people vote my post down when it is not wrong. – Apr 11 '18 at 18:58
I personally see nothing here which would deserve a downvote. – lexicore Apr 11 '18 at 18:59
Thank you for your support. I really appreciate this. – Apr 11 '18 at 19:00
@DiabolicWords Thank you. This works fine for printing in a-z order. Can this be done using only HashSet? And how would you go about not printing the special characters? – brownKid Apr 11 '18 at 19:18
Okay, I updated my post. It's now presenting a solution using HashSet. The answer to your second question follows. – Apr 11 '18 at 20:15
Updated again. Now your code only collects strings consisting of letters. If this solution helped you, please mark this post as solution to your question. – Apr 11 '18 at 20:22
@DiabolicWords omg mate! Thank you very much. I've spent a lot of time just to figure it out coz it was bugging me so much. Seeing that work makes me feel good. Appreciate the help! :) – brownKid Apr 11 '18 at 23:39

score 0 · Answer 3 · answered Apr 11 '18 at 18:57

From what I've read in the Java documentation, a HashSet doesn't guarantee any ordering of the elements in the set. However, if you were to use a SortedSet implementation instead, the elements would be kept in order. Strings already have a natural (lexicographic) order, so you would only need to write a comparator if you want a different ordering.
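As a rough sketch of when a comparator comes into play (the class `ComparatorDemo`, its method names, and the sample words are invented for illustration): a `TreeSet` built without a comparator uses the natural order of `String`, while passing the JDK's `String.CASE_INSENSITIVE_ORDER` changes how mixed-case words are ordered.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class ComparatorDemo {

   // No comparator: natural String order, where upper-case letters sort first.
   static SortedSet<String> naturalOrder(List<String> words) {
      return new TreeSet<>(words);
   }

   // Explicit comparator: ordering that ignores case.
   // Words differing only in case count as duplicates in this set.
   static SortedSet<String> ignoringCase(List<String> words) {
      SortedSet<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
      set.addAll(words);
      return set;
   }

   public static void main(String[] args) {
      List<String> words = Arrays.asList("apple", "Banana", "cherry");
      System.out.println(naturalOrder(words)); // [Banana, apple, cherry]
      System.out.println(ignoringCase(words)); // [apple, Banana, cherry]
   }
}
```

Because the question's code lower-cases every word before storing it, the natural order is probably sufficient there.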

As for your other questions: for reading files in Java, there is this guide from GeeksforGeeks that I find very user-friendly, especially for beginners, and it shows a variety of ways to read a file.
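One possibly simpler way to write the file-reading line from the question is `Files.readAllLines(Paths.get(filename), ...)`, which avoids the detour through `java.io.File`. The sketch below is an assumption-laden illustration, not the guide's code: the names `ReadWords`, `toWords`, and `readWords` are hypothetical, and it assumes the files are UTF-8 encoded, whereas the original code uses the platform default charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReadWords {

   // Split each line on whitespace and collect the lower-cased, non-empty tokens.
   static Set<String> toWords(List<String> lines) {
      Set<String> words = new HashSet<>();
      for (String line : lines) {
         for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
               words.add(word.toLowerCase());
            }
         }
      }
      return words;
   }

   // Read the whole file as a list of lines (assumes UTF-8) and tokenize it.
   static Set<String> readWords(String filename) throws IOException {
      return toWords(Files.readAllLines(Paths.get(filename), StandardCharsets.UTF_8));
   }

   public static void main(String[] args) throws IOException {
      if (args.length > 0) {
         System.out.println(readWords(args[0]));
      } else {
         // No file given: demonstrate the tokenizing step on in-memory lines.
         System.out.println(toWords(Arrays.asList("This is text file 1.", "", "some @ random - text")));
      }
   }
}
```

Splitting the work into a pure `toWords` step and a thin `readWords` wrapper keeps the I/O in one place; whether that is "easier" than the original one-liner is a matter of taste.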

Special characters may be a bit tricky. There is a guide here from a previous Stack Overflow answer that may be helpful, though.
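One way to handle special characters, sketched here as an assumption rather than as the linked answer's method, is to strip everything that is not an ASCII letter with `String.replaceAll` and to drop tokens that end up empty. The class `LettersOnly` and the sample tokens are invented for illustration. Unlike a whole-word check such as `matches("[a-zA-Z]+")`, this keeps a word like `text.` as `text` instead of discarding it.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

class LettersOnly {

   // Remove every character that is not an ASCII letter; skip tokens left empty.
   static SortedSet<String> lettersOnly(Collection<String> tokens) {
      SortedSet<String> result = new TreeSet<>();
      for (String token : tokens) {
         String cleaned = token.replaceAll("[^a-zA-Z]", "");
         if (!cleaned.isEmpty()) {
            result.add(cleaned);
         }
      }
      return result;
   }

   public static void main(String[] args) {
      // Sample tokens, invented for this illustration.
      System.out.println(lettersOnly(Arrays.asList("1.", "some", "@", "random", "-", "text.")));
   }
}
```

The pattern only covers unaccented ASCII letters; words in other alphabets would need a broader character class such as `\\p{L}`.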
