I have been working on an assignment in which I have to read words from a file, find the longest word, and check how many sub-words that longest word contains. This should then work for all the words in the file.
I tried this in Java. The code I wrote works for a small amount of data in the file, but my task is to process a huge amount of data.
Example file words: "call","me","later","hey","how","callmelater","now","iam","busy","noway","nowiambusy"
Expected output: callmelater : subwords->call,me,later
My approach: I read the words from the file into a linked list, find the longest word and remove it from the list, then check how many of the remaining words occur in the extracted word as sub-words.
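To make the expected behaviour concrete, here is a minimal, self-contained sketch of those steps on the example words. The class name SubwordDemo and its two helper methods are made up for this illustration and are not part of my assignment code; the sketch also uses a fixed word list instead of reading a file, and it assumes the list is not empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, only meant to illustrate the expected behaviour.
class SubwordDemo {

    // Returns the longest word in the list (the first one if there is a tie).
    // Assumes the list contains at least one word.
    static String longestWord(List<String> words) {
        String longest = words.get(0);
        for (String w : words) {
            if (w.length() > longest.length()) {
                longest = w;
            }
        }
        return longest;
    }

    // Returns every other word from the list that occurs inside target.
    static List<String> subwordsOf(String target, List<String> words) {
        List<String> found = new ArrayList<>();
        for (String w : words) {
            if (!w.equals(target) && target.contains(w)) {
                found.add(w);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("call", "me", "later", "hey", "how",
                "callmelater", "now", "iam", "busy", "noway", "nowiambusy");
        String longest = longestWord(words);
        System.out.println(longest + " : subwords->"
                + String.join(",", subwordsOf(longest, words)));
    }
}
```

On the example words this prints callmelater : subwords->call,me,later. It compares every word against the target, which is the same naive strategy my real code uses.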
Main Class Assignment:
import java.util.Scanner;
public class Assignment {
public static void main (String[] args){
Assignment a = new Assignment();
a.throwInstructions();
Scanner userInput = new Scanner(System.in);
String filename = userInput.nextLine();
// String filename = "ab.txt";
// String filename = "abc.txt";
// Start timing only after the file name has been typed, so the wait for input is not measured.
long start = System.currentTimeMillis();
// The Logic constructor reads the file, runs the search and prints the result.
Logic testRun = new Logic(filename);
// testRun.result();
long end = System.currentTimeMillis();
System.out.println("Time taken: " + (end - start) + " ms");
}
public void throwInstructions(){
System.out.println("Keep the input file in the same directory as the code.");
System.out.println("Please specify the file name: ");
}
}
Class Logic, which does the processing:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class Logic {
private String filename;
private File file;
private List<String> words = new LinkedList<String>();
private Map<String, String> matchedWords = new HashMap<>();
@Override
public String toString() {
return "Logic [words=" + words + "]";
}
// constructor
public Logic(String filename) {
this.filename = filename;
file = new File(this.filename);
fetchFile();
run();
result();
}
// find the such words and store in map
public void run() {
while (!words.isEmpty()) {
String longestWord = extractLongestWord(words);
findMatch(longestWord);
}
}
// find the longest word and remove it from the list
private String extractLongestWord(List<String> words) {
String longWord = words.get(0);
// Use for-each: get(i) on a LinkedList walks the list from the start on every call.
for (String word : words) {
if (word.length() > longWord.length()) {
longWord = word;
}
}
// Removes the first occurrence, same effect as remove(indexOf(longWord)).
words.remove(longWord);
return longWord;
}
// find which of the remaining words occur inside the longest word
private void findMatch(String longestWord) {
int chunkCount = 0;
StringBuilder subWords = new StringBuilder();
// Use for-each: get(i) on a LinkedList walks the list from the start on every call.
for (String word : words) {
// Skip empty lines: "" would count as a substring of every word.
if (!word.isEmpty() && longestWord.contains(word)) {
if (chunkCount > 0) {
subWords.append(",");
}
subWords.append(word);
chunkCount++;
}
}
if (chunkCount > 0) {
matchedWords.put(longestWord,
"\t" + subWords + "\t:Subword Count:" + chunkCount);
}
}
// fetch data from file and store in list
public void fetchFile() {
String word;
// try-with-resources closes the reader (and the wrapped FileReader) even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
while ((word = br.readLine()) != null) {
words.add(word);
}
} catch (FileNotFoundException e) {
// e.printStackTrace();
System.out.println("ERROR: File -> " + file.toString()
+ " does not exist. Please check the file name or location and try again.");
} catch (IOException e) {
// e.printStackTrace();
System.out.println("ERROR: Problem reading file -> " + file.toString()
+ ". The file could not be read completely.");
}
}
// display result
public void result() {
System.out.println("WORD:\tWORD-LENGTH:\tSUBWORDS:\tSUBWORDS-COUNT");
// Typed for-each over the entries avoids raw types and casts.
for (Map.Entry<String, String> entry : matchedWords.entrySet()) {
System.out.print(entry.getKey() + ": ");
System.out.print("\t" + entry.getKey().length() + ": ");
System.out.println(entry.getValue());
}
}
}
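This is where my program falls short: on large files it seems to go into a never-ending loop. I think the real problem is its complexity. For every word it rescans the whole list, so the running time grows at least quadratically with the number of words. To reduce the processing time I need a more efficient approach, something like binary search or merge sort, ideally around O(n log n). O(log n) is probably not realistic, since every word has to be read at least once.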
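One direction that might work, as a rough sketch rather than a finished solution: sort the words by length once instead of rescanning for the longest word each time, and put all words into a HashSet so that each substring of a word can be looked up directly instead of being compared against every other word. FastSubwords and subwordsOf are names made up for this sketch. It assumes that individual words are fairly short, because it tries every substring of each word, and it uses a fixed word list instead of reading a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch class, not part of the assignment code.
class FastSubwords {

    // Returns the dictionary words that occur inside target,
    // in order of first appearance, excluding target itself.
    static Set<String> subwordsOf(String target, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>();
        for (int start = 0; start < target.length(); start++) {
            for (int end = start + 1; end <= target.length(); end++) {
                String piece = target.substring(start, end);
                // One hash lookup per substring instead of a scan over all words.
                if (!piece.equals(target) && dictionary.contains(piece)) {
                    found.add(piece);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("call", "me", "later",
                "hey", "how", "callmelater", "now", "iam", "busy", "noway", "nowiambusy"));
        Set<String> dictionary = new HashSet<>(words);

        // Sort once, longest first, instead of searching for the longest word repeatedly.
        words.sort((a, b) -> Integer.compare(b.length(), a.length()));

        for (String word : words) {
            Set<String> subs = subwordsOf(word, dictionary);
            if (!subs.isEmpty()) {
                System.out.println(word + " : subwords->" + String.join(",", subs)
                        + " (count " + subs.size() + ")");
            }
        }
    }
}
```

On the example words this reports sub-words for callmelater, nowiambusy and noway. The sort is O(n log n), and the work per word then depends on the length of that word rather than on the number of words in the file.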
If someone can help me with this, or at least suggest in which direction I should proceed, I would appreciate it. Also, which programming language would be a good choice for implementing such text-processing tasks so that they run fast?
Thanks in advance