Translate words in a string using BufferedReader (Java)

Question

I've been working on this for a few days now and I just can't make any headway. I've tried using Scanner and BufferedReader and had no luck.

Basically, I have a working method (shortenWord) that takes a String and shortens it according to a text file formatted like this:

hello,lo
any,ne
anyone,ne1
thanks,thx

It also accounts for punctuation so 'hello?' becomes 'lo?' etc.

I need to be able to read in a String and translate each word individually, so "hello? any anyone thanks!" will become "lo? ne ne1 thx!", basically using the method I already have on each word in the String. The code I have will translate the first word but then does nothing to the rest. I think it's something to do with how my BufferedReader is working.

import java.io.*;

public class Shortener {
    private FileReader in ;
    /*
     * Default constructor that will load a default abbreviations text file.
     */
    public Shortener() {
        try {
            in = new FileReader( "abbreviations.txt" );
        }       

        catch ( Exception e ) {
            System.out.println( e );
        }
    }

    public String shortenWord( String inWord ) {
        String punc = new String(",?.!;") ;
        char finalchar = inWord.charAt(inWord.length()-1) ;
        String outWord = new String() ;
        BufferedReader abrv = new BufferedReader(in) ;

            // ends in punctuation
            if (punc.indexOf(finalchar) != -1 ) {
                String sub = inWord.substring(0, inWord.length()-1) ;
                outWord = sub + finalchar ;


            try {
                String line;
                while ( (line = abrv.readLine()) != null ) {
                    String[] lineArray = line.split(",") ;
                        if ( line.contains(sub) ) {
                            outWord = lineArray[1] + finalchar ;
                            }
                        }
                    }

            catch (IOException e) {
                System.out.println(e) ;
                }
            }

            // no punctuation
            else {
                outWord = inWord ;

                try {
                String line;

                    while( (line = abrv.readLine()) != null) {
                        String[] lineArray = line.split(",") ;
                            if ( line.contains(inWord) ) {
                                outWord = lineArray[1] ;
                            }
                        }
                    }

                catch (IOException ioe) {
                   System.out.println(ioe) ; 
                }
            }

        return outWord;
    }

    public void shortenMessage( String inMessage ) {
         String[] messageArray = inMessage.split("\\s+") ;
         for (String word : messageArray) {
            System.out.println(shortenWord(word));
        }
    }
}

Any help, or even a nudge in the right direction would be so much appreciated.

Edit: I've tried closing the BufferedReader at the end of the shortenWord method and it just results in me getting an error on every word in the String after the first one saying that the BufferedReader is closed.

On an unrelated side note: I imagine word shorteners like these are any English teacher's nightmare fuel. — Ceiling Gecko, Apr 02 '15 at 10:23
There is no point in reading the file again and again for each word, and you're not actually doing it anyway because once you have reached the end of file, if you don't reopen it or rewind it, it will stay at the end of file. A better logic would be to open the file, read a line, and then apply the replacements to each such line. — RealSkeptic, Apr 02 '15 at 10:25
Or read the "translations" to a `Map`. Also why on earth are you using `String punc = new String(",?.!;") ;` instead of `String punc = ",?.!;";`? — fabian, Apr 02 '15 at 10:33
Ah okay, I see what you're saying. I was trying to implement it this way because I assumed it would be better to use the shortenWord method to save on writing out logic again. I'm going to try it your way now and will report back! Edit: I'm still new to Java, sometimes I make silly mistakes like that String thing and if they work I don't usually recognise the 'bad' code enough to change it. (I have fixed it now though, thank you!) — ectaylor, Apr 02 '15 at 10:33
I can't see why it won't work, you have a debugger for this purpose. All I can tell is you're making in much harder than it is. You repeat **exact** code for word ending with punctuation and for one without. It's pointless. Strip down you word of punctuations, remember them. Word without punctuations has to be checked against any of your "dictionary" and replaced if necessary. Then add punctuations previously removed. — zubergu, Apr 02 '15 at 10:38
Thank you @zubergu, I can see how that will cut down on my code. Will implement now! — ectaylor, Apr 02 '15 at 10:45
Also, consider reading your abbreviations to a HashMap holding original word as a key and abbreviation as a value. Reading file is a time consuming process, looping over every each element is also time consuming. For few words it doesn't matter, but you'll appreciate the speed up when you feed it few MiB book. — zubergu, Apr 02 '15 at 10:48
@ectaylor please revert your lastest edit and bring the actual question back. This site is for sharing problems and solutions. — atomman, Apr 02 '15 at 11:52
Don't vandalize your own questions. I have reverted your "edit". — Stephen C, Apr 02 '15 at 12:06
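
The point raised in the comments, that the reader stays at end of file after the first word, can be reproduced with a small sketch. It is not the asker's code: the class name ReaderExhaustionDemo and the helper countLines are made up for illustration, and a StringReader stands in for the FileReader on abbreviations.txt so the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderExhaustionDemo {

    // Wraps the SAME underlying reader in a fresh BufferedReader, as
    // shortenWord does on every call, and counts the lines it can still read.
    static int countLines(Reader in) {
        BufferedReader abrv = new BufferedReader(in);
        int count = 0;
        try {
            while (abrv.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: in-memory data instead of abbreviations.txt.
        Reader in = new StringReader("hello,lo\nany,ne\nanyone,ne1\nthanks,thx\n");

        System.out.println(countLines(in)); // 4: the first pass consumes every line
        System.out.println(countLines(in)); // 0: the shared reader is already at end of input
    }
}
```

A new BufferedReader does not rewind the reader it wraps, which is why only the first word in the message ever finds a match.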

score 3 · Answer 1 · answered Apr 02 '15 at 11:25

I took a look at this. First of all, if you have the option to change the format of your text file, I would change it to something like this (or XML):

 key1=value1
 key2=value2

By doing this you could later use Java's Properties.load(Reader), which removes the need to parse the file manually.
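
A minimal sketch of what loading such a file could look like. The class name PropertiesRulesDemo and the helper loadRules are hypothetical, and a StringReader with key=value lines stands in for a FileReader on a real properties file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesRulesDemo {

    // Loads key=value rules from any Reader and closes it afterwards.
    // In a real program the Reader could be a FileReader on the rules file.
    static Properties loadRules(Reader source) {
        Properties rules = new Properties();
        try (Reader r = source) {
            rules.load(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rules;
    }

    public static void main(String[] args) {
        // Assumption: the abbreviations from the question, rewritten as key=value.
        Properties rules = loadRules(new StringReader(
                "hello=lo\nany=ne\nanyone=ne1\nthanks=thx\n"));

        System.out.println(rules.getProperty("anyone"));             // ne1
        System.out.println(rules.getProperty("goodbye", "goodbye")); // goodbye (no rule, default returned)
    }
}
```

The two-argument getProperty is handy here because a word without a rule can simply fall back to itself.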

If by any chance you don't have the option to change the format, you'll have to parse it yourself. Something like the code below would do that and put the results into a Map called shortningRules, which can then be used later.

private void parseInput(FileReader reader) {
    try (BufferedReader br = new BufferedReader(reader)) {
        String line;
        while ((line = br.readLine()) != null) {
            String[] lineComponents = line.split(",");
            this.shortningRules.put(lineComponents[0], lineComponents[1]);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

When it comes to actually shortening a message, I would probably opt for a regex approach, e.g. \\bKEY\\b, where KEY is the word you want shortened. \\b is an anchor in regex that matches a word boundary, so the pattern only matches KEY as a whole word and the match itself never includes the surrounding spaces or punctuation. The whole code for doing the shortening would then become something like this:

public void shortenMessage(String message) {
    for (Entry<String, String> entry : shortningRules.entrySet()) {
        message = message.replaceAll("\\b" + entry.getKey() + "\\b", entry.getValue());
    }
    System.out.println(message); //This should probably be a return statement instead of a sysout.
}

Putting it all together will give you something like this; here I've added a main for testing purposes.
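
A minimal sketch of how the parsing and shortening steps might fit together, assuming the comma-separated format from the question. The class name RegexShortener, the Reader-based constructor, the use of Pattern.quote and Matcher.quoteReplacement, and the in-memory test data are illustrative choices and not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexShortener {
    // Maps each full word to its abbreviation, in file order.
    private final Map<String, String> shortningRules = new LinkedHashMap<>();

    RegexShortener(Reader reader) {
        parseInput(reader);
    }

    // Reads "word,abbreviation" lines into the map and closes the reader.
    private void parseInput(Reader reader) {
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] lineComponents = line.split(",");
                if (lineComponents.length == 2) { // skip blank or malformed lines
                    shortningRules.put(lineComponents[0], lineComponents[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replaces every whole-word occurrence of each key with its abbreviation.
    String shortenMessage(String message) {
        for (Map.Entry<String, String> entry : shortningRules.entrySet()) {
            // Pattern.quote keeps regex metacharacters in a key literal.
            String wholeWord = "\\b" + Pattern.quote(entry.getKey()) + "\\b";
            message = message.replaceAll(wholeWord,
                    Matcher.quoteReplacement(entry.getValue()));
        }
        return message;
    }

    public static void main(String[] args) {
        // Assumption: in-memory rules instead of abbreviations.txt.
        RegexShortener s = new RegexShortener(
                new StringReader("hello,lo\nany,ne\nanyone,ne1\nthanks,thx\n"));
        System.out.println(s.shortenMessage("hello? any anyone thanks!")); // lo? ne ne1 thx!
    }
}
```

Because the rules are applied one after another, an abbreviation that happens to equal another rule's key could be shortened a second time; with the four rules above that does not occur.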

I like the `replaceAll`, shortens not only words efficiently, but code aswell :P +1 for properties — Ian2thedv, Apr 02 '15 at 11:50
I would like to implement this but I have more than 1 translation (eg other languages). I can change the text file to use `key1=value1` instead of another separator. Can you please help me with my post? [http://stackoverflow.com/q/40575394/1919069](http://stackoverflow.com/q/40575394/1919069) Thanks. — euler, Nov 23 '16 at 01:12

Ian2thedv · Accepted Answer · 2015-04-02T11:48:56.727

I think you can have a simpler solution using a HashMap. Read all the abbreviations into the map when the Shortener object is created, and just reference it once you have a word. The word will be the key and the abbreviation the value. Like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;

public class Shortener {

    //the map
    private HashMap<String, String> abbreviations;

    /*
     * Default constructor that will load a default abbreviations text file.
     */
    public Shortener() {
        //initialize the map
        this.abbreviations = new HashMap<>();
        //try-with-resources closes the file once the abbreviations are loaded
        try (BufferedReader abrv = new BufferedReader(new FileReader("abbreviations.txt"))) {
            String line;
            while ((line = abrv.readLine()) != null) {
                String [] abv = line.split(",");
                //If there is not two items in the file, the file is malformed
                if (abv.length != 2) {
                    throw new IllegalArgumentException("Malformed abbreviation file");
                }
                //populate the map with the word as key and abbreviation as value
                abbreviations.put(abv[0], abv[1]);
            }
        }       

        catch ( Exception e ) {
            System.out.println( e );
        }
    }

    public String shortenWord( String inWord ) {
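        String punc = ",?.!;";
        // an empty token (e.g. from leading whitespace in the message) has no last character
        if (inWord.isEmpty())
            return inWord;
        char finalchar = inWord.charAt(inWord.length()-1) ;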

        // ends in punctuation
        if (punc.indexOf(finalchar) != -1) {
            String sub = inWord.substring(0, inWord.length() - 1);

            //Reference map
            String abv = abbreviations.get(sub);
            if (abv == null)
                return inWord;
            return new StringBuilder(abv).append(finalchar).toString();
        }

        // no punctuation
        else {
            //Reference map
            String abv = abbreviations.get(inWord);
            if (abv == null)
                return inWord;
            return abv;
        }
    }

    public void shortenMessage( String inMessage ) {
         String[] messageArray = inMessage.split("\\s+") ;
         for (String word : messageArray) {
            System.out.println(shortenWord(word));
        }
    }

    public static void main (String [] args) {
        Shortener s = new Shortener();
        s.shortenMessage("hello? any anyone thanks!");
    }
}

Output:

lo?
ne
ne1
thx!

Edit:

Following atomman's answer, you can basically remove the shortenWord method by modifying the shortenMessage method like this:

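public void shortenMessage(String inMessage) {
     // requires: import java.util.Map.Entry;
     // \\b restricts each match to a whole word, so "any" does not also rewrite the start of "anyone"
     for (Entry<String, String> entry : this.abbreviations.entrySet()) {
         inMessage = inMessage.replaceAll("\\b" + entry.getKey() + "\\b", entry.getValue());
     }

     System.out.println(inMessage);
}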
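
One caveat with replacing by key across the whole message: without word boundaries, replaceAll treats each key as a substring pattern, so the rule for "any" can also rewrite the start of "anyone", and the result then depends on the map's iteration order. The sketch below shows that effect and one possible alternative, a single pass that looks each word up in the map. The class name SinglePassShortener and the method shorten are hypothetical and not taken from either answer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassShortener {
    // A "word" here is a run of letters, digits or underscores.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Walks the message once and swaps each word that has a rule,
    // leaving spaces and punctuation exactly where they were.
    static String shorten(String message, Map<String, String> abbreviations) {
        Matcher m = WORD.matcher(message);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            String abv = abbreviations.get(word);
            m.appendReplacement(out, Matcher.quoteReplacement(abv != null ? abv : word));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Substring matching: the "any" rule also hits the start of "anyone".
        System.out.println("any anyone".replaceAll("any", "ne")); // ne neone

        Map<String, String> abbreviations = new HashMap<>();
        abbreviations.put("hello", "lo");
        abbreviations.put("any", "ne");
        abbreviations.put("anyone", "ne1");
        abbreviations.put("thanks", "thx");

        System.out.println(shorten("hello? any anyone thanks!", abbreviations)); // lo? ne ne1 thx!
    }
}
```

Since every word is looked up exactly once, the outcome no longer depends on the order of the rules, and an abbreviation is never shortened a second time.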

Wow! Thank you so much, this is perfect. I've never used HashMaps before but this definitely looks like the simplest implementation of my problem :) — ectaylor, Apr 02 '15 at 11:09
No problem, I would suggest you read up on them a bit. You'll find they can simplify many solutions but also not that necessary for some. http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html — Ian2thedv, Apr 02 '15 at 11:16
Take a look at the [Properties](http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html) as well. That way you dont have to parse the file yourself. — atomman, Apr 02 '15 at 11:26
