Find and Replace in array lists with help of a map

Question

I have two lists like the ones below.

List<String> list1 = Arrays.asList("I'm a cat", "dog", "There's an elephant and I'm seeing", "we're five");

List<String> list2 = Arrays.asList("I'm", "There's", "we're");

and a hash map as below.

"I'm": "I am"
"we're": "we are"
"There's": "there is"

I need to update list1 with the dictionary values, i.e. the result should be:

"I am a cat", "dog", "There is an elephant and I am seeing it", "we are five"

My main problem is that list1 has close to 80K sentences and the map has about 4K entries. I'm able to generate list1, list2 and the map, but because the data is so large I can't find an efficient way of doing the find and replace.

I thought of using commons StringUtils.replaceAll() after converting my lists into arrays, but then I'd need to loop through all 80K items 4K times (even more, as they are sentences rather than single-word strings).

How can I do it?

..but, the question is, what is the source of data in the list? did you type it in the code, or you fill it from a file or database? — Youcef LAIDANI, May 13 '18 at 12:42
if the query patterns stay the same and texts are different, it makes sense to construct an FSM based on query strings (in your case - set of map keys), that will optimize a pattern search, but you still will have to process all 80K entries one by one — mangusta, May 13 '18 at 12:48
Can't you get the list of string into a single string variable with some `delimiter` & apply `StringUtils.replaceAll()` . And at the end with the delimiter you split out the string into string array. So you only need to loop through the `Map` you have. — Abid Khan, May 13 '18 at 12:57
Hi All apologies for the delayed response. I've my data in an excel and I'm using poi and building the lists and map — Rakesh, May 13 '18 at 13:44

jspcal · Answer 1 · 2018-05-13T13:52:20.700

You can perform the substitutions in a single pass. Arrange for the text to be stored as a single string so that you can operate on the input in bulk. You can use an appropriate delimiter so that you can separate the strings when the translation is done.

Prepare a regular expression (or generate a state machine based tokenizer using a tool like JFlex) that matches any of the strings to be replaced (the keys in your map). Then iterate over each match and perform the substitution.

Here's an example of using Pattern to perform the replacements in bulk:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.stream.Collectors;

public class Substituter {
    public static void main(String args[]) {
        // Read the input into a string (or combine the inputs if needed)

        List<String> strings = Arrays.asList("I'm a cat", "dog", "There's an elephant and I'm seeing", "we're five");

        // String replacements

        Map<String, String> replacements = new HashMap<>();
        replacements.put("I'm", "I am");
        replacements.put("we're", "we are");
        replacements.put("There's", "there is");

        // Build the regular expression by concatenating the strings to be replaced into an or expression (|)

        Pattern pattern = Pattern.compile(replacements.keySet().stream().map(Pattern::quote).collect(Collectors.joining("|")));

        // Perform the substitutions

        Matcher m = pattern.matcher(String.join("~", strings));
        StringBuffer newText = new StringBuffer();

        while (m.find()) {
            // quoteReplacement keeps any '$' or '\' in a replacement value literal
            m.appendReplacement(newText, Matcher.quoteReplacement(replacements.get(m.group())));
        }

        m.appendTail(newText);

        // Split the output into separate strings if needed

        List<String> newStrings = Arrays.asList(newText.toString().split("~"));
        System.out.println("Original strings: " + strings);
        System.out.println("New strings: " + newStrings);
    }
}

Output:

Original strings: [I'm a cat, dog, There's an elephant and I'm seeing, we're five]
New strings: [I am a cat, dog, there is an elephant and I am seeing, we are five]
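
A variation on the approach above, offered as a sketch rather than a definitive implementation: the compiled pattern can be applied to each list element separately, so no delimiter has to be chosen and a "~" inside a sentence cannot break the final split. The keys are also sorted longest first, because Java's alternation takes the first alternative that matches rather than the longest one. The class name PerElementSubstituter and its method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer
public class PerElementSubstituter {

    // Builds one alternation from the map keys, longest key first, so that
    // a longer key wins over a shorter key that is a prefix of it.
    static Pattern buildPattern(Map<String, String> replacements) {
        String regex = replacements.keySet().stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(regex);
    }

    // Applies the replacements to each element on its own; no delimiter is needed.
    static List<String> replaceAll(List<String> strings, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return new ArrayList<>(strings); // nothing to replace
        }
        Pattern pattern = buildPattern(replacements); // compiled once, reused for every element
        List<String> result = new ArrayList<>(strings.size());
        for (String s : strings) {
            Matcher m = pattern.matcher(s);
            StringBuffer sb = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
            while (m.find()) {
                m.appendReplacement(sb, Matcher.quoteReplacement(replacements.get(m.group())));
            }
            m.appendTail(sb);
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("I'm a cat", "dog", "There's an elephant and I'm seeing", "we're five");

        Map<String, String> replacements = new HashMap<>();
        replacements.put("I'm", "I am");
        replacements.put("we're", "we are");
        replacements.put("There's", "there is");

        System.out.println(replaceAll(strings, replacements));
    }
}
```

Each sentence is still scanned only once, so the total work stays proportional to the size of the text rather than to 80K times 4K.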

score 0 · Answer 2 · answered May 13 '18 at 13:17

Here is another version, I found this post and modified the program a little bit...

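// Needs java.util.HashMap, java.util.Map, java.util.regex.Matcher, java.util.regex.Pattern
// and org.apache.commons.lang3.StringUtils (Apache Commons Lang)
Map<String, String> tokenMap = new HashMap<>();
tokenMap.put("I'm", "I am");
tokenMap.put("We're", "We are");

String[] array = {"I'm at home", "We're playing football"};

// Join the elements with ", " so they can be processed as one string
// (same result as stripping the brackets from Arrays.toString(array)).
// Caveat: an element that itself contains ", " would be split apart at the end.
String content = String.join(", ", array);

// The keys are used as regex alternatives without quoting, so this assumes
// they contain no regex metacharacters; otherwise wrap each key in Pattern.quote
String regex = StringUtils.join(tokenMap.keySet(), "|");
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(content);

StringBuffer buffer = new StringBuffer();

while (matcher.find()) {
    // quoteReplacement keeps any '$' or '\' in a replacement value literal
    matcher.appendReplacement(buffer, Matcher.quoteReplacement(tokenMap.get(matcher.group(0))));
}

matcher.appendTail(buffer);
array = buffer.toString().split(", ");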

I don't know how efficient it is; I tested it with only a few elements...

0x1C1B

`StringUtils` is an external library. Java 8 has `String.join` if I'm not mistaken. – Morgan May 13 '18 at 13:24
@Morgan No you're right, alternate you could use `String.join("|", tokenMap.keySet());` – 0x1C1B May 13 '18 at 13:30
Also use `StringBuilder` instead of `StringBuffer`. – Morgan May 13 '18 at 13:32
@Morgan If I'm informed right, there is support just since Java 9 for `Matcher.appendReplacement(...)` combined with `StringBuilder`... – 0x1C1B May 13 '18 at 13:37
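
To illustrate the point made in the comments above: from Java 9 on, Matcher.appendReplacement and Matcher.appendTail also accept a StringBuilder. The following is only a sketch that assumes Java 9 or newer and uses String.join-style standard library calls instead of StringUtils; the class name BuilderReplace and the method name replace are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical example class; requires Java 9+ for the StringBuilder overloads
public class BuilderReplace {

    static String replace(String text, Map<String, String> tokenMap) {
        if (tokenMap.isEmpty()) {
            return text; // an empty alternation would match the empty string everywhere
        }
        // Quote each key so it is matched literally, then join with '|'
        String regex = tokenMap.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher matcher = Pattern.compile(regex).matcher(text);

        StringBuilder sb = new StringBuilder();
        while (matcher.find()) {
            // StringBuilder overload, available since Java 9
            matcher.appendReplacement(sb, Matcher.quoteReplacement(tokenMap.get(matcher.group())));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tokenMap = new HashMap<>();
        tokenMap.put("I'm", "I am");
        tokenMap.put("We're", "We are");

        System.out.println(replace("I'm at home, We're playing football", tokenMap));
    }
}
```

On Java 8 the same loop only compiles with StringBuffer, which is why both answers above use it.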

Youcef LAIDANI · Answer 3 · 2018-05-13T13:34:24.960

I would use a parallel stream (Java 8+) combined with Apache Commons Lang, which provides the handy method replaceEach(String text, String[] searchList, String[] replacementList):

List<String> list = ...
Map<String, String> mapReplacement = ...
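// replaceEach takes the text, an array of search strings and an array of replacements.
// keySet() and values() iterate in the same order as long as the map is not modified in between.
String[] keys = mapReplacement.keySet().toArray(new String[mapReplacement.size()]);
String[] values = mapReplacement.values().toArray(new String[mapReplacement.size()]);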

list = list.parallelStream()
        .map(element -> StringUtils.replaceEach(element, keys, values))
        .collect(Collectors.toList());
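
If adding Commons Lang is not an option, the same parallel-stream idea can be sketched with the standard library alone. This swaps replaceEach for a single precompiled Pattern, which is a different technique from the one above, and it assumes Java 9+ for Matcher.replaceAll(Function). A compiled Pattern can be shared between threads, while each element gets its own Matcher. The class name ParallelSubstituter is made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical example class; standard library only, Java 9+
public class ParallelSubstituter {

    static List<String> replaceAll(List<String> list, Map<String, String> mapReplacement) {
        if (mapReplacement.isEmpty()) {
            return new ArrayList<>(list); // nothing to replace
        }
        // One pattern for all keys; Pattern instances are safe to share across threads
        Pattern pattern = Pattern.compile(mapReplacement.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|")));

        // The map is only read here, never modified, while the stream runs
        return list.parallelStream()
                .map(element -> pattern.matcher(element)
                        .replaceAll(mr -> Matcher.quoteReplacement(mapReplacement.get(mr.group()))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("I'm a cat", "dog", "There's an elephant and I'm seeing", "we're five");

        Map<String, String> mapReplacement = new HashMap<>();
        mapReplacement.put("I'm", "I am");
        mapReplacement.put("we're", "we are");
        mapReplacement.put("There's", "there is");

        System.out.println(replaceAll(list, mapReplacement));
    }
}
```

Whether the parallel version is actually faster than a sequential loop for 80K sentences is something to measure rather than assume.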

Note

It is still unclear where this data comes from. If it comes from a database, it is better to solve this in the database instead of in Java code. Personally, I don't like holding this much data in a list and a map.

edited May 13 '18 at 13:34

answered May 13 '18 at 13:28

Hi @YCF_L,my data is in an excel. From there I'm pulling and creating lists and map using poi – Rakesh May 13 '18 at 13:47
@Rakesh Now it is another story, can you edit your question and mention this information it is really important – Youcef LAIDANI May 13 '18 at 13:51
Sure thing. I'll try this, and if it doesn't work out, I'll post it as another question. Thanks for the suggestion. – Rakesh May 13 '18 at 14:15
@Rakesh In your case, it is better to read line by line from your file and edit each line the way I already showed. – Youcef LAIDANI May 13 '18 at 14:17
