Regex - find anagrams and sub-anagrams

Question

I have a pool of characters and I want to match all the words which are anagrams of those chars or of a subset of those chars using a regular expression.

Example: given the string "ACNE" the regex should give me these results:

ACNE [T]
CENA [T]
CAN [T]
CAAN [F]
CANEN [F]

I've tried this solution /b[acne]{1,4}/b but it accepts multiple repetitions of single chars. What can I do to take each char at most one time?

Regex is not the right tool for this. I suggest you to use String Library of whatever language you are doing this in. — Rohit Jain, Jan 28 '13 at 12:07
Check this answer, if you want to hammer a nail with screwdriver: http://stackoverflow.com/a/14383513/1400768 Or you can check other answers. It is probably duplicate of the question I linked to, http://stackoverflow.com/questions/14383119/check-if-string-is-subset-of-a-bunch-of-characters-regex/14383513 — nhahtdh, Jan 28 '13 at 12:57

John Dvorak · Accepted Answer · 2013-01-28T13:30:17.000

9

The sub-anagrams of the word "acne" are the words that

consist only of the letters acne
do not contain a more than once
do not contain c more than once
do not contain n more than once
do not contain e more than once

Compiling this into a regex:

^(?!.*a.*a)(?!.*c.*c)(?!.*n.*n)(?!.*e.*e)[acne]*$

Test: regexpal

Alternatively, since "acne" does not contain any letter more than once, the sub-anagrams of the word "acne" are the words that

consist only of the letters acne
do not contain any letter more than once.

Compiling this into a regex:

^(?!.*(.).*\1)[acne]*$

Test: regexpal

Note: the sub-anagrams of the word "magmoid" can be matched as

^(?!.*([agoid]).*\1)(?!(.*m){3})[magoid]*$

(do not contain any of agoid more than once, and do not contain m more than twice)

edited Jan 28 '13 at 13:30

answered Jan 28 '13 at 12:59

John Dvorak

26,799
13
69
83

Note that this is only possible if you assume the number of occurrences of the characters are equal. It can be modified a bit to accommodate the case above. – nhahtdh Jan 28 '13 at 13:21
@nhahtdh 'since "acne" does not contain any letter more than once'; there is no such restriction for the first approach. – John Dvorak Jan 28 '13 at 13:24
I'm talking about second approach. Yes, there is no restriction on the first approach. – nhahtdh Jan 28 '13 at 13:25
@useless the point of the regex is to show, by example, how to build a regex for any specific word. The second regex shows how to do that for strings with repeated letters. It is expected that the reader can extrapolate from the information given. Can you? – John Dvorak Oct 05 '13 at 21:43
but, how about letter repetition, how can you use that regex to match for example: doom as an anagram of mood? – useless Oct 05 '13 at 21:49
@useless the second regex (and description) shows how to handle letters with repetition. Should I elaborate on how to optimise the regex if multiple letters have the same number of max. repetitions, or how to extrapolate from the second example? – John Dvorak Oct 05 '13 at 22:07
@useless extrapolating from the magmoid example, `mood` would be `^(?!.*([md]).*\1)(?!(.*o){3})[mod]*$` – John Dvorak Oct 05 '13 at 22:10

score 0 · Answer 2 · answered Sep 23 '19 at 07:50

CODE TO FIND NUMBER OF ANAGRAMS OF A WORD IN A GIVEN STRING USING REGULAR REXPRESSION

Fork the below repository for java, DataStructure, Algorithms, and Company Interview Questions Practices. Please feel free to contribute to the Repository

https://github.com/arpans2112/techsqually-java8-best-practices/blob/master/src/com/techsqually/java/library/util/regularexpression/anagramStrings.java

package com.techsqually.java.library.util.regularexpression;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class anagramStrings {


    public static void main(String[] args) {

       int count = findAnagramsInAGivenStrings("arpan","Hi arpan Aarpn we are testing rapan rranp anagram");
        System.out.println(count);
    }


    /**
     * <p> Use to find the number of anagrams of a word in a Given String</p>
     * @param : word : is the word for which you want to find the anagrams
     * @param : givenString : is the string in which you want to find the anagrams of word given
     * @return : total number of anagrams of the word passed
     *  
     *  all words in which each character count is same but their order can be different 
     *  e.g arpan and rapan are anagrams 
     *  
     * @output of above given example is 3, "arpan" , "Aarpn" and rapan are anagrams of arpan
     * */
    public static int findAnagramsInAGivenStrings(String word, String givenString){

        word = word.toLowerCase();
        givenString = givenString.toLowerCase();
        HashMap<String,Integer> numberOfAnnagrams = new HashMap<>();
       Matcher matcher = Pattern.compile("[" + word + "]{" + word.length() + "}").matcher(givenString);

       int count = 0;
        while (matcher.find()){

                 char[] matchWordArray = matcher.group().toCharArray();
                 char[] givenWordArray = word.toCharArray();
            Arrays.sort(matchWordArray);
            Arrays.sort(givenWordArray);

            if (Arrays.equals(matchWordArray,givenWordArray)) count++;
        }

        return count;
    }
}

Regex - find anagrams and sub-anagrams

2 Answers2

Linked