6

I have a pool of characters and I want to match all the words which are anagrams of those chars or of a subset of those chars using a regular expression.

Example: given the string "ACNE" the regex should give me these results:

  • ACNE [T]
  • CENA [T]
  • CAN [T]
  • CAAN [F]
  • CANEN [F]

I've tried this solution /b[acne]{1,4}/b but it accepts multiple repetitions of single chars. What can I do to take each char at most one time?

HghlnDR
  • 71
  • 1
  • 1
  • 5
  • 4
    Regex is not the right tool for this. I suggest you to use String Library of whatever language you are doing this in. – Rohit Jain Jan 28 '13 at 12:07
  • Check this answer, if you want to hammer a nail with screwdriver: http://stackoverflow.com/a/14383513/1400768 Or you can check other answers. It is probably duplicate of the question I linked to, http://stackoverflow.com/questions/14383119/check-if-string-is-subset-of-a-bunch-of-characters-regex/14383513 – nhahtdh Jan 28 '13 at 12:57

2 Answers2

9

The sub-anagrams of the word "acne" are the words that

  • consist only of the letters acne
  • do not contain a more than once
  • do not contain c more than once
  • do not contain n more than once
  • do not contain e more than once

Compiling this into a regex:

^(?!.*a.*a)(?!.*c.*c)(?!.*n.*n)(?!.*e.*e)[acne]*$

Test: regexpal

Alternatively, since "acne" does not contain any letter more than once, the sub-anagrams of the word "acne" are the words that

  • consist only of the letters acne
  • do not contain any letter more than once.

Compiling this into a regex:

^(?!.*(.).*\1)[acne]*$

Test: regexpal

Note: the sub-anagrams of the word "magmoid" can be matched as

^(?!.*([agoid]).*\1)(?!(.*m){3})[magoid]*$

(do not contain any of agoid more than once, and do not contain m more than twice)

John Dvorak
  • 26,799
  • 13
  • 69
  • 83
  • Note that this is only possible if you assume the number of occurrences of the characters are equal. It can be modified a bit to accommodate the case above. – nhahtdh Jan 28 '13 at 13:21
  • @nhahtdh 'since "acne" does not contain any letter more than once'; there is no such restriction for the first approach. – John Dvorak Jan 28 '13 at 13:24
  • I'm talking about second approach. Yes, there is no restriction on the first approach. – nhahtdh Jan 28 '13 at 13:25
  • @useless the point of the regex is to show, by example, how to build a regex for any specific word. The second regex shows how to do that for strings with repeated letters. It is expected that the reader can extrapolate from the information given. Can you? – John Dvorak Oct 05 '13 at 21:43
  • but, how about letter repetition, how can you use that regex to match for example: doom as an anagram of mood? – useless Oct 05 '13 at 21:49
  • @useless the second regex (and description) shows how to handle letters with repetition. Should I elaborate on how to optimise the regex if multiple letters have the same number of max. repetitions, or how to extrapolate from the second example? – John Dvorak Oct 05 '13 at 22:07
  • @useless extrapolating from the magmoid example, `mood` would be `^(?!.*([md]).*\1)(?!(.*o){3})[mod]*$` – John Dvorak Oct 05 '13 at 22:10
0

CODE TO FIND NUMBER OF ANAGRAMS OF A WORD IN A GIVEN STRING USING REGULAR REXPRESSION

Fork the below repository for java, DataStructure, Algorithms, and Company Interview Questions Practices. Please feel free to contribute to the Repository

https://github.com/arpans2112/techsqually-java8-best-practices/blob/master/src/com/techsqually/java/library/util/regularexpression/anagramStrings.java

package com.techsqually.java.library.util.regularexpression;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class anagramStrings {


    public static void main(String[] args) {

       int count = findAnagramsInAGivenStrings("arpan","Hi arpan Aarpn we are testing rapan rranp anagram");
        System.out.println(count);
    }


    /**
     * <p> Use to find the number of anagrams of a word in a Given String</p>
     * @param : word : is the word for which you want to find the anagrams
     * @param : givenString : is the string in which you want to find the anagrams of word given
     * @return : total number of anagrams of the word passed
     *  
     *  all words in which each character count is same but their order can be different 
     *  e.g arpan and rapan are anagrams 
     *  
     * @output of above given example is 3, "arpan" , "Aarpn" and rapan are anagrams of arpan
     * */
    public static int findAnagramsInAGivenStrings(String word, String givenString){

        word = word.toLowerCase();
        givenString = givenString.toLowerCase();
        HashMap<String,Integer> numberOfAnnagrams = new HashMap<>();
       Matcher matcher = Pattern.compile("[" + word + "]{" + word.length() + "}").matcher(givenString);

       int count = 0;
        while (matcher.find()){

                 char[] matchWordArray = matcher.group().toCharArray();
                 char[] givenWordArray = word.toCharArray();
            Arrays.sort(matchWordArray);
            Arrays.sort(givenWordArray);

            if (Arrays.equals(matchWordArray,givenWordArray)) count++;
        }

        return count;
    }
}
Arpan Saini
  • 4,623
  • 1
  • 42
  • 50