Code golf: find all anagrams

Question

A word is an anagram if the letters in that word can be re-arranged to form a different word.

Task:

The shortest source code by character count to find all sets of anagrams given a word list.
Spaces and new lines should be counted as characters
Use the code ruler

---------10--------20--------30--------40--------50--------60--------70--------80--------90--------100-------110-------120

Input:

a list of words from stdin with each word separated by a new line.

e.g.

A
A's
AOL
AOL's
Aachen
Aachen's
Aaliyah
Aaliyah's
Aaron
Aaron's
Abbas
Abbasid
Abbasid's

Output:

All sets of anagrams, with each set separated by a separate line.

Example run:

./anagram < words
marcos caroms macros
lump's plum's
dewar's wader's
postman tampons
dent tend
macho mocha
stoker's stroke's
hops posh shop
chasity scythia
...

I have a 149 char perl solution which I'll post as soon as a few more people post :)

Have fun!

EDIT: Clarifications

Assume anagrams are case insensitive (i.e. upper and lower case letters are equivalent)
Only sets with more than 1 item should be printed
Each set of anagrams should only be printed once
Each word in an anagram set should only occur once

EDIT2: More Clarifications

If two words differ only in capitalization, they should be collapsed into the same word, and it's up to you to decide which capitalization scheme to use for the collapsed word
sets of words only have to end in a new line, as long as each word is separated in some way, e.g. comma separated, or space separated is valid. I understand some languages have quick array printing methods built in so this should allow you to take advantage of that if it doesn't output space separated arrays.

Sounds like a solution in Prolog could be done pretty short. — JesperE, Apr 02 '10 at 10:35
Two Q.s: 1) If the word list has the same word twice but in different cases, "Azalea" and "azalea", does that count as an anagram (as most solutions below would do), or do they have to be collapsed into one word, and if there are no other anagrams, not output? 2) Does the output format have to match the above exactly? Or would output per line like '["milepost","polemist"]' be acceptable? — MtnViewMark, Apr 02 '10 at 16:36
You should explicitly add the shortest code requirement to the question http://meta.stackexchange.com/questions/24242 — John La Rooy, Apr 02 '10 at 20:14
@mtnviewMark 1. they should be collapsed into 1 word 2. as long as each set end in a new line, so yes that is fine. @gnibbler, thanks, I'll add that — Charles Ma, Apr 03 '10 at 02:37
I suppose I'm sorry I asked :-P None of the submissions I ran now pass, including mine, since given a word list with both "Azalea" and "azalea", they will print a line with those two words as anagrams, rather than collapse them. Fie... — MtnViewMark, Apr 03 '10 at 06:02

score 12 · Accepted Answer · edited Jun 20 '20 at 09:12

Powershell, 104 97 91 86 83 chars

$k=@{};$input|%{$k["$([char[]]$_|%{$_+0}|sort)"]+=@($_)}
$k.Values|?{$_[1]}|%{"$_"}

Update for the new requirement (+8 chars):

To exclude the words that only differ in capitalization, we could just remove the duplicates (case-insensitvely) from the input list, i.e. $input|sort -u where -u stands for -unique. sort is case-insenstive by default:

$k=@{};$input|sort -u|%{$k["$([char[]]$_|%{$_+0}|sort)"]+=@($_)} 
$k.Values|?{$_[1]}|%{"$_"}

Explanation of the `[char[]]$_|%{$_+0}|sort` -part

It's a key for the hashtable entry under which anagrams of a word are stored. My initial solution was: $_.ToLower().ToCharArray()|sort. Then I discovered I didn't need ToLower() for the key, as hashtable lookups are case-insensitive.

[char[]]$_|sort would be ideal, but sorting of the chars for the key needs to be case-insensitive (otherwise Cab and abc would be stored under different keys). Unfortunately, sort is not case-insenstive for chars (only for strings).

What we need is [string[]][char[]]$_|sort, but I found a shorter way of converting each char to string, which is to concat something else to it, in this case an integer 0, hence [char[]]$_|%{$_+0}|sort. This doesn't affect the sorting order, and the actual key ends up being something like: d0 o0 r0 w0. It's not pretty, but it does the job :)

I'm accepting this because 1. it conforms to all the requirements 2. it's the highest voted answer, and 3. I haven't seen much power shell code golfing before :) — Charles Ma, Apr 04 '10 at 09:28

Michael Carman · Answer 2 · 2010-04-02T20:04:00.137

12

Perl, 59 characters

chop,$_{join'',sort split//,lc}.="$_ "for<>;/ ./&&say for%_

Note that this requires Perl 5.10 (for the say function).

edited Apr 02 '10 at 20:04

answered Apr 02 '10 at 16:28

Michael Carman

30,628
10
74
122

Perl 5.10 on my machine says: "Operator or semicolon missing before &say at ana-perl.pl line 1. Ambiguous use of & resolved as operator & at ana-perl.pl line 1." – MtnViewMark Apr 03 '10 at 05:54
@MtnViewMark: To protect backward compatibility the features added to Perl in version 5.10 must be explicitly enabled: `perl -M5.010 ana-perl.pl < wordlist` How that should effect golf scores is debatable. – Michael Carman Apr 04 '10 at 18:33

MtnViewMark · Answer 3 · 2010-04-03T06:49:20.680

5

Haskell, 147 chars

prior sizes: ~~150~~ ~~159~~ chars

import Char
import List
x=sort.map toLower
g&a=g(x a).x
main=interact$unlines.map unwords.filter((>1).length).groupBy((==)&).sortBy(compare&).lines

This version, at 165 chars satisifies the new, clarified rules:

import Char
import List
y=map toLower
x=sort.y
g&f=(.f).g.f
w[_]="";w a=show a++"\n"
main=interact$concatMap(w.nubBy((==)&y)).groupBy((==)&x).sortBy(compare&x).lines

This version handles:

Words in the input that differ only by case should only count as one word
The output needs to be one anagram set per line, but extra punctuation is acceptable

edited Apr 03 '10 at 06:49

answered Apr 02 '10 at 14:20

MtnViewMark

5,120
2
20
29

Just my first attempt -- I'm sure it can be squeezed. – MtnViewMark Apr 02 '10 at 14:21
It's so nice to see that something close to idiomatic, readable Haskell also scores pretty well in code golf! – Thomas Apr 03 '10 at 11:31
Agreed, this is probably my favorite answer :) – Charles Ma Apr 04 '10 at 09:30

Mark Rushakoff · Answer 4 · 2010-04-02T14:57:44.713

4

Ruby, 94 characters

h={};(h[$_.upcase.bytes.sort]||=[])<<$_ while gets&&chomp;h.each{|k,v|puts v.join' 'if v.at 1}

edited Apr 02 '10 at 14:57

answered Apr 02 '10 at 13:33

Mark Rushakoff

249,864
45
407
398

This is a pretty Perl-ish approach, so I expect that somebody will be able to knock off at least 5-10 characters with a Perl solution. – Mark Rushakoff Apr 02 '10 at 14:27

Amber · Answer 5 · 2010-04-03T09:03:52.840

3

Python, 167 characters, includes I/O

import sys
d={}
for l in sys.stdin.readlines():
 l=l[:-1]
 k=''.join(sorted(l)).lower()
 d[k]=d.pop(k,[])+[l]
for k in d:
 if len(d[k])>1: print(' '.join(d[k]))

Without the input code (i.e. if we assume the wordlist already in a list w), it's only 134 characters:

d={}
for l in w:
 l=l[:-1]
 k=''.join(lower(sorted(l)))
 d[k]=d.pop(k,[])+[l]
for k in d:
 if len(d[k])>1: print(' '.join(d[k]))

edited Apr 03 '10 at 09:03

answered Apr 02 '10 at 10:03

Amber

507,862
82
626
550

Remove the space between `:` and `print`, and use semicolons. I think you could use `input()`. – kennytm Apr 02 '10 at 12:56
This produces case-sensitive results. Compare against Dan Andreatta or my solutions. – Mark Rushakoff Apr 02 '10 at 14:25
Fixed, now that case-insensitivity was specified as required. – Amber Apr 02 '10 at 19:24
Python 2.6.1 on my machine claims: "NameError: name 'lower' is not defined" – MtnViewMark Apr 03 '10 at 05:52
Yeah, that was a mistake on my part, I fixed it (should be `.lower()` instead of `lower(...)`). – Amber Apr 03 '10 at 09:04

Dan Andreatta · Answer 6 · 2010-04-02T11:59:08.880

2

AWK - 119

{split(toupper($1),a,"");asort(a);s="";for(i=1;a[i];)s=a[i++]s;x[s]=x[s]$1" "}
END{for(i in x)if(x[i]~/ .* /)print x[i]}

AWK does not have a join function like Python, or it could have been shorter...

~~It assumes uppercase and lowercase as different.~~

edited Apr 02 '10 at 11:59

answered Apr 02 '10 at 10:33

Dan Andreatta

3,611
1
22
15

Will this only print out words which are anagrams? Or will it also print out words with no other anagram on their own lines? – Amber Apr 02 '10 at 10:37
The original version prints everything. Update with the fix. – Dan Andreatta Apr 02 '10 at 10:42
I think this must be for Gnu awk (gawk) only. Standard awk doesn't have the function asort. – MtnViewMark Apr 03 '10 at 05:59

score 2 · Answer 7 · edited Jun 20 '20 at 09:12

2

C++, 542 chars

#include <iostream>
#include <map>
#include <vector>
#include <boost/algorithm/string.hpp>
#define ci const_iterator
int main(){using namespace std;typedef string s;typedef vector<s> vs;vs l;
copy(istream_iterator<s>(cin),istream_iterator<s>(),back_inserter(l));map<s, vs> r;
for (vs::ci i=l.begin(),e=l.end();i!=e;++i){s a=boost::to_lower_copy(*i);
sort(a.begin(),a.end());r[a].push_back(*i);}for (map<s,vs>::ci i=r.begin(),e=r.end();
i!=e;++i)if(i->second.size()>1)*copy(i->second.begin(),i->second.end(),
ostream_iterator<s>(cout," "))="\n";}

edited Jun 20 '20 at 09:12

Community

1
1

answered Apr 02 '10 at 14:57

TC.

4,133
3
31
33

Notes: (1) Actual count as shown is a little higher because I threw in some newlines for readability (main() can be on a single line) (2) Slightly compressed yet readable version: http://codepad.org/JtB1AHrE – TC. Apr 02 '10 at 15:00

codewarrior · Answer 8 · 2010-04-02T10:12:31.440

1

Python, O(n^2)

import sys;
words=sys.stdin.readlines()
def s(x):return sorted(x.lower());
print '\n'.join([''.join([a.replace('\n',' ') for a in words if(s(a)==s(w))]) for w in words])

edited Apr 02 '10 at 10:12

answered Apr 02 '10 at 10:01

codewarrior

2,000
14
14

1

This is... incredibly slow. :P It also outputs each set of anagrams multiple times, equal to the number of anagrams in the set. – Amber Apr 02 '10 at 10:17
1

(Oh, and it also outputs individual words which aren't anagrams on their own line, which doesn't appear to be part of the specified output.) – Amber Apr 02 '10 at 10:29
Why ; at end of line, that is not Python! – Tony Veijalainen Oct 04 '10 at 20:43

Code golf: find all anagrams

Task:

Input:

Output:

EDIT: Clarifications

EDIT2: More Clarifications

8 Answers8

Powershell, 104 97 91 86 83 chars

Update for the new requirement (+8 chars):

Explanation of the `[char[]]$_|%{$_+0}|sort` -part

Perl, 59 characters

Haskell, 147 chars

Ruby, 94 characters

Python, 167 characters, includes I/O

AWK - 119

C++, 542 chars

Python, O(n^2)

Linked

Code golf: find all anagrams

Task:

Input:

Output:

EDIT: Clarifications

EDIT2: More Clarifications

8 Answers8

Powershell, 104 97 91 86 83 chars

Update for the new requirement (+8 chars):

Explanation of the [char[]]$_|%{$_+0}|sort -part

Perl, 59 characters

Haskell, 147 chars

Ruby, 94 characters

Python, 167 characters, includes I/O

AWK - 119

C++, 542 chars

Python, O(n^2)

Linked

Explanation of the `[char[]]$_|%{$_+0}|sort` -part