Regular Expression to parse Common Name from Distinguished Name

Question

I am attempting to parse (with sed) just First Last from the following DN(s) returned by the DSCL command in OSX terminal bash environment...

CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu

I have tried multiple regexes from this site and others, taken from questions very close to what I wanted (mainly this question), and I have followed the advice to the best of my ability. I don't consider myself a newbie in general, but I am definitely new to regex.

DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.

Thank you very much for your help!

score 7 · Answer 1 · answered Apr 30 '13 at 13:37

I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle escaped ',' characters in the common name. For example, consider a distinguishedName like:

CN=Doe\, John,CN=Users,DC=example,DC=local

Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:

    echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python3 -c 'import sys, ldap.dn; print(ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=1)[0])'

(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.
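
If python-ldap is not available, the same idea can be roughly approximated with the standard library. The following is only a minimal sketch, not a full DN parser: `split_dn` and `first_cn` are hypothetical helper names, and the code assumes backslash escapes of the form `\,` only (no hex escapes such as `\2C`, no quoted values).

```python
def split_dn(dn):
    """Split a DN into RDN strings at commas that are not backslash-escaped."""
    parts, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            current.append(ch)   # keep the escaped character literally
            escaped = False
        elif ch == "\\":
            escaped = True       # drop the backslash, remember the escape
        elif ch == ",":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def first_cn(dn):
    """Return the unescaped value of the leading CN, or None if the DN does not start with one."""
    key, sep, value = split_dn(dn)[0].partition("=")
    return value if sep and key.strip().upper() == "CN" else None


print(first_cn(r"CN=Doe\, John,CN=Users,DC=example,DC=local"))  # Doe, John
```

For anything beyond simple escapes, a real LDAP library remains the safer choice.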

score 2 · Answer 2 · answered Jul 20 '12 at 15:48

Two cut commands is probably the simplest (although not necessarily the best):

    DSCL | cut -d, -f1 | cut -d= -f2

First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.
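
For comparison, the same two-step split can be sketched in Python. This is an illustration only: `cn_by_split` is a hypothetical name, and the code assumes every line starts with `CN=` as described in the question.

```python
def cn_by_split(line):
    # Like `cut -d, -f1`: keep everything before the first comma.
    first_field = line.split(",", 1)[0]
    # Like `cut -d= -f2`: keep the second "="-separated field.
    return first_field.split("=")[1]


print(cn_by_split("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))  # First Last
```

Unlike `cut`, which passes through lines that lack the delimiter, this raises an `IndexError` when a line contains no `=`.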

answered Jul 20 '12 at 15:48

chepner

+1 because this is by far the easiest way for someone who doesn't grok regex—the OP should immediately understand how it works. – abarnert Jul 20 '12 at 22:01
I used a combination of these cut commands and `sed` when the output was a single line. I sincerely appreciate your help! – Ben Jul 23 '12 at 13:20

score 1 · Accepted Answer · answered Jul 20 '12 at 15:57

Using sed:

    sed 's/^CN=\([^,]*\).*/\1/' input_file

    ^           matches start of line
    CN=         literal string match
    \([^,]*\)   captures everything up to the first comma
    .*          rest of the line (discarded)
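
The pattern above translates almost directly to Python's `re` module, which makes it easy to try on sample lines. The sketch below also includes an escape-tolerant variant; that variant is my own assumption about how the pattern could be extended for backslash-escaped commas, not part of the sed command, and the names `SIMPLE`, `ESCAPE_AWARE` and `cn` are hypothetical.

```python
import re

# Direct translation of the sed pattern; Python groups use ( ) without backslashes.
SIMPLE = re.compile(r"^CN=([^,]*).*")

# Assumed variant: the value is a run of escaped pairs (backslash + any character)
# or characters that are neither a comma nor a backslash.
ESCAPE_AWARE = re.compile(r"^CN=((?:\\.|[^,\\])*)")


def cn(line, pattern=SIMPLE):
    """Return the captured CN value, or None if the line does not start with CN=."""
    match = pattern.match(line)
    return match.group(1) if match else None


print(cn("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))  # First Last
print(cn(r"CN=Doe\, John,CN=Users"))                          # Doe\
print(cn(r"CN=Doe\, John,CN=Users", ESCAPE_AWARE))            # Doe\, John
```

The escape-aware variant keeps the backslash in its result, so the value still needs unescaping if the plain name is wanted.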

answered Jul 20 '12 at 15:57

perreal

This worked perfectly for when DSCL returned multiple lines. DSCL is rather inconsistent it seems, as it sometimes returns a single line with all the users in a group. Thank you! – Ben Jul 23 '12 at 13:17
A CN can contain commas, which have to be escaped; this regex doesn't work on such names. – Air2 Sep 21 '16 at 09:03

score 1 · Answer 4 · answered Jul 20 '12 at 16:37

http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators

    awk -F'[,=]' '$1=="CN"{print $2}' foo.txt

answered Jul 20 '12 at 16:37

slitvinov

Does `awk` allow specifying escape sequences? The comma can appear escaped in the Common Name. – Jaime Hablutzel Apr 12 '19 at 03:41

tombolinux · Answer 5 · 2012-08-01T13:10:41.407

0

I like awk too, so I print the substring from the fourth char:

    DSCL | awk 'BEGIN {FS=","} {print substr($1,4)}' > filterednames.txt

edited Aug 01 '12 at 13:10

answered Jul 22 '12 at 17:24

tombolinux

Cocowalla · Answer 6 · 2018-04-03T10:27:10.480

This regex will parse a distinguished name, giving `name` and `val` capture groups for each match.

When DN strings contain commas, they are meant to be quoted; this regex correctly handles both quoted and unquoted strings, and also handles escaped quotes in quoted strings:

    (?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+

Here it is nicely formatted:

    (?:^|,\s?)
    (?:
        (?<name>[A-Z]+)=
        (?<val>"(?:[^"]|"")+"|[^,]+)
    )+

Here's a link so you can see it in action: https://regex101.com/r/zfZX3f/2

If you want a regex to get only the CN, then this adapted version will do it:

    (?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))
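
To try these patterns from Python, note that Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`. The following is a rough sketch under that one adaptation; `DN_PART`, `CN_ONLY`, `dn_pairs` and `cn_only` are hypothetical names, and the sample DN is made up for illustration.

```python
import re

# Same patterns as above, with Python's (?P<name>...) syntax for named groups.
DN_PART = re.compile(r'(?:^|,\s?)(?:(?P<name>[A-Z]+)=(?P<val>"(?:[^"]|"")+"|[^,]+))+')
CN_ONLY = re.compile(r'(?:^|,\s?)(?:CN=(?P<val>"(?:[^"]|"")+"|[^,]+))')


def dn_pairs(dn):
    """Return (name, val) tuples for every component the pattern matches."""
    return [(m.group("name"), m.group("val")) for m in DN_PART.finditer(dn)]


def cn_only(dn):
    """Return the first CN value found, or None if there is no CN component."""
    match = CN_ONLY.search(dn)
    return match.group("val") if match else None


print(dn_pairs('CN="Doe, John",OU=PCS,DC=domain,DC=edu'))
print(cn_only("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))  # First Last
```

Quoted values come back with their surrounding double quotes still attached, so strip them afterwards if only the bare name is needed.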
