4

I am trying to extract (with sed) just First Last from DNs like the following, which are returned by the DSCL command in the OS X Terminal (bash):

CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu

I have tried multiple regexes from this site and others, taken from questions very close to what I wanted (mainly this question), and I have followed the advice to the best of my ability. I don't necessarily consider myself a newbie, but I am definitely a newbie to regex.

DSCL returns a list of DNs, and I would like only First Last printed to a text file. I have tried sed, but I can't seem to get the expression right. I am open to other commands for parsing the output. Every line begins with CN=, and there is a comma between Last and OU=.

Thank you very much for your help!

Ben

6 Answers

7

I think all of the regular-expression answers provided so far are buggy, in that they do not properly handle escaped ',' characters in the common name. For example, consider a distinguishedName like:

CN=Doe\, John,CN=Users,DC=example,DC=local

Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:

    echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python3 -c 'import sys, ldap.dn; print(ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=True)[0])'

(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.
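If installing python-ldap is not an option, the same idea can be sketched in plain Python. This is only a rough illustration, not a full RFC 4514 parser: the function names `split_dn` and `common_name` are made up for this example, and it assumes commas are escaped with a backslash as in the DN above (it ignores hex escapes such as `\2C` and quoted values).

```python
import re


def split_dn(dn):
    """Split a DN into components on commas that are not backslash-escaped."""
    parts, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            # The previous character was a backslash, so keep this one verbatim.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def common_name(dn):
    """Return the value of a leading CN component, or None if there is none."""
    key, _, value = split_dn(dn)[0].partition("=")
    if key.strip().upper() != "CN":
        return None
    # Drop the backslash in front of each escaped character.
    return re.sub(r"\\(.)", r"\1", value)


print(common_name(r"CN=Doe\, John,CN=Users,DC=example,DC=local"))
```

For anything beyond simple backslash escapes, a real LDAP library is still the safer choice.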

Josh Kupershmidt
2

Two cut commands is probably the simplest (although not necessarily the best):

    DSCL | cut -d, -f1 | cut -d= -f2

First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.
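For anyone who would rather do the same two-step split in a script, here is a rough Python equivalent. The function name `cn_via_cut` is just a placeholder for this sketch, and like the cut pipeline it assumes the name itself contains no commas.

```python
def cn_via_cut(line):
    """Mimic `cut -d, -f1 | cut -d= -f2` on a single line of text."""
    first_field = line.rstrip("\n").split(",")[0]  # like: cut -d, -f1
    pieces = first_field.split("=")                # like: cut -d= ...
    # cut prints the whole line when the delimiter is absent.
    return pieces[1] if len(pieces) > 1 else first_field


print(cn_via_cut("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))  # prints: First Last
```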

chepner
  • +1 because this is by far the easiest way for someone who doesn't grok regex—the OP should immediately understand how it works. – abarnert Jul 20 '12 at 22:01
  • I used a combination of these cut commands and `sed` when the output was a single line. I sincerely appreciate your help! – Ben Jul 23 '12 at 13:20
1

Using sed:

    sed 's/^CN=\([^,]*\).*/\1/' input_file

The pattern breaks down as follows:

    ^           matches start of line
    CN=         literal string match
    \([^,]*\)   everything until a comma (captured as \1)
    .*          rest of the line
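The same pattern carries over to Python's `re` module almost unchanged. The sketch below also lets the capture step over backslash-escaped characters, so a name like `Doe\, John` is not cut at the escaped comma; that extension is an assumption on top of the sed expression, and `CN_PATTERN` and `leading_cn` are names invented for this example.

```python
import re

# ^CN= then either an escaped pair (backslash + any char) or any character
# that is neither a comma nor a backslash, repeated; stops at the first bare comma.
CN_PATTERN = re.compile(r"^CN=((?:\\.|[^,\\])*)")


def leading_cn(line):
    """Return the text after a leading 'CN=' up to the first unescaped comma."""
    match = CN_PATTERN.match(line)
    return match.group(1) if match else None


print(leading_cn("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))  # prints: First Last
```

Note that the escaping backslash is left in the result; strip it separately if you need the plain name.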
perreal
  • This worked perfectly for when DSCL returned multiple lines. DSCL is rather inconsistent it seems, as it sometimes returns a single line with all the users in a group. Thank you! – Ben Jul 23 '12 at 13:17
  • A CN can contain commas, which have to be escaped; this regex doesn't work on such names. – Air2 Sep 21 '16 at 09:03
1

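Set the record separator to a comma and the field separator to an equals sign, then print the value of every record whose key is CN. See http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators

Because newlines are no longer separators here, this suits output that arrives as a single line.

    awk -v RS=',' -v FS='=' '$1=="CN"{print $2}' foo.txt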
slitvinov
0

I like awk too, so I split on commas and print the first field starting from its fourth character (skipping "CN="):

    DSCL | awk -F, '{print substr($1,4)}' > filterednames.txt

tombolinux
0

This regex will parse a distinguished name, giving name and val capture groups for each match.

When DN strings contain commas, they are meant to be quoted. This regex correctly handles both quoted and unquoted strings, and also handles escaped quotes in quoted strings:

(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+

Here it is nicely formatted:

(?:^|,\s?)
(?:
    (?<name>[A-Z]+)=
    (?<val>"(?:[^"]|"")+"|[^,]+)
)+

Here's a link so you can see it in action: https://regex101.com/r/zfZX3f/2

If you want a regex to get only the CN, then this adapted version will do it:

(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))
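As a quick way to try the CN-only version, here is a small Python sketch. Python's `re` module spells named groups `(?P<val>...)` rather than `(?<val>...)`, and the redundant outer group is dropped; `CN_REGEX` and `cn_value` are names made up for this example. It follows the regex as written, so it handles quoted values but not backslash-escaped commas.

```python
import re

# CN-only variant of the regex above, using Python's named-group syntax.
CN_REGEX = re.compile(r'(?:^|,\s?)CN=(?P<val>"(?:[^"]|"")+"|[^,]+)')


def cn_value(dn):
    """Return the raw value of the first CN component, quotes included."""
    match = CN_REGEX.search(dn)
    return match.group("val") if match else None


print(cn_value("CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu"))  # prints: First Last
print(cn_value('CN="Doe, John",OU=Users,DC=example,DC=com'))
```

A quoted value comes back with its surrounding quotes, so strip them afterwards if only the bare name is wanted.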

Cocowalla