I am trying to pattern-match every record whose employeeID: field contains a number and print all entries that match, but it doesn't work as expected. I also want the reverse: print all entries whose employeeID field does not contain a number.

My testfile

dn: CN=User One,OU=Disabled,OU=People,DC=training,DC=example,DC=
 com  
userAccountControl: 514  
employeeID: user1  
comment: HIRED  
sAMAccountName: user1  

dn: CN=Given-iPad01,OU=Room,DC=training,DC=example,DC=com  
userAccountControl: 544  
employeeID: Given-iPad01  
sAMAccountName: Given-iPad01  
lastLogonTimestamp: 130678281934843750    

dn: CN=User Two,OU=Admins,DC=training,DC=example,DC=com  
userAccountControl: 512  
employeeID:: IDE2NzQwODg=  
sAMAccountName: user2  
lastLogonTimestamp: 131685330348725308    

dn: CN=Test User2,OU=2012,OU=People,DC=training,DC=example
 ,DC=com  
userAccountControl: 512  
employeeID: testuser2  
sAMAccountName: testuser2  
lastLogonTimestamp: 131328157284117480    

dn: CN=User Three,OU=People,DC=training,DC=example,DC=com  
userAccountControl: 512  
employeeID: 123456  
comment: HIRED  
sAMAccountName: user3  
lastLogonTimestamp: 131679287880585713   

My expected output was all entries except the one that has employeeID: testuser, but my result contained only the entry with employeeID: 123456.
Below is what I was looking for:

dn: CN=User One,OU=Disabled,OU=People,DC=training,DC=example,DC=com  
userAccountControl: 514  
employeeID: user1  
comment: HIRED  
sAMAccountName: user1    

dn: CN=User Two,OU=Admins,DC=training,DC=example,DC=com  
userAccountControl: 512  
employeeID:: IDE2NzQwODg=  
sAMAccountName: user2  
lastLogonTimestamp: 131685330348725308    

dn: CN=User Three,OU=People,DC=training,DC=example,DC=com  
userAccountControl: 512  
employeeID: 123456  
comment: HIRED  
sAMAccountName: user3  
lastLogonTimestamp: 131679287880585713    

dn: CN=Test User2,OU=2012,OU=People,DC=training,DC=example,DC=com  
userAccountControl: 512  
employeeID: testuser  
sAMAccountName: testuser  
lastLogonTimestamp: 131328157284117480    

dn: CN=Given-iPad01,OU=Rooms,DC=training,DC=example,DC=com  
userAccountControl: 544  
employeeID: Given-iPad01  
sAMAccountName: Given-iPad01  
lastLogonTimestamp: 130678281934843750    

Below is what I tried. To pull the entries that contain a number anywhere in the employeeID value:
perl -000 -ne 'print if /employeeID: [0-9]/' testfile

To pull the entries that do not contain a number anywhere in the employeeID value:
perl -000 -ne 'print if !/employeeID: [0-9]/i' testfile

BBJinu
  • This only finds employee numbers that start with a digit. Add a `+`: `[0-9]+` or `\d+` and a word boundary marker: `\d+\b` or newline `\d+\n` – Robert Apr 20 '18 at 16:39
  • It looks like you're pasting real employee data. If I were you I would edit that out. Idk where this is from but usually it's frowned upon to post real data like that on the internet. – kingsfoil Apr 20 '18 at 16:45
  • @Robert is this what you meant? `perl -000 -ne 'print if /employeeID+: [0-9]+/i' testfile` – BBJinu Apr 20 '18 at 16:50
  • @0112 this is not real data; it is from my lab, and I modified the entries before I pasted them. Thanks. – BBJinu Apr 20 '18 at 16:52
  • It's not clear what you want. Records where employee ID contains a digit? – choroba Apr 20 '18 at 17:04
  • @choroba Yes, correct: the employeeID contains a digit, but the match should pull all the attributes associated with that entry, like dn, userAccountControl, etc. – BBJinu Apr 20 '18 at 17:07
  • This looks like an LDIF file. Why don't you parse it as such? – Matt Jacob Apr 20 '18 at 17:15
  • The result you say you're looking for has 'testuser' and looks like the same as your input? – zzxyz Apr 20 '18 at 17:27

2 Answers

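The digit might be preceded by any characters other than a newline. In the pattern, `.` matches any character except a newline, and `.*` means there can be zero or more such characters. The /m modifier is needed so that ^ matches at the start of a line instead of only at the start of the string (with -000, each record is read as one multi-line string).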

perl -000 -ne 'print if /^employeeID: .*[0-9]/m' -- file
choroba

Perl is certainly well suited to this task, but it might be easier to:

grep -E -B 2 -A 3 'employeeID:\s*.*[0-9]+.*' ./testfile

The reason you don't see the expected output from your one-liner is that /employeeID: [0-9]/ only matches when a digit immediately follows "employeeID: ". You need to allow non-numerical characters before the digit (e.g. with .*). Quantifying [0-9] with + is harmless but not required, since a single digit is enough for a match.
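
Before adding context lines with -B and -A, it can help to check which employeeID lines the pattern selects on its own. The sketch below does that on made-up values; the `id_lines` helper is hypothetical, and `[[:space:]]` is swapped in for `\s` on the assumption that it is more portable across grep implementations.

```shell
# Hypothetical employeeID lines covering the cases raised in the comments.
ids=$(mktemp)
printf '%s\n' \
  'employeeID: user1' \
  'employeeID: testuser' \
  'employeeID: aa11bb' \
  'employeeID: 123456' > "$ids"

# Select employeeID lines that contain a digit anywhere after the colon.
id_lines() {
  grep -E 'employeeID:[[:space:]]*.*[0-9]+.*' "$1"
}

id_lines "$ids"
```

Because grep works line by line, the -B 2 -A 3 context in the command above only reproduces whole records when every record has the same layout as the test file.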

It would be good to read up on how regular expressions work. Here's an environment where you can play around with this particular expression: https://regexr.com/3o9av

kingsfoil
  • If I understand correctly you want a regex that matches all entries in your file where the employee ID is a digit. Have I misunderstood? That expression works perfectly for me. – kingsfoil Apr 20 '18 at 17:18
  • @Binish I notice in your expected output you now list several entries where the employee ID is non-numerical. In your question you state that you are _"trying to do a pattern match search for anything that has numbers for employeeID: xxxxxx and bring all entries that matches."_ I think you may need to clarify a bit more before we can help you. – kingsfoil Apr 20 '18 at 17:21
  • The reason your regex is only matching one entry is that there is only one entry where the employeeID is numerical (in the data you've provided here). – kingsfoil Apr 20 '18 at 17:22
  • Yes, correct. To be more clear: I want entries where the employee ID contains a digit, for example abcd1, 123abc, aa11bb. But it should not pull any entry that has no digits, for example testuser or abcd=== – BBJinu Apr 20 '18 at 17:22
  • That's a bit different. Would you please edit your original question to clarify that? – kingsfoil Apr 20 '18 at 17:23
  • @Binish I've updated my answer to reflect that information – kingsfoil Apr 20 '18 at 17:27
  • It works, thanks, but a line containing -- was added before the line dn: CN=Given-iPad01. – BBJinu Apr 20 '18 at 17:39
  • If you're running Linux, use the `--no-group-separator` flag with the expression. If you're running OS X or something else that doesn't support that flag, pipe the output of the original statement to something like `grep -v -- "^--$"`, so the final statement ends up being `grep -E -B 2 -A 3 'employeeID:\s*.*[0-9]+.*' ./testfile | grep -v -- "^--$"`. (source: https://stackoverflow.com/a/8840902/1596460) – kingsfoil Apr 20 '18 at 17:46
  • The one that you posted worked: https://regexr.com/3o9av. Thanks so much, I used this regular expression: `/employeeID:\s*.*[0-9]+.*/i` – BBJinu Apr 20 '18 at 17:50
  • You're welcome. I would appreciate being marked as the answer if this helped you solve your problem. – kingsfoil Apr 20 '18 at 17:52
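
The separator handling discussed in the comments above can be sketched as follows. The sample file and the `context_matches` helper are invented for this illustration, and the context sizes are reduced to one line each to fit the shorter records.

```shell
# Hypothetical three-record file; only the first and last IDs contain a digit.
records=$(mktemp)
cat > "$records" <<'EOF'
dn: CN=A
employeeID: a1
name: a

dn: CN=B
employeeID: bbb
name: b

dn: CN=C
employeeID: c3
name: c
EOF

# grep prints a "--" line between non-adjacent context groups;
# the second grep drops those separator lines.
context_matches() {
  grep -E -B 1 -A 1 'employeeID:.*[0-9]' "$1" | grep -v -- '^--$'
}

context_matches "$records"
```

Note that the filter would also drop any genuine data line consisting solely of `--`, so on GNU grep the `--no-group-separator` flag is the more precise option.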