I have certificate information from websites in PowerShell. The issuer strings usually look like this:

CN=Google Internet Authority G3, O=Google Trust Services, C=US
  1. I need help getting the right regex to take only the information after CN= up to the comma.

  2. The second issue is that some of the certificates I am getting have only a CN= part, so there is no comma at the end and the string looks like this:

CN=Google Internet Authority G3

How can I use regex to catch either case?

Here is what I thought would work and tried:

$cert.Issuer -match "CN=(?<issuer>.*(?=,))"
Write-Host $Matches['issuer']
>> Google Internet Authority G3, O=Google Trust Services

$cert.Issuer -match "CN=(?<issuer>.*)?,?\s"
Write-Host $Matches['issuer']
>> Google Internet Authority G3, O=Google Trust Services,

$cert.Issuer -match "CN=(?<issuer>.*),|\s"
Write-Host $Matches['issuer']
>> Google Internet Authority G3, O=Google Trust Services

So I want to get just

Google Internet Authority G3

whether it is followed by a comma and more information, or has no comma and is at the end of the string.

Thanks!

  • This is one of the better-asked questions I've seen in the regex tag. Well done. – JDB Aug 20 '19 at 21:03

3 Answers

If the text cannot contain a comma itself, you could use a negated character class to match any character except a comma, and capture the match in the named capturing group issuer:

CN=(?<issuer>[^,]+)

If you don't want to match newlines either, you can extend the negated character class:

CN=(?<issuer>[^,\r\n]+)

Explanation

  • CN= Match literally
  • (?<issuer> Named group issuer
    • [^,\r\n]+ Negated character class, match 1+ times any char except a comma, \r or \n
  • ) Close named group
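
A minimal sketch of this pattern applied to both issuer formats from the question. The function name Get-IssuerCommonName is hypothetical, and the literal strings stand in for $cert.Issuer; the function returns $null when there is no CN= at all.

```powershell
# Hypothetical helper: return the CN value from an issuer string, or $null.
function Get-IssuerCommonName {
    param([string]$Issuer)

    # [^,\r\n]+ stops at the first comma or line break, or at the end of the string
    if ($Issuer -match 'CN=(?<issuer>[^,\r\n]+)') {
        return $Matches['issuer']
    }
    return $null
}

$withComma = Get-IssuerCommonName 'CN=Google Internet Authority G3, O=Google Trust Services, C=US'
$cnOnly    = Get-IssuerCommonName 'CN=Google Internet Authority G3'

$withComma   # Google Internet Authority G3
$cnOnly      # Google Internet Authority G3
```

Because the negated class simply runs out of characters at the end of the string, no separate "comma or end" alternative is needed.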

If the text can contain a comma, you could match any character except a newline non-greedily, followed by either a comma and a space or the end of the string.

CN=(?<issuer>.*?)(?:, |$)

Explanation

  • CN= Match literally
  • (?<issuer> Named group issuer
    • .*? Match any char except a newline, non-greedy (as few as possible)
  • ) Close named group
  • (?: Non capturing group
    • ,  Match a comma followed by a space
    • | Or
    • $ Assert the end of the string
  • ) Close non-capturing group
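
The sketch below runs the lazy pattern against both formats from the question plus a made-up issuer (hypothetical, for illustration only) whose CN contains a comma without a following space. Note that (?:, |$) stops at the first comma that is followed by a space, so a CN that itself contains ", " would still be cut short.

```powershell
$samples = @(
    'CN=Google Internet Authority G3, O=Google Trust Services, C=US'
    'CN=Google Internet Authority G3'
    'CN=Example,Internal Root, O=Example Corp'   # hypothetical issuer
)

$results = foreach ($issuer in $samples) {
    # .*? grows one character at a time until ", " or the end of the string follows
    if ($issuer -match 'CN=(?<issuer>.*?)(?:, |$)') {
        $Matches['issuer']
    }
}

$results
# Google Internet Authority G3
# Google Internet Authority G3
# Example,Internal Root
```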

The fourth bird
  • Just to add along the same lines, you could also use a positive lookbehind: (?<=CN=)[^,]+ or (?(?<=CN=)[^,]+). – vs97 Aug 20 '19 at 21:14
  • Awesome! That works great, I am still new to regex and did not know about the negated class. What are the best resources to use besides that regex demo? – Virtual Penman Aug 20 '19 at 23:16
  • Just to make sure I understand, we are saying the string must start with CN= then we are creating a group that will have ? zero or more characters, with the classification (don't know the right word) of <issuer>, then we use ^ in [ ] to say it cannot have a comma, \r return character or \n newline. What does the plus do? That makes it greedy, so does that make it so we can take as much as possible but only up to a comma, return or newline? – Virtual Penman Aug 21 '19 at 15:58
  • @VirtualPenman I have added an explanation. The `+` is a [quantifier](https://www.regular-expressions.info/refrepeat.html) which will match 1+ times. The `^` between `[]` is a [negated character class](https://www.regular-expressions.info/charclass.html#negated) and will match any character except what is listed. – The fourth bird Aug 21 '19 at 16:01
  • Thank you so much, if I could double upvote I would. – Virtual Penman Aug 21 '19 at 21:36

In your attempt $cert.Issuer -match "CN=(?<issuer>.*)?,?\s", the problem is the greedy .* followed by ,?. The greedy .* first consumes the rest of the line after CN= and then backtracks only as far as the last position where \s can match. Because ,? allows zero or one , characters, that last whitespace matches whether or not a comma precedes it, so the comma ends up inside the capture. Making .* lazy and the comma mandatory yields the result you want when a comma follows the CN (a CN-only issuer will not match this pattern):

$cert.Issuer -match "CN=(?<issuer>.*?),\s"
$matches['issuer']
Google Internet Authority G3
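
A quick check of this pattern against both issuer formats from the question (the literal strings stand in for $cert.Issuer). It handles the comma case but not the CN-only one. Since -match only updates $Matches when a scalar match succeeds, the sketch tests the Boolean result before reading $Matches instead of printing it unconditionally.

```powershell
$samples = @(
    'CN=Google Internet Authority G3, O=Google Trust Services, C=US'
    'CN=Google Internet Authority G3'
)

$results = foreach ($issuer in $samples) {
    # Only read $Matches when -match actually returned $true
    if ($issuer -match 'CN=(?<issuer>.*?),\s') {
        $Matches['issuer']
    }
    else {
        '<no match>'   # CN-only issuer: no comma and whitespace follow the name
    }
}

$results
# Google Internet Authority G3
# <no match>
```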

An alternative is the -split operator, which splits on regex matches. Then simply access index [1] of the resulting array.

($cert.Issuer -split "CN=|,\s*O=")[1]
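
Applying the split to both issuer formats from the question (literal strings stand in for $cert.Issuer). Index [0] is the empty string in front of CN=, and the CN-only form simply produces no further elements, so [1] works for both. This assumes the field after CN is O=, as in the question's example.

```powershell
$samples = @(
    'CN=Google Internet Authority G3, O=Google Trust Services, C=US'
    'CN=Google Internet Authority G3'
)

$results = foreach ($issuer in $samples) {
    # Splitting on "CN=" leaves an empty element at [0]; the CN value is at [1].
    # Splitting on ", O=" (optional whitespace) cuts off the remaining fields, if any.
    ($issuer -split 'CN=|,\s*O=')[1]
}

$results
# Google Internet Authority G3
# Google Internet Authority G3
```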

Another alternative is using the Match() method from the .NET Regex class, which returns a [System.Text.RegularExpressions.Match] object. You can access the Value property of that object to return the data you need.

[regex]::Match($cert.Issuer,"(?<=CN=).*?(?=,\s*O=)","IgnoreCase").Value
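
Checking Success before using Value matters with this pattern: on the CN-only form the lookahead (?=,\s*O=) never succeeds, and a failed Match does not throw, it just has an empty Value. The literal strings below stand in for $cert.Issuer.

```powershell
$samples = @(
    'CN=Google Internet Authority G3, O=Google Trust Services, C=US'
    'CN=Google Internet Authority G3'
)

$results = foreach ($issuer in $samples) {
    # The lookbehind and lookahead keep "CN=" and ", O=" out of the matched Value
    $m = [regex]::Match($issuer, '(?<=CN=).*?(?=,\s*O=)', 'IgnoreCase')
    if ($m.Success) { $m.Value } else { '<no match>' }
}

$results
# Google Internet Authority G3
# <no match>
```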

Since the Common Name field can itself contain , characters, I would be more specific than [^,] or ,\s when matching that field.

AdminOfThings

As an alternative to a two-step operation (regex matching first, then examination of its results), PowerShell's -replace operator offers a concise solution:

PS> 'CN=Google Internet Authority G3, O=Google Trust Services, C=US' -replace
      '.*\bCN=([^,]+).*', '$1'

Google Internet Authority G3

The key is to have the regex match the entire input string and capture the substring of interest in a capture group (([^,]+)), which in the replacement string can be referenced as $1.
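
Since -replace also accepts an array on its left-hand side, the same one-liner can process both issuer formats from the question at once (literal strings stand in for $cert.Issuer). One thing to keep in mind: an input that contains no CN= does not match, so -replace returns it unchanged rather than empty.

```powershell
$samples = @(
    'CN=Google Internet Authority G3, O=Google Trust Services, C=US'
    'CN=Google Internet Authority G3'
)

# -replace is applied to every element of the array; '$1' is single-quoted so
# PowerShell passes it to the regex engine instead of expanding a variable.
$results = $samples -replace '.*\bCN=([^,]+).*', '$1'

$results
# Google Internet Authority G3
# Google Internet Authority G3
```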

mklement0