2

I'm writing a script in Powershell to take a list of email recipients in the general format """Name One"" <email one>, ""Name Two"" <email two>" and output a list of recipients that can be parsed as a CSV (outside Powershell). The list comes from a cell in a CSV that I cannot control the format of. I can't parse the list itself as a CSV directly: because there's no specific number of recipients per email, I can't specify a header, and attempting to parse it as a CSV by removing the initial and ending quote results in improperly quoted names (and potential problems when the name contains a comma).

So far, I've realized that I can load the string as a CSV and that will turn the above example into "Name One" <email one>, "Name Two", <email two>", and this would be perfect if I could split on the comma. Unfortunately I cannot, since sometimes people's names show up as "Three, Name" and I can't split in the middle of a person's name. What I need is some way I can replace the comma outside the quotes with some other character (or series of characters). If I can do that, then the rest of my process will work flawlessly; as it is, I'm running into issues because the quickly thrown together regex I was using doesn't properly catch recipients without an email address.

The only idea I have at the moment is to write a parser that steps through each character, keeps track of how many quotes it comes across, and replaces the comma when that number is even. I'd rather not do this; there's a lot of data and I'd rather something faster if possible. I'll implement it that way for now, but I'm hopeful I can get a better solution here.

Edit: To clarify on the format: most emails are in the format ""Something"" <addr@example.org> but not all are. The most common format other than that is just a name (and nothing in angle brackets), but outside of that most of the emails do follow that format of something in double quotes followed by something in angle brackets. I can't really pull out all the different formats; I've got too many messages and only a handful are exceptions.

demize
  • 364
  • 1
  • 15
  • I guess your second example has the second comma in the wrong place? How does a "recipient without an email address" show up? What does "load the string as a CSV" mean? PowerShell `ConvertFrom-CSV` will return objects, not strings with commas in them..? – TessellatingHeckler Dec 12 '16 at 21:32
  • `'"Name One" , "Name Two" ' | ConvertFrom-Csv -Header A, B` – Ansgar Wiechers Dec 12 '16 at 21:34
  • @TessellatingHeckler In the second example, the name is in the "Last Name, First Name" format instead of "First Name Last Name". As for recipients without an email address, they show up with a name but no email: `"Name Three"` instead of `"Name Three" `. The data is not very well formatted. – demize Dec 12 '16 at 21:41
  • @AnsgarWiechers That would be perfect if I had a specific number of recipients on each email; alas, I've seen up to a couple hundred on each and I can't really specify that header. – demize Dec 12 '16 at 21:42
  • What is it exactly that you want to achieve? Parsing a varying number of e-mail addresses as a CSV doesn't really make much sense. What do you want to do with the data once you split it up? – Ansgar Wiechers Dec 12 '16 at 21:49
  • @AnsgarWiechers I have a list of all the metadata for each email in my dataset, and I'm extracting some of it and feeding it into a C# application that uses FileHelpers to parse the resulting CSV. FileHelpers can take the varying number of columns and stick it all into an array, I just need to get something that it can parse as a list rather than a single cell. – demize Dec 12 '16 at 21:53
  • And do all your records have the form `"something" `, i.e. text between double qoutes followed by an e-mail address between angular brackets? Or are there variations? – Ansgar Wiechers Dec 12 '16 at 21:56
  • @AnsgarWiechers There are variations, mostly the one where there's just a name and no address. I updated the question with a bit more information. – demize Dec 12 '16 at 22:10
  • Try ``-replace '(".*?"(?: <.*?>)?), *', "`$1`n" -split "`n"``. – Ansgar Wiechers Dec 12 '16 at 22:15

2 Answers2

3

You can use this nifty trick to match only on commas followed by an even number of quotation marks:

PS C:\> $pattern = ',(?=(?:[^"]*"[^"]*")*[^"]*$)\s?'
PS C:\> $mailAddresses = '"John Doe" <john@doe.example>, "Doe, Jane" <jane@doe.example>'
PS C:\> $mailAddresses -split $pattern
"John Doe" <john@doe.example>
"Doe, Jane" <jane@doe.example>
Community
  • 1
  • 1
Mathias R. Jessen
  • 157,619
  • 12
  • 148
  • 206
  • Thanks! I think this will work perfectly. I'll have to wait until later to implement it, but it should be a faster solution than my current "walk through every character" one... – demize Dec 13 '16 at 00:36
-1

try this

$yourlist = '"John Doe" <john@doe.example>, "Doe, Jane" <jane@doe.example>'
$yourlist -replace ">, ", "`n" -replace "<|>|,", "" | ConvertFrom-Csv -Delimiter " " -Header Name, Email
Esperento57
  • 16,521
  • 3
  • 39
  • 45