I'm trying to work out the best way to extract chunks of base64
out of a file containing both plain text and base64.
Say I have the string:
Subject: Fwd: Test.
Thread-Topic: Test.
Date: Tue, 5 May 2020 19:02:42 +0000
U3ViamVjdCB0byBiYXNlNjQgZGVjb2Rl
--_000_DB6PR10MB1831AAD962E88A95B21547589EA70DB6PR10MB1831EURP_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
IExvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQu
IEludGVnZXIgc2VtIG51bGxhLCB0aW5jaWR1bnQgZXUgdmVuZW5hdGlzIHNlZCwgZWdlc3RhcyBz
ZWQgcmlzdXMuIEZ1c2NlIG5vbiBkb2xvciBmZWxpcy4gTnVuYyB2aXRhZSBuaXNsIG1vbGVzdGll
LCBtb2xsaXMgbWFzc2EgZXQsIGVsZWlmZW5kIHB1cnVzLiBQcm9pbiBhIGFsaXF1ZXQgZXJhdC4g
Q3JhcyB2ZWhpY3VsYSBtb2xlc3RpZSBlbGl0IGFjIHByZXRpdW0uIE5hbSBhIGxlbyBmcmluZ2ls
bGEsIGdyYXZpZGEgbGVvIHNpdCBhbWV0LCBvcm5hcmUgYXVndWUuIE51bGxhbSBmYWNpbGlzaXMs
IGxlbyBldCBydXRydW0gaGVuZHJlcml0LA==
--_000_DB6PR10MB1831AAD962E88A95B21547589EA70DB6PR10MB1831EURP_--
Mail Retrieved
I would expect the output to be the following strings:
U3ViamVjdCB0byBiYXNlNjQgZGVjb2Rl
IExvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQu
IEludGVnZXIgc2VtIG51bGxhLCB0aW5jaWR1bnQgZXUgdmVuZW5hdGlzIHNlZCwgZWdlc3RhcyBz
ZWQgcmlzdXMuIEZ1c2NlIG5vbiBkb2xvciBmZWxpcy4gTnVuYyB2aXRhZSBuaXNsIG1vbGVzdGll
LCBtb2xsaXMgbWFzc2EgZXQsIGVsZWlmZW5kIHB1cnVzLiBQcm9pbiBhIGFsaXF1ZXQgZXJhdC4g
Q3JhcyB2ZWhpY3VsYSBtb2xlc3RpZSBlbGl0IGFjIHByZXRpdW0uIE5hbSBhIGxlbyBmcmluZ2ls
bGEsIGdyYXZpZGEgbGVvIHNpdCBhbWV0LCBvcm5hcmUgYXVndWUuIE51bGxhbSBmYWNpbGlzaXMs
IGxlbyBldCBydXRydW0gaGVuZHJlcml0LA==
I've created a regex which produces the desired match:
^\n([a-zA-Z0-9+\/=\n]*)\n$
But the following in C# returns no matches:
var test1 = Regex.Matches(input, @"^\r\n([a-zA-Z0-9+/=\n]*)\r\n$");
var test2 = Regex.Matches(input, @"^\n([a-zA-Z0-9+/=\n]*)\n$");
Whilst I can fix the regex, I'm now wondering if there's a more efficient way of achieving this. Additionally, some of the input strings will be rather large, so I'd prefer something that doesn't need the whole input in memory at once.
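One alternative I've been considering is to skip the regex and scan the input line by line, grouping consecutive lines that look like base64. The following is only a rough sketch of that idea, not a tested solution. `Base64Extractor`, `LooksLikeBase64` and `ExtractChunks` are names I made up for illustration, and the heuristic (only base64 characters, length a multiple of 4, at most two trailing `=`) is my own assumption about what counts as a base64 line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class Base64Extractor
{
    // Hypothetical heuristic: the line is non-empty, its length is a multiple of 4,
    // it contains only base64 alphabet characters, and '=' appears only as
    // trailing padding (at most two characters).
    private static bool LooksLikeBase64(string line)
    {
        if (line.Length == 0 || line.Length % 4 != 0) return false;

        int padding = 0;
        foreach (char c in line)
        {
            if (c == '=') { padding++; continue; }
            if (padding > 0) return false; // data after padding is not valid

            bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                   || (c >= '0' && c <= '9') || c == '+' || c == '/';
            if (!ok) return false;
        }
        return padding <= 2;
    }

    // Yields each run of consecutive base64-looking lines as one chunk,
    // with the lines of a chunk joined by '\n'.
    public static IEnumerable<string> ExtractChunks(TextReader reader)
    {
        var chunk = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (LooksLikeBase64(line))
            {
                if (chunk.Length > 0) chunk.Append('\n');
                chunk.Append(line);
            }
            else if (chunk.Length > 0)
            {
                // A non-base64 line (header, boundary, blank line) ends the run.
                yield return chunk.ToString();
                chunk.Clear();
            }
        }
        if (chunk.Length > 0) yield return chunk.ToString();
    }
}
```

For a string this would be called as `Base64Extractor.ExtractChunks(new StringReader(input))`, and for a large file a `StreamReader` from `File.OpenText(path)` could be passed instead, so only the current chunk is held in memory. Because `ReadLine` accepts both `\n` and `\r\n`, the line-ending difference between my two regex attempts shouldn't matter here. The obvious weakness is false positives: a plain-text line such as `Test` consists solely of base64 characters and has a length divisible by 4, so it would be picked up as a chunk.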