I'm working on a project that uses a plain ASCII .txt file as a key/value configuration file. The current format for ConfigFile.txt is something like

FirstName=Elmer|LastName=Fudd|UserId=EFudd|Password=foobar|Date=7/29/2016

This is easy to read into the program and create a dictionary with KeyValuePairs with code something like:

    using (FileStream fs = new FileStream("ConfigFile.txt", FileMode.Open))
    {
      using (StreamReader sr = new StreamReader(fs))
      {
        string fileText = sr.ReadToEnd();

        //  Tokenize the entire file string into separate key=value strings.
        string[] tokens = fileText.Split('|');

        //  Iterate through all of the key=value strings, tokenize each one into a key=Value 
        //  pair and add the key and value as separate strings into the dictionary.
        foreach (string token in tokens)
        {
          string[] keyValuePair = token.Split('=');
          configDict.Add(keyValuePair[0], keyValuePair[1]);
        }
      }
    }

It first splits out each key/value as a separate string using the '|' as the delimiter.

FirstName=Elmer

LastName=Fudd

UserId=EFudd

Password=foobar

Date=7/29/2016

Then, for each key/value string, it separates the key and value on the '=' delimiter, creates a KeyValuePair, and inserts it into a dictionary for later lookups in the program.

So far so good. Users are instructed not to create passwords with either delimiter. However, I now have to encrypt the password before including it in the file, and the encryption routine can produce any printable character from 0x20 through 0x7E. So an encrypted password can end up containing either or both of the delimiters: 'foobar' (or whatever) might be encrypted by the encryption engine into P#|=g%. This breaks the split logic.

So, I want to change the delimiters typed into the Notepad .txt file to control characters so that, instead of the '|' delimiter, I am using 0x1E (Record Separator) and replace the '=' sign with 0x1F (Unit Separator).

I can escape and code this directly in C# with no problems, but how would I modify the original .txt disk file so that it will read in the delimiters as single (non-printable) characters correctly?

  • why are passwords clear text? use a hash and then use base64 or hex to encode the binary. – Matthew Whited Jul 29 '16 at 19:56
  • As for how to read the file, you do it just like you are currently reading the file. If you want to go crazy you could access the raw file stream, but that takes more effort than I care to explain on SO. – Matthew Whited Jul 29 '16 at 20:00
  • I inherited cleartext pwds. I wrote a utility to encrypt them in place before the encrypted text is hand-typed into the config file so there is no cleartext on disk or memory. The application uses the same encryption to unencrypt them just before using them. – MiddleAgedMutantNinjaProgrammer Jul 29 '16 at 20:05
  • Use JSON to store stuff like this, or even XML. – mxmissile Jul 29 '16 at 20:06
  • I know how to read the file and won't have to change the code in the program. What I don't know is what I substitute for '|' and '=' in the text file. How do I format the non-printable character in the text? '\1E'? '0x1E'? '\01E' and so on. – MiddleAgedMutantNinjaProgrammer Jul 29 '16 at 20:09
  • I don't have the freedom to change the format. Besides, an encrypted pwd can possibly have reserved characters from JSON or XML formatting, still breaking up the parsing. – MiddleAgedMutantNinjaProgrammer Jul 29 '16 at 20:10
  • @MatthewWhited That's what he's trying to change – Sam I am says Reinstate Monica Jul 29 '16 at 20:15
  • It's a simple question and I think everybody's overthinking it. Instead of coding FirstName=Elmer|LastName=Fudd|..., I need to know how to retype this string to represent the delimiters as 0x1E and 0x1F like FirstName\x1fElmer\x1eLastName\x1fFudd\x1e... only I need to know how to correctly specify the special characters so they will come through the FileStream and StreamReader from the .txt file intact. It is not my own app and I can't change the use of a text file, which has been in use for years. – MiddleAgedMutantNinjaProgrammer Jul 29 '16 at 20:28
  • I understand the question, but I think you're underestimating what goes into encoding and decoding control characters. There might be a utility out there that does it for you, but the closest I've found was [Regex.Unescape](https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.unescape(v=vs.110).aspx), which does it for the regex syntax but not for C# syntax, which is probably what you want. – Sam I am says Reinstate Monica Jul 29 '16 at 20:38
  • Also, [This question](http://stackoverflow.com/questions/9738282/replace-unicode-escape-sequences-in-a-string) looks like it might be interesting to you. – Sam I am says Reinstate Monica Jul 29 '16 at 20:41
  • The comments here make sense. A different encoding or different format is needed. You can't just "specify the special characters in a different way"; that's just what you see while writing code. There's only a single representation for each character in the actual data. – Zong Jul 29 '16 at 20:52
  • _"I need to know how to correctly specify the special characters"_ -- specify _where_? Are you trying to type these characters in by hand? Or is this a question of how to include those characters in a string literal you have in your C# code? Please provide a good [mcve] that shows clearly what you've tried, with a precise explanation of what that code does now and what you want it to do instead. – Peter Duniho Jul 29 '16 at 21:08
  • If he jacks around with the byte arrays as Unicode he will break the data. – Matthew Whited Jul 29 '16 at 21:19

4 Answers

Instead of having plain text like that, what I would do is use a proper serialization format, such as JSON.

There are tools out there that do the hard work for you.
The built-in System.Web.Script.Serialization namespace has some tools that you can use, but I prefer to use Json.Net. If you have Visual Studio, you can install it with NuGet (let me know in the comments if you need more help than that).

But once you add it to your project, you can do something like this

using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

namespace ConsoleApplication1
{
    public class Program
    {
        static void Main(string[] args)
        {
            var dict = new Dictionary<string, string>();

            dict.Add("FirstName", "Elmer");
            dict.Add("LastName", "Fudd");
            dict.Add("Password", @"\a\ansld\sb\b8d95nj");

            var json = JsonConvert.SerializeObject(dict);

            File.WriteAllText("ConfigFile.txt", json);

            var txt = File.ReadAllText("ConfigFile.txt");
            var newDict = JsonConvert.DeserializeObject<Dictionary<string, string>>(txt);

        }
    }
}

and ConfigFile.txt will look like this

{"FirstName":"Elmer","LastName":"Fudd","Password":"\\a\\ansld\\sb\\b8d95nj"}

If you want it more human-readable, use

var json = JsonConvert.SerializeObject(dict, Formatting.Indented);

and you'll get

{
  "FirstName": "Elmer",
  "LastName": "Fudd",
  "Password": "\\a\\ansld\\sb\\b8d95nj"
}
  • It's what you code after "password": that is what is forcing me to do it this way. I have no control over what characters will be in the encrypted password, nor can I edit it. Escape characters can't be used. In fact, the encrypt routine will translate some characters into backslashes and must be able to translate them back again. The password must be exactly what came out of the encryption machine or it won't decrypt properly. U R outputting the encrypted pwd with escapes - it won't decrypt. I have NO control over what chars will appear in the encrypted pwd. – MiddleAgedMutantNinjaProgrammer Jul 29 '16 at 20:50
  • @MiddleAgedMutantNinjaProgrammer The characters will be escaped when serialized, but will be normal upon deserialization. If you can't deserialize in the app, then it's completely unsuited for storing passwords. – Zong Jul 29 '16 at 20:54
  • @MiddleAgedMutantNinjaProgrammer You don't need any control over what characters will be in the encrypted password. Just rely on the serialization framework to figure it out for you. – Sam I am says Reinstate Monica Jul 29 '16 at 20:56
  • Thanks to all for the effort, but everybody's trying to crack a walnut with a jackhammer. The users who are responsible for editing the config file are transplant surgeons and their teams. They are comfortable with the current simple format and will not accept, nor could they understand, any complications. It's only the fact that the password will now be encrypted using any and every char between 0x20 and 0x7e as a legitimate encrypted replacement char that kills any solution I've seen here. I'll just experiment and work it out myself. Ciao. – MiddleAgedMutantNinjaProgrammer Jul 29 '16 at 20:57
  • @MiddleAgedMutantNinjaProgrammer and using `0x20` and `0x73` to represent characters doesn't seem like an extra complication to you? – Sam I am says Reinstate Monica Jul 29 '16 at 21:00
  • @MiddleAgedMutantNinjaProgrammer If you just want to change what the delimiters are, then just don't let users touch the file and only let the program mess with it – Sam I am says Reinstate Monica Jul 29 '16 at 21:10
  • @MiddleAgedMutantNinjaProgrammer That would have been helpful information to include in the question, but it doesn't change the situation. You can just encode passwords in a different way (e.g. hexadecimal) as some have suggested, and there'd be no problems. Also, using a standardized and very readable format like JSON over an adhoc solution is not overkill, it should be considered good practice. – Zong Jul 29 '16 at 21:10
  • JSON is not better than a delimited text file. He has no major need to change the delimiters but he doesn't need JSON or XML either. – Matthew Whited Jul 29 '16 at 21:14
  • Sam I Am, I can definitely cut off the top few printable characters (though 0x73 is 's' and I'd need to support up to 0x7A to include up to lowercase 'z', since I wrote the original algorithm). However, this encryption class is in wide use in scores of programs and, while there are methods in the class to increase its resistance to brute force, we don't want to have multiple branches of it. (For simple pwds using default settings, you must try 11.2 trillion to the 11.2 trillionth power to see all possible decryptions.) Changing a dozen delimiters in an external text file is MUCH easier. – MiddleAgedMutantNinjaProgrammer Aug 01 '16 at 14:51
  • Sam I Am. Users are the only ones who touch the file. They are transplant surgeons and their staff spread all around the country, and hundreds, if not thousands, of people calling on an already overstressed IT department every time one needs to update or change a parameter would be brutal and time-consuming (and a possible danger to the patients, who are often in the middle of a transplant operation when the surgeon needs to have a value changed). Time is critical and lives are at stake. Users don't change delimiters, only text, so using 2 delimiters below 0x20 makes the most sense. – MiddleAgedMutantNinjaProgrammer Aug 01 '16 at 14:56

You can convert integers to chars so just do this...

string[] tokens = fileText.Split((char)0x1e);
// ...
string[] keyValuePair = token.Split((char)0x1f);

... but encoding your passwords as base64 would be easier and cleaner...

string base64 = Convert.ToBase64String(passwordHash);   // encode before writing to the file
byte[] decodedHash = Convert.FromBase64String(base64);  // decode after reading it back

... NOTE: it is possible that the raw hashes/encrypted data will contain these characters, so I wouldn't just dump the hashes into the text file.
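
As a rough sketch of the Base64 suggestion, assuming the existing `|` and `=` delimiters stay in place: Base64 text never contains `|`, but it can end with `=` padding, so each token is split into at most two parts. The class and method names (`Base64ConfigSketch`, `BuildLine`, `ParseLine`) are hypothetical and not part of the original program.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Base64ConfigSketch
{
    // Builds one line in the existing key=value|key=value format, storing the
    // encrypted password as Base64 so it cannot contain the '|' delimiter.
    public static string BuildLine(string userId, byte[] encryptedPassword)
    {
        return "UserId=" + userId + "|Password=" + Convert.ToBase64String(encryptedPassword);
    }

    // Parses the line back into a dictionary. Splitting each token into at most
    // two parts keeps any trailing '=' padding of the Base64 text in the value.
    public static Dictionary<string, string> ParseLine(string line)
    {
        var dict = new Dictionary<string, string>();
        foreach (string token in line.Split('|'))
        {
            string[] pair = token.Split(new[] { '=' }, 2);
            dict.Add(pair[0], pair[1]);
        }
        return dict;
    }

    public static void Main()
    {
        // The encrypted text from the question, which contains both delimiters.
        byte[] encrypted = Encoding.ASCII.GetBytes("P#|=g%");

        string line = BuildLine("EFudd", encrypted);
        Dictionary<string, string> config = ParseLine(line);

        // Decode the stored value back to the original encrypted bytes.
        byte[] roundTripped = Convert.FromBase64String(config["Password"]);
        Console.WriteLine(line);
        Console.WriteLine(Encoding.ASCII.GetString(roundTripped));
    }
}
```

Hex encoding would avoid the `=` padding question entirely, at the cost of a longer string for users to handle.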

Matthew Whited

The following class extracts the string segments using regular expressions and supports passwords containing non-printable characters (0x00 .. 0xFF). The class exposes each segment of the configuration as a property.

You can run the demo example at .NET Fiddle.

using System;
using System.Text.RegularExpressions;


class ConfigParser
{
    public string Text { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string UserId { get; set; }
    public string Password { get; set; }
    public string Date { get; set; }

    public ConfigParser(string text)
    {
        Text =text;
        Parse(text);
    }


    private static string pattern = @"
     ^FirstName=(?<firstname>\w+)    \|          
     LastName=(?<lastname>\w+)       \|              
     UserId=(?<userid>\w+)           \|                  
     Password=(?<pasword>.+)         \|
     Date=(?<date>.+)                         
     $
    ";

    private Regex regex = new Regex(pattern,
           RegexOptions.Singleline
           | RegexOptions.ExplicitCapture
           | RegexOptions.CultureInvariant
           | RegexOptions.IgnorePatternWhitespace
           | RegexOptions.Compiled
           );



    private void Parse(string text)
    {
        Console.WriteLine("text: {0}",text);
        Match m = regex.Match(text);
        FirstName = m.Groups["firstname"].ToString();
        LastName = m.Groups["lastname"].ToString();
        UserId = m.Groups["userid"].ToString();
        Password = m.Groups["pasword"].ToString();
        Date = m.Groups["date"].ToString();

    }

}

How to use:

   var text ="your text here"; 
   var c = new ConfigParser(text );             

   // You can access the properties of the class: FirstName, LastName, ...

   Console.WriteLine("firstname: {0}", c.FirstName);
   Console.WriteLine("lastname: {0}", c.LastName);
   Console.WriteLine("UserId: {0}", c.UserId);
   Console.WriteLine("Password: {0}", c.Password);
   Console.WriteLine("date: {0}", c.Date);

Sample output (the password includes non-printable characters, the | separator, and symbols):

text: FirstName=Elmer|LastName=Fudd|UserId=EFudd|Password=fg%|uy|◄¶|hj↑khg|Date=7/29/2016
firstname: Elmer
lastname: Fudd
UserId: EFudd
Password: fg%|uy|◄¶|hj↑khg
date: 7/29/2016
M.Hassan

Easiest Answer:

Insert the special characters into the string using the ALT-numberpad value trick. Record Group ALT-31 (▼) to delimit the end of a Key/Value pair and Item Group ALT-30 (▲) to delimit the key from the value. Save the string as UTF-8.

Code for delimiters is

private static char tokenDelimiter = ('▲');
private static char keyValuePairDelimiter = ('▼');

using the same ALT-numberpad trick to put in the up and down triangles. Include instructions that the black triangles are NEVER to be edited or removed and explain their meaning.

It takes me back to my old DOS days. Simple, and took 5 minutes to implement - and it doesn't require that the existing code base be materially changed - just the two delimiter characters changed.
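
For anyone wanting to try this, here is a minimal sketch of the parsing side. It assumes the ALT-30 and ALT-31 keystrokes land in the UTF-8 file as the triangle characters U+25B2 (▲) and U+25BC (▼); the class name, the temp-file name, and the sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class TriangleDelimiterSketch
{
    private const char PairDelimiter = '\u25BC';     // ▼ ends one key/value pair
    private const char KeyValueDelimiter = '\u25B2'; // ▲ separates key from value

    public static Dictionary<string, string> Parse(string fileText)
    {
        var dict = new Dictionary<string, string>();

        // Drop a trailing newline the editor may have added; values are not
        // trimmed any further because a password may contain spaces.
        string text = fileText.TrimEnd('\r', '\n');

        foreach (string token in text.Split(new[] { PairDelimiter },
                                            StringSplitOptions.RemoveEmptyEntries))
        {
            // Split into at most two parts: key and value.
            string[] pair = token.Split(new[] { KeyValueDelimiter }, 2);
            if (pair.Length == 2)
            {
                dict[pair[0]] = pair[1];
            }
        }
        return dict;
    }

    public static void Main()
    {
        // Write a sample file to the temp directory, then read it back as UTF-8.
        string path = Path.Combine(Path.GetTempPath(), "ConfigFileSketch.txt");
        string sample = "UserId" + KeyValueDelimiter + "EFudd" + PairDelimiter
                      + "Password" + KeyValueDelimiter + "P#|=g%" + PairDelimiter
                      + "Date" + KeyValueDelimiter + "7/29/2016";
        File.WriteAllText(path, sample, Encoding.UTF8);

        Dictionary<string, string> config = Parse(File.ReadAllText(path, Encoding.UTF8));
        Console.WriteLine(config["Password"]);
    }
}
```

Reading with an explicit `Encoding.UTF8` matters here, since the triangles are multi-byte characters rather than single ASCII bytes.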

  • Thanks to all who came up with a wide variety of techniques to do this. If it were a greenfield program, there are many things I would do differently. But, I work in an IT dept that still supports everything from '90s ASP (pre-ASP.NET) with an Access Database, and even IBM 360/370 mainframe BAL (Basic Assembly Language). When I am free to do my own projects, I routinely use techniques just being developed in academia. I have been using REST/JSON since before SOAP/XML came out. This app is a kludge, but I am not allowed to change it, I don't own it. – MiddleAgedMutantNinjaProgrammer Aug 01 '16 at 16:32