0

I'm trying to understand why a duplicate check on my text document behaves inconsistently. The document's content is updated over time (each insert is a new string), and on every insert I check whether the same string already exists. For example, if the document contains:

hello world
hello, world
hello, world.
.hello, world

the check finds the newly inserted string when it is "hello world" or "hello, world". The check is a simple condition that notifies me if the string already exists, with no restrictions or special handling for the last character of the string:

List<string> wordsTyped = new List<string>();

if (wordsTyped.Contains(newStr))
{
    string[] allLines = File.ReadAllLines(path);
}

But it does not notify me when the string in the document has a punctuation mark at its end or beginning. For example, if "hello, world." already exists and the new insert is the same "hello, world.", or ",hello, world", the check does not find it and reports it as not existing.

If there is no way to solve this and I am forced to strip the trailing special character from the string, it would also be good to know how to do that with a regex for specific symbols (dot, comma, hash, and apostrophe) while keeping everything else.

  • Please elaborate and edit, the question is unclear – Jim Oct 24 '16 at 00:36
  • @Jim Which part is unclear? I can't find a string in the text document if it ends with a punctuation mark. I can find **h,e.l!l:o** but not **h,e.l!l:o.** or **h,e.l!l:o;**, and likewise not **,h,e.l!l:o** where the mark is at the beginning of the string –  Oct 24 '16 at 00:44
  • If you only care about special characters before and after your input string you can just use a for-loop to iterate over your array and use the contains-method of String to check if "hello world." contains "hello world". In case you have special characters between those words, [take a look at this thread](http://stackoverflow.com/questions/6555182/remove-all-special-characters-except-space-from-a-string-using-javascript) to get a rough idea of the regex. I'd also recommend using a site like regex101.com to create your regex fool-proof :) – Seth Oct 24 '16 at 07:33

1 Answer

1

You might want to use a HashSet to store the strings you already have, since lookups are much faster than with a List. Then remove all the characters you don't want from the string:

static String beautify(String ugly)
{
    return String.Join("", ugly.Where(c => Char.IsLetter(c)));
}

Here I took the liberty of keeping a character only if it is a letter; you can, of course, adapt this to fit your needs. Then use this little program:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Normalized forms of every line seen so far.
    static HashSet<String> lines = new HashSet<String>();
    static List<String> input = new List<String>()
    {
        "hello world", "hello, world", "hello, world.", ".hello, world",
    };

    static void Main(String[] args)
    {
        initList(input);
        var tests = new List<String>() {
            "h,e.l!l:o. w----orl.d.", // True
            "h,e.l!l:o. w----ol.d.",  // False
        };

        foreach (var test in tests)
        {
            Console.WriteLine($"The string \"{test}\" is {(lines.Contains(beautify(test)) ? "already" : "not")} here");
        }

        Console.ReadLine();
    }

    // Store the normalized form of each input string.
    static void initList(List<String> input)
    {
        foreach (String s in input)
            lines.Add(beautify(s));
    }

    // Keep only letters; punctuation, digits and whitespace are dropped.
    static String beautify(String ugly)
    {
        return String.Join("", ugly.Where(c => Char.IsLetter(c)));
    }
}

Which will output:

The string "h,e.l!l:o. w----orl.d." is already here

The string "h,e.l!l:o. w----ol.d." is not here
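
If the goal is to ignore punctuation only at the start and end of a string, rather than every non-letter, beautify could be adapted along the following lines. This is a minimal sketch and not part of the original answer: the class name `EdgeTrimmer` and the method `TrimEdgePunctuation` are hypothetical, and the character set (dot, comma, hash, apostrophe) is the one listed in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EdgeTrimmer
{
    // Matches a run of '.', ',', '#' or '\'' at the start of the input,
    // or a run of the same characters at the end of the input.
    private static readonly Regex EdgePunctuation = new Regex(@"^[.,#']+|[.,#']+$");

    // Hypothetical helper: strips the listed symbols from both ends only.
    // Punctuation inside the string is left untouched.
    public static string TrimEdgePunctuation(string s)
    {
        return EdgePunctuation.Replace(s, "");
    }

    public static void Main()
    {
        Console.WriteLine(TrimEdgePunctuation(".hello, world")); // hello, world
        Console.WriteLine(TrimEdgePunctuation("hello, world.")); // hello, world
        Console.WriteLine(TrimEdgePunctuation("h,e.l!l:o;"));    // h,e.l!l:o; (';' is not in the set)
    }
}
```

For single-line strings, `s.Trim('.', ',', '#', '\'')` should give the same result without a regex.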


You can also use a HashSet for exact matching, like so:

lines
Count = 4
    [0]: "hello world"
    [1]: "hello, world"
    [2]: "hello, world."
    [3]: ".hello, world"
lines.Contains("hello, world.")
true
lines.Contains("hello, world..")
false
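
For exact matching against a file that grows over time, one option is to load the set from the file at startup and append each new string as it is accepted. The following is a rough sketch under assumptions that are not in the original: the class `LineStore`, its method `TryAdd`, and the file name `lines.txt` are all hypothetical, and each entry is assumed to be a single line of text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class LineStore
{
    private readonly string path;
    private readonly HashSet<string> lines;

    public LineStore(string path)
    {
        this.path = path;
        // Load whatever earlier runs saved; start empty if the file does not exist yet.
        lines = File.Exists(path)
            ? new HashSet<string>(File.ReadAllLines(path))
            : new HashSet<string>();
    }

    // Returns true and appends the string to the file if it is new.
    // Returns false, and writes nothing, if exactly the same string is already stored.
    public bool TryAdd(string newStr)
    {
        if (!lines.Add(newStr))
            return false;
        File.AppendAllText(path, newStr + Environment.NewLine);
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new LineStore("lines.txt"); // hypothetical file name
        foreach (var s in new[] { "hello, world.", "hello, world.", ".hello, world" })
        {
            Console.WriteLine($"\"{s}\" {(store.TryAdd(s) ? "added" : "already exists")}");
        }
    }
}
```

The default comparer of `HashSet<string>` is ordinal, so `ok.` and `ok` count as different entries, and punctuation at either end is compared like any other character.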
Thomas Ayoub
  • Hello, very useful, but not in my case. I can't predict each new insert or content update in order to put it in the HashSet, and I don't want to strip punctuation before comparing against fixed HashSet content. I want to find the string exactly as it was saved and be told when a new insert equals an existing string. If I already have the string **ok.** with a dot at the end and my new insert is **ok.**, I want to find it. It works with **ok** and **o.k**, but not when the punctuation mark is at the beginning or end of the string, as in **.ok** and **ok.** –  Oct 24 '16 at 18:11
  • @mickbt then just use the HashSet without the beautify function – Thomas Ayoub Oct 24 '16 at 18:29
  • Yes, but what if I want to do this for any new string the user puts in the file and later looks up? Do I need to hard-code the strings in the HashSet to use this solution? Maybe I haven't understood the solution correctly. –  Oct 24 '16 at 18:54
  • Hello, thank you for your support. If I go this way, how can I dynamically add a new value to the HashSet each time I insert a new string, and keep the collected data after restarting the program so that later inserts, including ones with the punctuation marks in question, are also checked for true or false? –  Oct 25 '16 at 15:59
  • It seems like it could be a good solution, but I haven't gotten full results myself yet, so I can't be sure it is fully applicable in my case. Let me find out, and then I'll mark this as the answer to my question. It would also be good to see a working example in your code that inserts new strings and saves them for further use, and to understand why leading and trailing punctuation is not recognized in my case with the given condition –  Oct 25 '16 at 17:42