
I want to write a small piece of code that checks a certain column in Excel for the file name, finds the matching file and then renames it.

This is how I have done it:

int colNo = oWorksheet.UsedRange.Columns.Count;
int rowNo = oWorksheet.UsedRange.Rows.Count;
string directoryPath = @"C:\Users\User\Documents\Invoices\";

// read the used range into a 1-based object[,] array
object[,] array = oWorksheet.UsedRange.Value;
for (int j = 1; j <= colNo; j++)
{
    for (int i = 1; i <= rowNo; i++)
    {
        if (array[i, j] != null)
        {
            if (array[i, j].ToString() == "vin")
            {
                for (int m = i + 1; m <= rowNo; m++)
                {
                    // skip rows where the name cell or the invoice cell
                    // (7 columns to the right) is missing or empty
                    if (j + 7 > colNo || array[m, j] == null || array[m, j + 7] == null) continue;

                    string name = array[m, j].ToString(); // the name to be renamed
                    // Trim() returns a new string, so its result must be assigned
                    string invoice_name = array[m, j + 7].ToString().Trim().Replace(" ", "");

                    string[] files = System.IO.Directory.GetFiles(directoryPath, "*" + invoice_name + ".pdf");
                    // then do something with the found files
                }
            }
        }
    }
}

My code searches the folder in directoryPath for files whose names match the value stored in 'invoice_name'. Because the pattern is "*" + invoice_name + ".pdf", it returns every PDF whose name ends with the full value of 'invoice_name'.

However, if a file's name matches only part of that value (say 80% of it), Directory.GetFiles does not find it and returns an empty array. For instance, if the value stored in 'invoice_name' is 'Michael', Directory.GetFiles returns files whose names contain the whole string 'Michael', but it does NOT return files named 'Michae', 'Micha', 'Mich' or anything shorter than 'Michael'.
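To make the gap concrete, here is a minimal C# sketch of the two conditions involved. `MatchesWildcard` approximates what the pattern `"*" + invoice_name + ".pdf"` accepts (ignoring Windows' short-name and three-character-extension quirks), while `IsTruncatedName` expresses what is being asked for: the file's base name is a prefix of the invoice name. The class name, method names and the `minLength` guard are hypothetical and not part of the original code.

```csharp
using System;
using System.IO;

public static class InvoiceMatching
{
    // Approximately what GetFiles(dir, "*" + invoiceName + ".pdf") accepts:
    // the file name must END with the full invoice name followed by ".pdf".
    public static bool MatchesWildcard(string fileName, string invoiceName) =>
        fileName.EndsWith(invoiceName + ".pdf", StringComparison.OrdinalIgnoreCase);

    // The reverse check: the file's base name is a (possibly truncated) prefix
    // of the invoice name, e.g. "Micha.pdf" for "Michael".
    // minLength is a hypothetical guard so that very short names such as "M" don't match.
    public static bool IsTruncatedName(string fileName, string invoiceName, int minLength)
    {
        string baseName = Path.GetFileNameWithoutExtension(fileName);
        return baseName.Length >= minLength
            && invoiceName.StartsWith(baseName, StringComparison.OrdinalIgnoreCase);
    }
}
```

With `System.Linq`, the second check could be applied to `Directory.GetFiles(directoryPath, "*.pdf")` through `Where(f => InvoiceMatching.IsTruncatedName(f, invoice_name, 4))`. Note that a file named "Mich.pdf" would then match both "Michael" and "Michelle", so `minLength` only reduces the ambiguity; it does not remove it.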

Can someone please suggest something? I've looked through existing questions and found "How to find folders and files by its partial name c#", but it didn't help when I tried it.

Asked by Richeek Dey; edited by rene.
  • I recommend restructuring your code so that you don't have a `for` inside an `if` inside an `if` inside a `for` inside a `for`. Think carefully before you nest 2 levels deep. Very carefully for 3. Never 4. – Mateen Ulhaq Feb 05 '18 at 19:36
  • Thanks! This is actually just a snippet. Rest assured, the integrity of the code is taken care of. :) – Richeek Dey Feb 05 '18 at 19:38
  • If `invoice_name` is `"Michael"`, you're going to have to `substring` it in order to get `"Mich"`; otherwise how is it to know to search for sub-portions of it? That wildcard only looks for things in front of the literal string. – ragerory Feb 05 '18 at 19:38
  • @RicheekDey are you saying the code in the question doesn't accurately reflect your actual code? – Amy Feb 05 '18 at 19:38
  • @ragerory I did try that too. But doing invoice_name + "*.pdf" brings up all files which contain invoice_name AND more. I want it to find files which have a partial invoice_name as well. – Richeek Dey Feb 05 '18 at 19:40
  • what you are trying to do is called "edit distance". You would need to use dynamic programming to solve it. Tons of examples online. – Steve Feb 05 '18 at 19:40
  • @Amy no, I am merely saying that this is a part of a code. I didn't put the closing loops perhaps, since they're part of a bigger function – Richeek Dey Feb 05 '18 at 19:41
  • @Steve I didn't get you – Richeek Dey Feb 05 '18 at 19:41
  • @RicheekDey https://en.wikipedia.org/wiki/Edit_distance is what you are looking for – Steve Feb 05 '18 at 19:41
  • @Steve while the logic is similar, I'm not trying to do that. I'm not converting the strings. I'm merely looking for an extension to the Directory.GetFiles method I suppose. – Richeek Dey Feb 05 '18 at 19:43
  • @RicheekDey No, that is exactly what you are trying to do. You want to retrieve the file if it is within 3-4 edits of the original word. This is a typical edit distance problem. And no, the native GetFiles doesn't support it; you would have to use dynamic programming to code it yourself. – Steve Feb 05 '18 at 19:46
  • @RicheekDey You're either going to have to do the string distance fuzzy matching Steve suggested or choose a unique substring of invoice_name (which might not exist) and do the invoice_name+* that ragerory suggested. – Yuli Bonner Feb 05 '18 at 19:47
  • Can I use a substring of invoice_name? – Richeek Dey Feb 05 '18 at 19:48
  • @Steve Alright. Can you point me to an example please? I'm not quite sure how to approach this since edit distance is something I just heard about. – Richeek Dey Feb 05 '18 at 19:49
  • @RicheekDey Just google it; it will give you tons of examples, like https://www.youtube.com/watch?v=We3YDTzNXEk – Steve Feb 05 '18 at 19:57
  • @RicheekDey Keep in mind, string distance searches are fuzzy. There will be some margin of error, if you go this route. Here's one that's not too hard to implement. It also allows for prefix weighting. https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance – Yuli Bonner Feb 05 '18 at 19:59
  • What happens if you have another invoice concerning someone called *Michelle*? Now what do you do with a file containing *Mich*? Your approach seems flaky to say the least. – InBetween Feb 05 '18 at 20:02
  • There is no solution to this problem that does not involve margin of error or prior knowledge of the complete list of possible invoice names. – Yuli Bonner Feb 05 '18 at 20:17
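Steve's edit-distance suggestion can be sketched as follows. This is the standard Levenshtein distance (each insertion, deletion or substitution costs 1, compared case-insensitively here), computed with dynamic programming over two rows; `FindSimilar` then keeps the names within a hypothetical `maxDistance` of the invoice name, closest first. The class and method names are illustrative assumptions. Jaro-Winkler, linked in the comments, is an alternative that gives extra weight to a shared prefix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FuzzyFileMatch
{
    // Levenshtein edit distance via dynamic programming, keeping only two rows.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // "" -> b[0..j) needs j inserts

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // a[0..i) -> "" needs i deletes
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = char.ToLowerInvariant(a[i - 1]) == char.ToLowerInvariant(b[j - 1]) ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            (prev, curr) = (curr, prev); // prev now holds the row just computed
        }
        return prev[b.Length];
    }

    // Keeps paths whose base name is within maxDistance edits of invoiceName,
    // ordered by distance (OrderBy is stable, so ties keep their input order).
    public static List<string> FindSimilar(IEnumerable<string> paths, string invoiceName, int maxDistance)
    {
        return paths
            .Select(p => (FilePath: p,
                          Distance: Levenshtein(Path.GetFileNameWithoutExtension(p), invoiceName)))
            .Where(t => t.Distance <= maxDistance)
            .OrderBy(t => t.Distance)
            .Select(t => t.FilePath)
            .ToList();
    }
}
```

Truncating "Michael" to "Micha" costs 2 deletions, while "Michelle" is 3 edits away, so with `maxDistance = 2` the truncated file is kept and the other name is not. As the comments point out, any threshold is a trade-off and will produce some false matches or misses.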

2 Answers


This takes the first 80% of the string and finds the files whose names start with it. Note, however, that 80% of "Michael" gives "Michae", not "Micha". This is just an example of how to achieve the result described in your question.

Just search for a wildcard on 80% of the name:

    var str = "Michael";

    // get 80% of the string length: 7 * 0.8 = 5.6, which Convert.ToInt32 rounds to 6
    var x = Convert.ToInt32(str.Length * 0.8);

    var partialName = str.Substring(0, x); // "Michae"
    // directory is the folder to search, e.g. directoryPath from the question
    var files = Directory.GetFiles(directory, partialName + "*");
– ragerory

Not sure if this is exactly what you're looking for, but you might consider constructing a loop in which you remove the last character from the prefix you're searching for ("Michael" in your example) on each search until you find a match:

public static List<string> FindClosestPrefixMatches(string directory, string filePrefix, 
    string fileSuffix)
{
    var result = new List<string>();

    // Return an empty list (not null) if the folder is missing or there is no prefix
    if (!Directory.Exists(directory) || string.IsNullOrEmpty(filePrefix)) return result;

    // Loop removes one character from filePrefix on each search iteration
    for (int i = filePrefix.Length; i > 0; i--)
    {
        var wildcardName = filePrefix.Substring(0, i) + "*" + fileSuffix;
        var files = Directory.GetFiles(directory, wildcardName);

        if (files.Length > 0)
        {
            result.AddRange(files);
            break;
        }
    }

    return result;
}
– Rufus L
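One possible refinement, sketched as a pure function over a list of file names so it can be tested without a real folder: run the shrinking-prefix loop from the last answer, but stop at a minimum prefix length (for example 80% of the name, as in the first answer) instead of going all the way down to one character, where "M*" would match any file starting with "M". The `minLength` parameter, the `PrefixSearch` class and the in-memory signature are assumptions, not part of either answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixSearch
{
    // Tries the full prefix first, then drops one character per iteration,
    // stopping at minLength. Each test mirrors the simple pattern prefix + "*" + suffix.
    public static List<string> FindClosestPrefixMatches(
        IEnumerable<string> fileNames, string filePrefix, string fileSuffix, int minLength)
    {
        var names = fileNames.ToList();
        for (int i = filePrefix.Length; i >= Math.Max(minLength, 1); i--)
        {
            string prefix = filePrefix.Substring(0, i);
            var matches = names
                .Where(n => n.Length >= prefix.Length + fileSuffix.Length
                         && n.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
                         && n.EndsWith(fileSuffix, StringComparison.OrdinalIgnoreCase))
                .ToList();
            if (matches.Count > 0) return matches; // closest (longest) prefix wins
        }
        return new List<string>(); // nothing matched above the floor
    }
}
```

For a real folder, the names could come from `Directory.GetFiles(directory, "*.pdf").Select(f => Path.GetFileName(f))`. With "Michael" and a floor of 5 characters, "Micha.pdf" is found at the 5-character prefix; with a floor of 6 the search gives up instead of falling back to ever shorter and more ambiguous prefixes.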