
I have 2 methods.

Method 1 calls a Google Drive service, which returns an array of file objects.

Method 2 calls the first method to get the returned list, which calls the Google Drive service and ends up running the method you see below for a "second" time.

I may be wrong, but I think the recursion is causing my final list of results to be doubled. Or it may be because I have declared `files` outside of this method: by the time `files.AddRange(...)` is called by the second method, the set already contains the files from the first call. I am not quite sure how to solve the issue.

readonly Stack myStack = new Stack();
readonly HashSet<BFile> files = new HashSet<BFile>();
readonly HashSet<BFile> pushedList = new HashSet<BFile>();

public async Task<(BFile[]? files, string? error)> GetFiles(string parentId, bool includePermissions)
{

    var service = service..

    if (service != null)
    {
        var listRequest = service.Files.List();

        do
        {
            var response = await listRequest.ExecuteAsync();

            var folders = response.Files.Where(f => f.MimeType == "application/vnd.google-apps.folder");
            var allOtherFiles = response.Files.Where(f => f.MimeType != "application/vnd.google-apps.folder"); 

            files.AddRange(folders.Where(f => f.Name != "$ExclaimerSignatures").Select(f => mapFile(f)));
            files.AddRange(allOtherFiles.Select(f => mapFile(f)));

            var missingFiles = files.Where(f => !pushedList.Contains(f)).ToList();
            missingFiles.ForEach(myStack.Push);
            pushedList.UnionWith(missingFiles);

            while (myStack.Count != 0)
            {
                var temp = (BFile)myStack.Peek();
                myStack.Pop();
                await GetFiles(temp.Id, true);
            }

            listRequest.PageToken = response.NextPageToken;

        } while (listRequest.PageToken != null);

        return (files.ToArray(), null);
    }
    else
        return (null, "Something went wrong");
}

Edit: to answer a question from below, the only reason I have a stack together with recursion in the tree walk is that I use the stack to keep track of what has been visited. I am not sure whether that is bad or not; it is simply what I came up with when initially writing this code.
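For reference, here is a minimal sketch of the same walk done with method-local state and no recursion, which sidesteps the shared fields entirely. It assumes the Google.Apis.Drive.v3 package and reuses the BFile/mapFile shapes from the snippet above; the parent-folder filter via listRequest.Q is an assumption, since the original filtering is not shown.

public async Task<(BFile[]? files, string? error)> GetFilesIterative(DriveService? service, string parentId)
{
    if (service == null)
        return (null, "Something went wrong");

    var results = new List<BFile>();            // local, so a second call starts empty
    var foldersToVisit = new Stack<string>();   // folder ids still to be listed
    foldersToVisit.Push(parentId);

    while (foldersToVisit.Count != 0)
    {
        var currentFolderId = foldersToVisit.Pop();

        var listRequest = service.Files.List();
        listRequest.Q = $"'{currentFolderId}' in parents";   // assumed filter; adapt to your query

        do
        {
            var response = await listRequest.ExecuteAsync();

            foreach (var f in response.Files)
            {
                if (f.MimeType == "application/vnd.google-apps.folder")
                {
                    if (f.Name == "$ExclaimerSignatures")
                        continue;                // skip this folder and its contents
                    foldersToVisit.Push(f.Id);   // list its children on a later iteration
                }
                results.Add(mapFile(f));
            }

            listRequest.PageToken = response.NextPageToken;
        } while (listRequest.PageToken != null);
    }

    return (results.ToArray(), null);
}

Because results and foldersToVisit live inside the method, nothing accumulates across calls, so calling it twice cannot double the output.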

mharre
  • Have you stepped through your code with a debugger to see what is happening? Your code is a little hard to follow. For example, what type is `listRequest`? For that matter, what are `folders`, `allOtherFiles`, and `files` (I don't see this declared at all)? Similarly, `pushedList` appears out of nowhere. – Flydog57 Aug 18 '22 at 23:42
  • Could you please clarify why you have a stack and recursion in the tree walk at the same time? Ideally [edit] the question with your reasoning about the algorithm (check regular BFS/DFS implementations to highlight differences from your approach). – Alexei Levenkov Aug 19 '22 at 00:14
  • What is the type of the `response.Files` property? Is it a deferred `IEnumerable`? – Theodor Zoulias Aug 19 '22 at 02:00
  • Yes, I have stepped through the code for a few hours trying to figure out what is going on, but I can't really see it, ha. For some reason, when the second call happens, `files` is already populated. I am thinking it is possibly because `files` is declared outside of the method and therefore already has its place in memory. @AlexeiLevenkov honestly, it was just the first way I thought of designing this. The stack is basically to keep track of what has been visited and what hasn't. The response type is the Google API v3 response; I forget the exact name of it – mharre Aug 19 '22 at 18:10
  • @TheodorZoulias what you don't see is that it gets turned into an `IEnumerable` via mapFile. Hope that answers your question – mharre Aug 19 '22 at 18:11
  • What I am interested to know about the `response.Files` property is whether it contains a [deferred enumerable](https://stackoverflow.com/questions/1168944/how-to-tell-if-an-ienumerablet-is-subject-to-deferred-execution) or a materialized collection (a small sketch of one way to check this follows these comments). – Theodor Zoulias Aug 19 '22 at 18:18
  • Hmm, based on Google it is hard for me to tell how to check that. Should I check the type before the LINQ expression or after? – mharre Aug 19 '22 at 19:09
  • I assume that it will be easy to deduce if you have access to the source code of the class of which `response` is an instance. If you don't have access, then never mind. It's probably not critical information for answering the question. As a side note, using `var` instead of the concrete type is convenient in the environment of Visual Studio, but not in the environment of a StackOverflow question. We can't hover the mouse and learn what the type is! – Theodor Zoulias Aug 19 '22 at 19:37
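Regarding the deferred-execution discussion above, here is a tiny sketch of one way to probe a sequence at runtime. The helper name DescribeSequence is hypothetical and not part of the question's code.

static void DescribeSequence<T>(IEnumerable<T> sequence)
{
    // Deferred LINQ iterators (Where/Select) generally do not implement
    // ICollection<T>, while materialized collections (List<T>, arrays) do.
    Console.WriteLine(sequence.GetType().Name);     // e.g. "List`1" vs. "WhereSelectEnumerableIterator`2"
    Console.WriteLine(sequence is ICollection<T>);  // true for materialized collections
}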

1 Answer


I think the issue is that the HashSet is not recognizing BFile instances as unique, which is why HashSet<BFile> files contains duplicated values.

To fix it, you can override Equals and GetHashCode in your BFile class. If BFile is not your class, check whether it already implements them. If not, you can supply an IEqualityComparer<BFile> instead. Here is a simple example:

var files = new HashSet<BFile>(new Comparer());
var items = Enumerable.Empty<BFile>(); //Can be any collection

files.UnionWith(items);

class Comparer : EqualityComparer<BFile>
{
    public override bool Equals(BFile? x, BFile? y) => x?.Id == y?.Id;

    public override int GetHashCode(BFile obj) => obj.Id.GetHashCode();
}

class BFile
{
    public string Id { get; set; }
    public string Name { get; set; }
    public byte[] Content { get; set; }
}
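
To illustrate the effect, a hedged usage sketch with the Comparer and BFile classes above (the sample Id/Name values are made up):

var set = new HashSet<BFile>(new Comparer());
set.Add(new BFile { Id = "1", Name = "report.pdf" });
set.Add(new BFile { Id = "1", Name = "report.pdf" });   // same Id, ignored as a duplicate
Console.WriteLine(set.Count);                            // prints 1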

I hope it will help you.

  • I don't understand how this answer addresses the question. Could you please explain? (Clearly the OP has some sort of extension so their code compiles – otherwise the OP would ask about a syntax error or something like that instead of claiming that the results are "doubled".) – Alexei Levenkov Aug 19 '22 at 00:09
  • Sure, my suggestion is that BFile has the default GetHashCode implementation, and that is the reason why it is duplicated in the result. The way to avoid inserting duplicate objects into the HashSet collection is to override Equals and GetHashCode for BFile; in that case, duplicated values will be ignored and the results should not be doubled. Using UnionWith to insert a range of items is probably not related to the issue with duplicated values. Will edit my answer about it – Vladyslav Ishchuk Aug 19 '22 at 00:30
  • Yeah, I thought about that as well and figured that is why the HashSet technically isn't working: because of how the objects are compared. But BFile specifically is a record type, and to be honest with you, I wasn't sure how I could change the implementation of Equals and GetHashCode (this was going to be my last-resort option) – mharre Aug 19 '22 at 18:09
  • So in my opinion, for your case it is better to create the `EqualityComparer` class and implement these 2 methods that way. It should work fine with records too. The only thing you need to do is find a unique property on the BFile model; I think there should be something like Id, but it could be different. If you attach this model to the question, I can help you with it (a record-based sketch follows below). – Vladyslav Ishchuk Aug 19 '22 at 21:12
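
Since BFile was mentioned to be a record: records already get member-wise Equals/GetHashCode, but an array member such as Content is compared by reference, so two otherwise-identical instances holding different byte[] objects still count as different. Below is a hedged sketch of keying a record's equality on Id alone; the positional shape is an assumption, so adapt it to the real BFile definition.

public record BFile(string Id, string Name, byte[] Content)
{
    // Replaces the compiler-generated member-wise equality with Id-only equality.
    public virtual bool Equals(BFile? other) => other is not null && Id == other.Id;
    public override int GetHashCode() => Id.GetHashCode();
}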