1

Am I correct in my explanation when calculating the time complexity of the following algorithm?

  • A HashSet, moduleMarksheetFiles, is being used to add the files that contain the moduleName specified.

    for (File file: marksheetFiles){
         while(csvReader.readRecord()){
             String moduleName = csvReader.get(ModuleName);
    
             if (moduleName.equals(module)){
                   moduleMarksheetFiles.add(file);
             }
         }
     }
    
  • Let m be the number of files

  • Let k be the average number of records per file.
  • As each file is added only once because HashSet does not allow for duplicates. HashSet.add() is O(1) on average and O(n) for worst case.
  • Searching for a record with the specified moduleName involves comparing every record in the file to the moduleName, will take O(n) steps.

Therefore, the average time complexity would be: O((m*k)^2).

Is this correct?

Also, how would you calculate the worst case?

Thanks.

PS. It is not homework, just analysing my system's algorithm to evaluate performance.

user1339335
  • 451
  • 1
  • 4
  • 3
  • In your 4th and 5th bullets, is n == m? Also, once you add `file` to `moduleMarksheetFiles`, why not break out of the inner loop? Finally, what kind of data structure is `marksheetFiles`? – Ted Hopp Apr 29 '12 at 01:53
  • 1
    Isn't it just O(mk)? Why do you think it's squared? – Kevin Apr 29 '12 at 01:53
  • @Ted Hopp -marksheetFiles is also a HashSet data structure. Yes n==m in the 5th bullet point sorry. As the system is already developed, it is not possible to add the break currently until the next cycle. Thanks for the advice though – user1339335 Apr 29 '12 at 02:05
  • @Kevin - I thought its squared because you would have to compare every record to the moduleName. Is the incorrect. Could you explain as to why you think its O(mk)? – user1339335 Apr 29 '12 at 02:09
  • unless moduleName.equals(module) takes O(mk) time, the overall time complexity is O(mk). – deebee Apr 29 '12 at 03:35
  • for the worst case each add to hashset would take O(mk) time. So the complexity would be O((mk)^2) – deebee Apr 29 '12 at 03:37
  • @debee - Could you please further elaborate on this? – user1339335 Apr 29 '12 at 03:54
  • Considering moduleName.equals(module) as O(1) and avg hashset.add as O(1), the loop takes on O(mk) time. If hashset.add takes O(mk) time for the worst case, then it's easy to see that the overall time complexity is O((mk)^2). But, looking at the code, hashset contains only file names and there are only m files, so to correct my earlier statement it will be O(m^2 * k) – deebee Apr 29 '12 at 04:12
  • possible duplicate of [Algorithm Complexity Time](http://stackoverflow.com/questions/10368489/algorithm-complexity-time) – amit Apr 29 '12 at 16:18

1 Answers1

2

No, it's not squared, this is O(nk). (Technically, that means it's also O((nk)²), but we don't care.)

Your misconception is that it the worst-case performance of HashSet is what counts here. However, even though a hashtable may have worst-case O(n) insertion time (if it needs to rehash every element), its amortized insertion time is O(1) (assuming your hash function is well behaved; File.GetHashCode presumably is). In other words, if you insert multiple things, so many of them will be O(1) that the occasional O(n) insertion does not matter.

Therefore, we can treat insertions as constant-time operations, so performance is purely dictated by the number of iterations through the inner loop body, which is O(nk).

Thomas
  • 174,939
  • 50
  • 355
  • 478
  • Does O(nk) mean it is linear? – user1339335 Apr 29 '12 at 16:00
  • Linear in the product of the number of files and the average number of lines per file, yes. Or, in other words, linear in the total number of lines across all files. – Thomas Apr 29 '12 at 16:20
  • Your last statement is misleading. HashSet insert is O(nk) worst case, even with infinite size table, which doesn't need rehashing - due to collisions. assume `hash(element)=1` [for each element] - every search is `O(n)`, thus *worst case* complexity of the entire algorithm is indeed `O(n^2k^2)`. Average case is of course not. – amit Apr 29 '12 at 16:21
  • Right, yes, I'm assuming a well-behaved hash function. `File`, being from the standard library, presumably has one. – Thomas Apr 29 '12 at 16:24
  • As I said it is misleading - not wrong. The intent is clear for experienced readers - but might confuse novice programmers. – amit Apr 29 '12 at 16:25