
I'm working on a project that needs to scan a large amount of data (~1 TB) and copy it from drive A to drive B. It will run constantly in the background (or in the tray) and perform a check every XX hours or minutes. On each check, it will look for any NEW files on drive A and copy them to drive B. If any files on drive A have been updated and are newer than the copies on drive B, it will copy those as well, replacing the older versions on B.

I'm not really sure where to start. Should I write this in Python or C# (maybe visual?)? If someone could give me some advice I would greatly appreciate it. Thanks!

EDIT:

Just wanted to give an update! I ended up using Robocopy, which is built into Windows. I moved away from Python and just created a small batch file that would check all of the files in drive A and compare to drive B. If anything was new or didn't exist, it copies it over. I then set up a task through Task Scheduler, also built into Windows. Works PERFECTLY in literally just 1 line of code in a batch file!

Dylan Beck
  • Welcome to SO, please post the code you have written, and describe what's not working about it. – Will Dec 13 '16 at 14:38
  • These days it's less about which language to use because it's right (e.g. Fortran, COBOL, etc.) and more because you know how... – BugFinder Dec 13 '16 at 14:46
  • I agree with BugFinder, it would definitely be doable in Python. I was just using C# since it's fairly easy to make Windows services. – Tyler C Dec 13 '16 at 14:50
  • I wrote one of these in C# as a library accessible from a WinForms app, a command-line app, or a service. How you run it is not very important. You just have to track the modification date of the files, and your file routines need to be recursive to allow any depth of folders. – iCollect.it Ltd Dec 13 '16 at 14:50
  • Very true, he doesn't need to use checksums if he's okay with just looking at the modify date. Then he could just do write validation. – Tyler C Dec 13 '16 at 14:54
  • Wow, all these responses came faster than I was expecting. Thanks for the info. Currently I don't have any code written. I'm just trying to find out how the core of the program should work, and I can figure out the rest. I'm leaning towards C# so I can create a simple windowed program. Currently I only need the modification date of files. – Dylan Beck Dec 13 '16 at 15:07
  • The task seems likely to be I/O bound, so the language doesn't matter too much in that regard, nor does the fact you want to create a simple windowed program. However, it would probably be easier to create a C# tray application than doing so using Python. – martineau Dec 13 '16 at 15:12
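The modification-date approach described in the comments above (walk the source recursively, copy anything missing or newer) could be sketched in Python roughly as follows. This is only an illustration, not what any commenter actually wrote: the function name `sync_newer` and its parameters are made up for the example, and the timestamp comparison assumes both drives report modification times with the same precision.

```python
import os
import shutil


def sync_newer(src_root, dst_root):
    """Copy files that are missing from dst_root or newer in src_root.

    Hypothetical helper for illustration. Returns the relative paths copied.
    """
    copied = []
    # os.walk descends into every subfolder, so any folder depth is handled.
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # Copy when the destination is missing or the source is newer.
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                # copy2 also copies the modification time, so an unchanged
                # file is not copied again on the next scan.
                shutil.copy2(src, dst)
                copied.append(os.path.relpath(src, src_root))
    return copied
```

A call such as `sync_newer("A:\\", "B:\\")` would return the list of files it copied; the drive letters here are placeholders.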

1 Answer


I was starting to look into building something like this myself. I was going to write it in C#, probably as a system service, and have it periodically scan for new files. It would then build checksums with either SHA-1 or MD5. You can look here about how to generate an MD5 in C#. Here is some additional information on byte-for-byte vs. checksum comparisons.
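For comparison with the C# route, here is a rough Python sketch of the two options mentioned above: a checksum and a byte-for-byte comparison. The helper names `file_digest` and `same_bytes` are invented for this example, and SHA-1 is just the default picked here; `"md5"` can be passed instead.

```python
import filecmp
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks.

    Hypothetical helper; chunked reads keep memory use flat for big files.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(path_a, path_b):
    """Byte-for-byte comparison of two files (no hashing involved)."""
    # shallow=False makes filecmp compare contents, not just os.stat() data.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

A byte-for-byte check needs both files readable at the same time, while a stored checksum lets you detect a change later without keeping the old copy around.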

After it has its hash list, it can transfer the files and then hash the destination copies to ensure they were written properly. I was going to hang on to all the hashes so that when it rescans the directory, it has something to compare against to see whether a file was updated. Then it would just repeat the above.
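The hash, copy, verify, and remember cycle described above might look something like this in Python. Treat it as a sketch under assumptions rather than the design I had in mind for C#: the names `sha1_of`, `copy_verified`, and `sync_changed` are hypothetical, the stored hashes go into a JSON file whose location you choose, and that file is assumed to live outside the source folder.

```python
import hashlib
import json
import os
import shutil


def sha1_of(path, chunk_size=1024 * 1024):
    """Hex SHA-1 of a file, read in chunks (hypothetical helper)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_verified(src, dst, expected_digest):
    """Copy src to dst, then re-hash dst to confirm the write."""
    shutil.copy2(src, dst)
    if sha1_of(dst) != expected_digest:
        raise OSError("hash mismatch after copying " + src)


def sync_changed(src_root, dst_root, manifest_path):
    """Copy files whose hash differs from the one saved on the last scan.

    Assumes manifest_path is outside src_root so it is not synced itself.
    """
    try:
        with open(manifest_path) as f:
            known = json.load(f)  # relative path -> digest from last scan
    except FileNotFoundError:
        known = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            digest = sha1_of(src)
            if known.get(rel) != digest:  # new file or content changed
                dst = os.path.join(dst_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                copy_verified(src, dst, digest)
                known[rel] = digest
                changed.append(rel)
    with open(manifest_path, "w") as f:
        json.dump(known, f)
    return changed
```

Note that this reads every source file on every scan in order to hash it, which is a real cost at around 1 TB; that trade-off is why a modification-date check is often used instead.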

Tyler C
  • Thanks. I'm actually looking to just check the modification date and if the file is newer, download and replace. Just not sure where I should begin. Is there a library I could use? lol I'm just confused as to where to begin with something like this. After I get how the core should work, I can figure out the rest. – Dylan Beck Dec 13 '16 at 15:09
  • Here is the documentation for System.IO.File for Windows. You can use it to get things like the last write time (modified date). https://msdn.microsoft.com/en-us/library/system.io.file(v=vs.110).aspx – Tyler C Dec 13 '16 at 16:03
  • For the core, you can either do a console application with a tray icon or a system service https://msdn.microsoft.com/en-us/library/zt39148a(v=vs.110).aspx Then you just need the scan to occur regularly. A simple implementation could be like this http://stackoverflow.com/questions/12535722/what-is-the-best-way-to-implement-a-timer – Tyler C Dec 13 '16 at 16:11
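The "scan regularly" part from the last comment above can be approximated in Python with a small loop, shown here as a minimal sketch rather than a port of the linked C# timer. The name `run_periodically` and its parameters are made up for the example; `task` stands for whatever function performs one scan.

```python
import threading


def run_periodically(task, interval_seconds, stop_event):
    """Call task() right away, then once per interval until stopped.

    Hypothetical helper: stop_event is a threading.Event that another
    thread (or the task itself) sets to end the loop.
    """
    while not stop_event.is_set():
        task()
        # wait() sleeps for the interval but returns early once the
        # event is set, so shutdown does not have to wait a full cycle.
        stop_event.wait(interval_seconds)
```

To keep it in the background, the loop could be started with `threading.Thread(target=run_periodically, args=(scan, 3600, stop), daemon=True).start()`, where `scan` and `stop` are your own scan function and `threading.Event()` instance.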