4

The National Park Service's Natural Sounds Program collects multiple terabytes of data each year measuring soundscapes. In your opinion, what is the best available scripting language for managing massive numbers of files and file types? We would like to easily design and run efficient, user-friendly scripts to search for and retrieve or copy files that may be located in different directories according to a single static hierarchy. The OS will most likely be Windows. Thanks!

Grace Note
  • 3,205
  • 4
  • 35
  • 55
Dan
  • 41
  • 1
  • 2

3 Answers

6

Use the one your developers are most familiar with. The productivity gains you'll get from that will almost certainly beat out any advantages that one language may have over another.

Eric Petroelje
  • 59,820
  • 9
  • 127
  • 177
3

Use Python. It's easy to learn, and developers coming from other languages can switch to it easily.

The size of the files doesn't much matter when you're searching directories or searching for metadata outside the files. Even so, you rarely need to read an entire sound sample file to strip off the metadata.
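As a rough illustration of searching a directory hierarchy without opening the files themselves, here is a minimal sketch using only the standard library. The function names `find_files` and `copy_matches` and the `*.wav` pattern are assumptions made for the example, not anything from the question.

```python
import fnmatch
import os
import shutil


def find_files(root, pattern):
    """Yield paths under root whose file name matches a glob pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)


def copy_matches(root, pattern, dest):
    """Copy every match into dest, mirroring the layout found under root."""
    copied = []
    for src in find_files(root, pattern):
        target = os.path.join(dest, os.path.relpath(src, root))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(src, target)  # copy2 also tries to preserve timestamps
        copied.append(target)
    return copied
```

Something like `copy_matches(r"D:\soundscapes", "*.wav", r"E:\retrieved")` (hypothetical paths) would then copy the matching files. Only directory listings are read during the search, so file size affects the copy time but not the search. The destination should sit outside the source tree so the walk does not revisit its own copies.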

Also, if you're doing this frequently, you might want to consider:

  1. Extracting all the metadata to a relational database.

  2. Using the relational database as a complex "index" to the sound sample files.

Each file addition or change would go through an application that synchronizes file changes with database updates, to ensure that the database index actually matches the filesystem.

The bulk of your searches might become SQL queries.
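A minimal sketch of such an index, assuming SQLite through Python's built-in `sqlite3` module. The table layout and the helper names `build_index` and `find_by_ext` are hypothetical, and plain filesystem attributes (extension, size, modification time) stand in for whatever metadata the sound files actually carry.

```python
import os
import sqlite3


def build_index(db_path, root):
    """Record path, extension, size and mtime for every file under root."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, ext TEXT, size INTEGER, mtime REAL)"
    )
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            ext = os.path.splitext(name)[1].lower()
            rows.append((full, ext, st.st_size, st.st_mtime))
    # INSERT OR REPLACE lets the index be rebuilt without duplicate rows
    conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def find_by_ext(conn, ext):
    """Return indexed paths with the given lowercase extension, sorted."""
    cur = conn.execute(
        "SELECT path FROM files WHERE ext = ? ORDER BY path", (ext,)
    )
    return [row[0] for row in cur]
```

This sketch only adds or refreshes rows; it does not remove entries for files that have since been deleted, which is the kind of gap the synchronizing application would need to close.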

S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • 1
    I went for Python too, for a similar problem: archiving terabytes of DNA sequencing data. I actually wrote the first prototype in Perl, but as the complexity of the script grew, I needed to turn to some OO (Object Oriented) and DDD (Domain Driven Design) patterns to make it robust, and OO in Perl just isn't that nice, while in Python it is just excellent. Also, maybe foremost, the cleaner syntax of Python, in combination with DDD principles such as "document your functions in their names", made the script far more readable than what I could accomplish in Perl. – Samuel Lampa Sep 10 '11 at 10:15
2

I don't really know what you are going to be looking for in a scripting language, but Eric is right that you should use something all your developers are familiar with. However, if you don't have developers (yet) and are designing the project (and team) from the ground up, consider C++ or .NET (C# or VB).

While C++ offers more powerful programming and better performance, C# and VB.NET offer quicker development. Regardless of .NET's development-speed advantage, I would think that for massive amounts of files and file types, you will have the best overall satisfaction from C++. In my opinion, the most user-friendly design requires very little user input other than clicking buttons or selecting options from a list.

IAbstract
  • 19,551
  • 15
  • 98
  • 146