
I am working on a robotics research project and would like to know: does anyone have suggestions for best practices when organizing scientific data and code? Does anyone know of existing scientific libraries with available source that I could examine?

Here are the elements of our 'suite':

  • Experiments - Two types:
    1. Gathering data from existing, 'natural' system.
    2. Data from running behaviors on robotic system.
  • Models
    • Description of the dynamical system - dynamics, kinematics, etc.
    • Parameters for said system, some of which are derived from type 1 experiments
  • Simulation - trying to simulate natural behaviors, simulating behaviors on robots
  • Implementation - code for controlling the robots. Granted this is a large undertaking and has a large infrastructure of its own.

Some design aspects of our 'suite':

  • Would be good if the simulation environment allowed for 'rapid prototyping' (scripts / an interactive prompt for simple hacks, quick data inspection, etc. - admittedly something hard to incorporate) - currently satisfied through scripting languages (Python, MATLAB)
  • Multiple programming languages
  • Distributed, collaborative setup - Will be using Git
  • Unit tests have not yet been incorporated, but will hopefully be later on
  • Cross-platform (unfortunately) - I am used to Linux, but my team members use Windows, and some of our tools are wedded to that platform
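To illustrate the rapid-prototyping point above, here is a minimal Python sketch of the kind of quick data inspection one might do at an interactive prompt. The helper name, file name, and column name are all hypothetical placeholders, not part of any existing tool:

```python
import csv
import statistics

def quick_stats(path, column):
    """Summarize one numeric column of a CSV log for quick inspection."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical usage at an interactive prompt, e.g. on a type 1 experiment log:
# quick_stats("joint_angles.csv", "theta1")
```

A few one-liners like this, kept in a scratch module, often cover most of the "simple hacks" use case without any extra infrastructure.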

I saw this post; the books look interesting, and I have ordered "Writing Scientific Software", but I suspect it will focus primarily on implementing the simulation code and less on the overall organization.

eacousineau

1 Answer


The situation you describe is very similar to what we have in our surface dynamics lab. Some of the work involves keeping measurement data, which is analysed in real time or saved for later analysis. Other work involves running simulations and analysing their results.

The data management scheme, which the lab leader picked up at Cambridge while studying there, is centred around a main server which holds the personal files of all lab members. Each member accesses the files from his workstation by mounting the appropriate server folder using NFS. This has its merits and faults: it is easier to back everything up, but processing large amounts of data over the network is problematic. For this reason I am an exception in the lab, since the simulation I work with generates a large amount of data. That data is saved on my workstation, and only the code used to generate it (the simulation's source code and configuration files) is saved on the server.

I also keep my code in an online SVN service, since I cannot log in to the lab server from home. This is a mandatory practice, stemming from the need to reproduce older results on demand and to trace changes to the code when some obscure bug appears. Hence the need to keep older versions and configuration files.
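A minimal Python sketch of that reproducibility practice: stamp each run's output folder with the working-copy revision and a copy of the configuration file, so a result can always be traced back to the code that produced it. The function name and folder layout are my own invention, and `svnversion` is assumed to be on the path (with a fallback when it is not):

```python
import datetime
import pathlib
import shutil
import subprocess

def snapshot_run(config_path, results_root="results"):
    """Create a per-run folder stamped with the date and code revision,
    and copy the configuration file into it so the run can be reproduced."""
    try:
        # 'svnversion' prints the working-copy revision; fall back if svn is absent
        rev = subprocess.check_output(["svnversion"], text=True).strip()
    except (OSError, subprocess.CalledProcessError):
        rev = "unknown"
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = pathlib.Path(results_root) / f"{stamp}_r{rev}"
    run_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(config_path, run_dir / pathlib.Path(config_path).name)
    return run_dir
```

The same idea works unchanged with Git by swapping `svnversion` for `git rev-parse --short HEAD`.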

We also employ low-tech methods, such as lab notebooks, to record results, modifications, etc. This content can sometimes be more abstract (there is no point describing every changed line in the code - you have diff for that; just the purpose of the change, perhaps some notes about the implementation, and its date).

Work is done mostly in MATLAB. Again I am an exception, as I prefer Python; I also use C for the data-generating simulation. Testing is mostly of convergence, since my project is currently concerned with comparison against computational models. I just generate results with different configurations, each saved in its own respective folder (which I track in my lab logbook). This has the benefit of letting me control and interface with the data exactly as I want, instead of conforming to someone else's ideas and formats.
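The convergence testing mentioned above might look something like the following sketch: a result is taken as converged when refining the resolution changes it by less than a tolerance. The helper names and the tolerance value are illustrative, not from any particular framework:

```python
def relative_change(coarse, fine):
    """Largest element-wise relative difference between two runs
    of the same simulation at different resolutions."""
    return max(abs(c - f) / max(abs(f), 1e-12) for c, f in zip(coarse, fine))

def has_converged(coarse, fine, tol=1e-3):
    """Accept a result when refinement changes it by less than tol."""
    return relative_change(coarse, fine) < tol
```

Logging the `relative_change` value for each configuration folder gives a compact record to paste straight into the lab logbook.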

Mickey Diamant