I am calling a Fortran program from R and analyzing its output file, which is fairly large (around 50 MB per iteration). Each iteration takes about 50 seconds, of which the read.table call alone accounts for 42 seconds. Since I need to repeat this 100,000 times, I am wondering whether there is a better way to speed it up.

For example, is it possible to let FORTRAN save everything into memory and pass it to R?

Thanks!

TTT
  • Have you optimized your call to read.table? See http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r – mnel Oct 24 '12 at 21:53
  • @mnel: I will check that first. Thanks! – TTT Oct 24 '12 at 22:10
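
Common ways to speed up `read.table()` itself, which is what the linked question covers, include declaring the column types up front and supplying a row count. A small sketch, assuming a whitespace-separated file with one character column and two numeric columns (the file and its layout are invented here for illustration, not taken from the actual Fortran output):

```r
# Hypothetical sample file standing in for the Fortran output.
path <- tempfile(fileext = ".txt")
writeLines(c("a 1.0 2.5", "b 3.0 4.5", "c 5.0 6.5"), path)

dat <- read.table(
  path,
  header           = FALSE,
  colClasses       = c("character", "numeric", "numeric"),  # skip type guessing
  nrows            = 3,    # known (or mildly overestimated) row count
  comment.char     = "",   # do not scan for comment characters
  stringsAsFactors = FALSE
)

stopifnot(nrow(dat) == 3, is.numeric(dat$V2))
```

How much this helps depends on the file; with mostly numeric data, fixing `colClasses` is usually the setting that matters most.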

1 Answer

Absolutely: write the output as a binary file in Fortran, then read it in R with readBin(), which will be very fast. Just make sure both sides agree on endianness and on whether floating-point values are four or eight bytes wide.
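
A minimal sketch of the R side, assuming the Fortran program writes a raw stream of 8-byte reals in little-endian order with no record markers (for example, a file opened with `access="stream"`; a default unformatted sequential file typically carries compiler-specific record markers that would have to be skipped). The file name and values are made up, and `writeBin()` stands in for the Fortran writer so the example is self-contained:

```r
# Stand-in for the Fortran output: a raw stream of 8-byte doubles.
# (Hypothetical file and data; the real file would come from Fortran.)
path <- tempfile(fileext = ".bin")
x <- c(1.5, -2.25, 3.0, 1e10)
writeBin(x, path, size = 8, endian = "little")

# Read it back. `n` is the maximum number of values to read; here it is
# derived from the file size, assuming the file holds nothing but doubles.
n <- file.size(path) / 8
y <- readBin(path, what = "double", n = n, size = 8, endian = "little")

stopifnot(identical(x, y))
```

Stating `size` and `endian` explicitly on both sides avoids relying on platform defaults.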

If you want a tested library, look into the various serialization libraries, such as RProtoBuf. I am not sure how many of them have Fortran bindings, though...

Edit: No luck with Protocol Buffers and Fortran, per the add-ons page. Maybe a scientific data format such as HDF5 will work better for you.

Dirk Eddelbuettel
  • Thanks for the suggestion. I will look into those libraries. – TTT Oct 24 '12 at 22:11
  • `readBin()` is part of base R. We use it to move (big) matrices around by writing a short header with meta-information followed by the data. From C/C++ as well as R, we even get transparent gzip support... – Dirk Eddelbuettel Oct 24 '12 at 22:15
  • I am not very familiar with this binary approach. It should also work for an output file that combines characters and numbers, right? – TTT Oct 24 '12 at 22:19
  • We use just numbers, all encoded the same way as 8-byte doubles. You could do the equivalent of data.frames by writing each column, following a header denoting formats ... but at some point the serialization libraries will be easier. – Dirk Eddelbuettel Oct 24 '12 at 22:22
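
The header-plus-data layout mentioned in these comments might look like the following on the R side. The layout used here (two 4-byte integers giving the row and column counts, then the values as 8-byte doubles in column-major order) is an assumption for illustration, not the format the comments describe in detail; `read_matrix` is a hypothetical helper, and `writeBin()` again stands in for the Fortran or C writer:

```r
# Hypothetical layout: int32 nrow, int32 ncol, then nrow * ncol doubles
# in column-major order (the storage order of both Fortran and R).
path <- tempfile(fileext = ".bin")
m <- matrix(as.numeric(1:6), nrow = 2, ncol = 3)

con <- file(path, "wb")
writeBin(dim(m), con, size = 4)        # header: integer dimensions
writeBin(as.vector(m), con, size = 8)  # payload: the matrix values
close(con)

read_matrix <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  dims <- readBin(con, what = "integer", n = 2, size = 4)
  vals <- readBin(con, what = "double", n = prod(dims), size = 8)
  matrix(vals, nrow = dims[1], ncol = dims[2])
}

m2 <- read_matrix(path)
stopifnot(identical(m, m2))
```

Opening the connection with `gzfile()` instead of `file()` is one way to read a gzip-compressed version of the same layout.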