
I have written some test code in Python and C++ that reads two matrices from a file and prints a matrix. Python seems to need about twice as much time for I/O:

```
$ ./test.sh -i Testing/2000.in -p "C++/read-write-only.out" -n 2
Executing: C++/read-write-only.out -i Testing/2000.in > TMPcurrentFileResult.out
It took 8 seconds for 2 executions
MIN: 4 seconds
MAX: 4 seconds

$ ./test.sh -i Testing/2000.in -p "python Python/read-write-only.py" -n 2
Executing: python Python/read-write-only.py -i Testing/2000.in > TMP..Results.out
It took 16 seconds for 2 executions
MIN: 8 seconds
MAX: 8 seconds
```

This is the code I've used for Python:

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-

from optparse import OptionParser
parser = OptionParser()
parser.add_option("-i", dest="filename", default="bigMatrix.in",
     help="input file with two matrices", metavar="FILE")
(options, args) = parser.parse_args()

def read(filename):
    lines = open(filename, 'r').read().splitlines()
    A = []
    B = []
    matrix = A
    for line in lines:
        if line != "":
            matrix.append(map(int, line.split("\t")))
        else:
            matrix = B
    return A, B

def printMatrix(matrix):
    for line in matrix:
        print "\t".join(map(str,line))

A, B = read(options.filename)
# Do something
printMatrix(B)
```

This is the C++ code:

```cpp
#include <cstdio>     // FILE, freopen, fclose
#include <sstream>
#include <string>
#include <fstream>
#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

// The matrix size n is the number of tab-separated entries in the first line.
int getMatrixSize(string filename) {
    string line;
    ifstream infile;
    infile.open (filename.c_str());
    getline(infile, line);
    return count(line.begin(), line.end(), '\t') + 1;
}

// Reads matrix A (up to the first empty line), then matrix B.
void read(string filename, vector< vector<int> > &A, vector< vector<int> > &B){
    string line;
    // Redirect stdin to the input file so that cin reads from it.
    FILE* matrixfile = freopen(filename.c_str(), "r", stdin);

    int i = 0, j, a;
    while (getline(cin, line) && !line.empty()) {
        istringstream iss(line);
        j = 0;
        while (iss >> a) {
            A[i][j] = a;
            j++;
        }
        i++;
    }

    i = 0;
    while (getline(cin, line)) {
        istringstream iss(line);
        j = 0;
        while (iss >> a) {
            B[i][j] = a;
            j++;
        }
        i++;
    }

    fclose (matrixfile);
}

// Passing by const reference avoids copying the whole matrix.
void printMatrix(const vector< vector<int> > &matrix, int n) {
    for (int i=0; i < n; i++) {
        for (int j=0; j < n; j++) {
            if (j != 0) {
                cout << "\t";
            }
            cout << matrix[i][j];
        }
        cout << endl;  // endl also flushes the output after every line
    }
}

int main (int argc, char* argv[]) {
    string filename;
    if (argc < 3) {
        filename = "bigMatrix.in";
    } else {
        filename = argv[2];  // expects "-i FILE"
    }

    int n = getMatrixSize(filename);
    vector<int> inner (n);
    vector< vector<int> > A(n, inner), B(n, inner), C(n, inner);
    read (filename, A, B);
    // do something with the matrices
    printMatrix(C, n);
    return 0;
}
```

Is it possible to get Python as fast as C++ for I/O? How could I improve I/O performance in Python and in C++?

(I've heard that scanf is faster than cin. Why would that be?)

This is the Git repository with all the code.
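The question's Python code is Python 2 (it uses the `print` statement and relies on `map` returning a list). As a point of comparison, here is a minimal Python 3 sketch of the same read-and-print steps, assuming the same input format: tab-separated integers, with the two matrices separated by one empty line. It reads the file in one call and produces the output with a single `write` instead of one `print` per row. The names `read_matrices` and `format_matrix` are hypothetical, and whether this is faster on a given machine should be measured rather than assumed.

```python
import sys


def read_matrices(text):
    """Parse two tab-separated integer matrices separated by an empty line."""
    a, b = [], []
    current = a
    for line in text.splitlines():
        if line:
            # split() with no argument splits on any whitespace, including tabs
            current.append(list(map(int, line.split())))
        else:
            # an empty line switches from matrix A to matrix B
            current = b
    return a, b


def format_matrix(matrix):
    """Build the whole output as one string, one tab-separated row per line."""
    return "".join("\t".join(map(str, row)) + "\n" for row in matrix)


# usage: python3 read_write_fast.py FILE   (only runs when a file is given)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        A, B = read_matrices(f.read())
    # one write call instead of one print per row
    sys.stdout.write(format_matrix(B))
```

For example, `python3 read_write_fast.py Testing/2000.in > out.txt`, where `read_write_fast.py` is a hypothetical file name for the sketch.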

Martin Thoma
  • Why all the matrix stuff if you're asking about I/O? I would do a pure I/O test. – chris Jun 29 '12 at 09:03
  • They are not the same code. I didn't take a deep look, but at first sight in C++ you allocate the matrix with the right size and in Python you don't. For a very small matrix it shouldn't be a problem, but as it grows... – Adriano Repetti Jun 29 '12 at 09:05
  • To put it another way: this isn't a good test if you want to focus on I/O, because MOST of the time will be spent doing something else (memory allocation, parsing, startup). I don't think you'll notice any difference for **pure I/O**, whatever language you use (at least under normal conditions, unless you want to do overlapped I/O). – Adriano Repetti Jun 29 '12 at 09:07
  • Isn't it normal for a program written in an interpreted language to be slower than a program written in a compiled language? Assembler I/O would be even faster (only if you mastered it, because compile-time optimizations in C++ compilers are good). – Niemand Jun 29 '12 at 09:08
  • You may check cin vs scanf in this topic: http://stackoverflow.com/questions/1042110/using-scanf-in-c-programs-is-faster-than-using-cin.. – emreakyilmaz Jun 29 '12 at 09:09
  • @Niemand *usually*, when working with I/O, the **slowest part** isn't the program but the **disk**. – Adriano Repetti Jun 29 '12 at 09:09
  • @Adriano Initializing the list didn't make it much faster. I'll upload the code later. – Martin Thoma Jun 29 '12 at 09:18
  • Considering that Python's file I/O is written in C, I would say the performance issues you're having are in the rest of your implementation, not in the pure I/O portion of it. – Christian Witts Jun 29 '12 at 09:19
  • @moose Try removing **everything** but pure I/O. Read and **do not parse** (skip the matrix code too). Do not print output. Run the test "n" times (and many more than 2), but using a loop inside main (not running the program again and again from the shell script). You'll be surprised. The algorithm you use for parsing is different too. **Measuring** performance isn't easy and comparing it is even harder, but at least you should remove all the _noise_ and artifacts and... use the **same algorithm**. If you don't, you'll be measuring how good you are at writing in one language or the other. – Adriano Repetti Jun 29 '12 at 09:24
  • Beware when testing I/O performance: if you run the test `n` times, you may have tested the I/O "cold" once and "hot" `n-1` times. For large enough `n`, cold I/O performance becomes irrelevant to the benchmark, but in practical use you probably care about it quite a lot. It's better to test one extremely large file than to test a fairly large file many times, unless you know how to outwit your OS and hardware's disk caching, in which case you can test cold and hot I/O separately and precisely. "Extremely large" means "much larger than your RAM". – Steve Jessop Jun 29 '12 at 10:01
  • @Adriano: I've just tested I/O with only these lines of Python: `lines = open("../Testing/2000.in", 'r').read(); print lines`. You were on the right track. I/O wasn't the slow part; parsing the input was. Thanks. If you post something like this, I'll accept your answer. – Martin Thoma Jun 30 '12 at 08:38
  • @moose Happy to help. Thank you, but don't worry about the answer; "try this and that" isn't a real answer. – Adriano Repetti Jun 30 '12 at 09:04
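Following Adriano Repetti's suggestion to separate pure I/O from parsing and to repeat the measurement inside one process, here is a minimal Python 3 sketch that times the two phases separately with `time.perf_counter()`. The function name `time_read_and_parse` and the choice of reporting the best of `repeats` runs are assumptions. As Steve Jessop notes, after the first run the file is usually in the OS cache, so the best read time mostly reflects "hot" I/O.

```python
import time


def time_read_and_parse(path, repeats=5):
    """Return (best read time, best parse time) in seconds over `repeats` runs."""
    best_read = best_parse = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path) as f:
            text = f.read()  # pure I/O: load the whole file, no parsing
        mid = time.perf_counter()
        # parsing only: turn every non-empty line into a list of ints
        rows = [list(map(int, line.split())) for line in text.splitlines() if line]
        end = time.perf_counter()
        best_read = min(best_read, mid - start)
        best_parse = min(best_parse, end - mid)
    return best_read, best_parse
```

For example, `print(time_read_and_parse("Testing/2000.in"))` prints the two timings for the question's test file.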

3 Answers


It takes a while to start the Python interpreter. Take this into account when running your tests.

Gabi
  • It doesn't take 8 seconds though, or even a significant fraction of 8 seconds. On my system it takes 0.1 seconds to start up and shut down the interpreter, and another 0.1 if I import `optparse`. The questioner's Python code needs compiling at runtime too, which will add another little bit. The time to take this into consideration is when the C++ and Python code are within a second of each other. – Steve Jessop Jun 29 '12 at 10:04
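To check how much of a measured run is interpreter startup, as discussed in this answer and comment, one can time an interpreter that does nothing. A minimal Python 3 sketch using only the standard library; the function name `interpreter_startup_seconds` is hypothetical, and the result depends on the machine and Python version.

```python
import subprocess
import sys
import time


def interpreter_startup_seconds(runs=5):
    """Best wall-clock time to start and exit a Python interpreter that does nothing."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # sys.executable is the interpreter running this script
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, `print(f"{interpreter_startup_seconds():.3f} s")`. Replacing `"pass"` with `"import optparse"` would include the import cost mentioned in the comment.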

When you are not mixing C and C++ file manipulation routines, you should turn off the synchronization with stdio.

http://www.cplusplus.com/reference/iostream/ios_base/sync_with_stdio/

cin and cout should generally be faster than their C counterparts (with synchronization turned off).

As far as the slowness of Python goes, well, why don't you check the implementation of the I/O functions?

Šimon Tóth

You must buffer I/O in any language.

And why are you comparing Python and C++?

Python is an interpreted language, while C++ is compiled.

maxwell
  • [This question](http://stackoverflow.com/questions/701180/is-there-an-un-buffered-i-o-in-windows-system) asks about unbuffered I/O. You might want to take a look. – chris Jun 29 '12 at 09:10
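On the point that I/O should be buffered: in Python, file objects returned by `open()` are already buffered, and `sys.stdout` is block-buffered when it is redirected to a file, as in the benchmark above. A minimal Python 3 sketch that makes the buffer size explicit and writes rows with one `writelines` call instead of one `print` per row; the function names and the 1 MiB buffer size are arbitrary assumptions.

```python
def write_matrix(matrix, out):
    """Write tab-separated rows to a text stream; the stream does the buffering."""
    out.writelines("\t".join(map(str, row)) + "\n" for row in matrix)


def write_matrix_to_file(matrix, path, buffer_size=1 << 20):
    """Write `matrix` to `path` through an explicitly sized buffer (1 MiB by default)."""
    # buffering > 1 sets the size in bytes of the underlying buffer
    with open(path, "w", buffering=buffer_size) as f:
        write_matrix(matrix, f)
```

`write_matrix` accepts any text stream, so the same function works with `sys.stdout` or an in-memory `io.StringIO`.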