1

I am working on a small export function that needs to write 1 million lines, each consisting of 6 doubles. Unfortunately, the tool that reads the data requires commas instead of dots as the decimal separator. At the moment I convert them by search-and-replace in an editor, which is cumbersome and extremely slow for a file of about 20 MB.

Is there a way to do this conversion while writing?

JavaCake
  • Sure. Read the data into (say) a string. Replace the dots with commas. Call a conversion routine such as `strtod` to convert it to a double. – Marshall Clow Dec 29 '12 at 15:35
  • So basically i put the entire line or each value into a string and replace the char and then write it to my file? Actually did not think of that. – JavaCake Dec 29 '12 at 15:37
  • Actually, I was thinking about doing it when you read the file, rather than write it - but if you control the writing of the file, then why not just write it out in the form that you want? – Marshall Clow Dec 29 '12 at 15:38
  • I only generate the file, so basically i can do whatever i want here. The plugin that reads i have no control over, it requires commas. – JavaCake Dec 29 '12 at 15:40
  • Ok - I misunderstood. So, you should generate a line of text into a string (instead of to the file), replace the dots with commas in the string, and write the string to the file. – Marshall Clow Dec 29 '12 at 15:40
  • @LightnessRacesinOrbit, C++ – JavaCake Dec 29 '12 at 15:40
  • Also an _export_ function usually doesn't _parse_ string lines; an import function would. – Lightness Races in Orbit Dec 29 '12 at 15:40
  • @LightnessRacesinOrbit, ofcourse, i have corrected the misleading title! – JavaCake Dec 29 '12 at 15:42
  • Converting a double to a string is covered here: http://stackoverflow.com/questions/1313988/c-what-is-the-optimal-way-to-convert-a-double-to-a-string?rq=1 – Marshall Clow Dec 29 '12 at 15:43
  • @JavaCake: Does `Is there a way to do this conversion while parsing?` now need correcting, also? – Lightness Races in Orbit Dec 29 '12 at 17:03
  • @LightnessRacesinOrbit, hopefully the last bits and pieces should be fixed! Thanks for noticing.. – JavaCake Dec 29 '12 at 17:04
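
The approach suggested in the comments above (format each line into a string, replace the dots, then write the string) could be sketched roughly as follows. The helper name `formatLine` and the space-separated layout are assumptions for illustration; the question does not say how the six values are delimited.

```cpp
#include <algorithm>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats n doubles as one space-separated line,
// then swaps every '.' in that line for ','.
std::string formatLine(const double* values, std::size_t n)
{
    std::ostringstream line;
    line.imbue(std::locale::classic());   // make sure '.' is the separator we start from
    for (std::size_t i = 0; i < n; ++i) {
        if (i != 0) {
            line << ' ';
        }
        line << values[i];
    }
    std::string text = line.str();
    std::replace(text.begin(), text.end(), '.', ',');
    return text;
}
```

Each returned string can then be written to the output file followed by a newline, so only one line at a time is held in memory.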

2 Answers

4

Using a tool like `tr` would be better than doing it manually, and should be your first choice. Otherwise, it's fairly simple to read the input through a filtering streambuf, which converts all '.' to ',', or even converts only in specific contexts (when the preceding or following character is a digit, for example). Without the context:

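#include <cstddef>     // NULL
#include <istream>
#include <streambuf>

// Filtering streambuf: reads from a source buffer and turns every '.' into ','.
class DotsToCommaStreambuf : public std::streambuf
{
    std::streambuf* mySource;
    std::istream* myOwner;   // non-NULL only if we replaced a stream's buffer
    char myBuffer;           // one-character get area
protected:
    int_type underflow()
    {
        int_type ch = mySource->sbumpc();
        if ( ch == traits_type::eof() ) {
            return ch;
        }
        myBuffer = ch == '.' ? ',' : traits_type::to_char_type( ch );
        setg( &myBuffer, &myBuffer, &myBuffer + 1 );
        return traits_type::to_int_type( myBuffer );
    }
public:
    DotsToCommaStreambuf( std::streambuf* source )
        : mySource( source )
        , myOwner( NULL )
    {
    }
    DotsToCommaStreambuf( std::istream& stream )
        : mySource( stream.rdbuf() )
        , myOwner( &stream )
    {
        myOwner->rdbuf( this );
    }
    ~DotsToCommaStreambuf()
    {
        if ( myOwner != NULL ) {
            myOwner->rdbuf( mySource );   // restore the original buffer
        }
    }
};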

Just wrap your input source with this class:

DotsToCommaStreambuf s( myInput );

As long as s is in scope, myInput will convert all '.' that it sees in the input into ','.

EDIT:

I've since seen the comment that you want the change to occur when generating the file, rather than when reading it. The principle is the same, except that the filtering streambuf has an ostream owner, and overrides overflow( int ), rather than underflow. On output, you don't need the local buffer, so it's even simpler:

int overflow( int ch )
{
    return myDest->sputc( ch == '.' ? ',' : ch );
}
James Kanze
  • Why not use a "locale" instead? – PRouleau Dec 29 '12 at 16:02
  • @PRouleau Good question. There is a distinct difference: my solution changes `'.'` to `','`. Universally and indiscriminately. A locale will use `','` instead of `'.'` when formatting floating point, and may make other changes as well. If the output consists of only floating point, both solutions are effectively equivalent. If the output contains other text, the behaviors are different. He doesn't really say which behavior he wants, but in _most_ cases, I suspect that locale would be the correct answer; the reason his target wants `','` is because it uses a specific locale. – James Kanze Dec 29 '12 at 18:04
  • @PRouleau And while I'm at it: it shouldn't be too hard to wrap the `imbue` in an RAII class, like I do for inserting the filtering streambuf. (I don't know why using a locale didn't occur to me; it is the obvious solution.) – James Kanze Dec 29 '12 at 18:07
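
The locale approach discussed in these comments could look something like the sketch below. It assumes a custom `std::numpunct` facet rather than a named system locale, since named locales may not be installed everywhere; the names `CommaDecimalPoint` and `formatWithComma` are hypothetical.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: only the decimal point changes; digit grouping stays off.
struct CommaDecimalPoint : std::numpunct<char>
{
protected:
    char do_decimal_point() const { return ','; }
};

// Formats one value through a stream imbued with the facet. The literal
// text appended after the number shows that ordinary dots are left alone.
std::string formatWithComma( double value )
{
    std::ostringstream out;
    // The locale takes ownership of the facet allocated here.
    out.imbue( std::locale( std::locale::classic(), new CommaDecimalPoint ) );
    out << value << " v1.0";
    return out.str();
}
```

The same `imbue` call can be made on a `std::ofstream` before writing the data. Unlike the filtering streambuf, it only affects how numbers are formatted, which is the difference described in the comment above.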
0

I would make use of the C++ algorithm library and use `std::replace` to get the work done. Read the entire file into a string and call `std::replace`:

std::string s = SOME_STR; //SOME_STR represents the set of data 
std::replace( s.begin(), s.end(), '.', ','); // replace all '.' to ','
Syntactic Fructose
  • Yikes! This hardly seems performant, nor particularly robust. – Lightness Races in Orbit Dec 29 '12 at 17:05
  • @LightnessRacesinOrbit If the data is already in memory, in the form of a string, it is probably as performant as anything else. – James Kanze Dec 29 '12 at 18:05
  • @JamesKanze: And consider disk seeks / cache hits etc, if we're really being picky. Walking the entire dataset twice is not as performant as walking it once. – Lightness Races in Orbit Dec 29 '12 at 18:12
  • @LightnessRacesinOrbit But unless the data set is extremely large, it's not very expensive either. If he already has the data as a string, and intends to output it in a single step, using `std::replace` is probably the best solution. If not, other solutions are probably more appropriate. There's nothing wrong with Need4Sleep's suggestion, which has the advantage of extreme simplicity. – James Kanze Dec 29 '12 at 22:49
  • @JamesKanze: 20MB is large-ish? – Lightness Races in Orbit Dec 29 '12 at 23:09
  • @LightnessRacesinOrbit It depends, but _if_ you have the data as a string already, it will probably be faster to use `std::replace` than it would be use a filtering streambuf (and it's too late to use locale). – James Kanze Dec 29 '12 at 23:15
  • @JamesKanze: Yep, if you do. This answer's code explicitly introduces a 20MB string copy, though. – Lightness Races in Orbit Dec 29 '12 at 23:17