
I am trying to read a CSV file in C++, do some calculations, and write the results to another CSV file. Everything works fine, except when the program reads a line like this:

<a href="http://www.google.com" target="_blank">google</a>

To see what the program actually read, I printed the string with cout, and it shows:

<a href=""http://www.google.com"" target=""_blank"">google</a>

Basically, it doubles every double quotation mark. How can I solve this?

Edit:

Here's my code:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    ifstream read;
    ofstream write;
    string line;
    string cell;
    int col = 0;
    string temp;
    string links;
    read.open("Book1.csv");
    write.open("output.csv");
    if (read.is_open())
    {
        cout << "opened" <<endl ;
        getline(read, line);
        while(getline(read,temp))
        {
            stringstream line(temp);
            while (getline(line, cell, ','))
            {
                if (col > 9)
                {
                    links.pop_back();
                    write << links<<endl;
                    col = 0;
                    links = "";
                    break;
                }
                else
                {
                    if (cell != "")
                    {
                        if (col == 0)
                        {
                            write << cell<<',';
                        }
                        else if (col == 1)
                        {
                            write << cell<<',';
                        }
                        else
                        {
                            cell.erase(0, 1);
                            cell.pop_back();
                            links += cell;

                            links += '/';
                        }
                        cout << cell << endl;
                    }
                    col += 1;
                }
            }
        }       
    }
    else 
    {
        cout << "failed" << endl;
    }       
    read.close();
    write.close();  
}
andyz
  • How are you reading from the file? This won't happen "by accident" using standard library calls. See: http://ideone.com/j3jJrO for an example. – Chad Feb 09 '16 at 19:21
  • @Chad I used getline and stringstream. Oh by the way how can I make those grey-backgrounded in comments?? I'm kinda new to SO – andyz Feb 09 '16 at 19:24
  • Cannot duplicate: http://ideone.com/SX4272 – PaulMcKenzie Feb 09 '16 at 19:26
  • @PaulMcKenzie I actually read from a csv file – andyz Feb 09 '16 at 19:30
  • @andyz So give us a line of data. The only difference with my example is that the "file" is `std::cin`. – PaulMcKenzie Feb 09 '16 at 19:31
  • As Chad and PaulMcKenzie showed with their examples, reading from a file with getline() normally shouldn't cause quotation marks to get doubled. You'll need to show the exact code you're using to read the file (i.e. put it in your question) if you want more help debugging it. – Edward Feb 09 '16 at 19:33
  • Might be able to find useful info [here](http://stackoverflow.com/q/1120140/509868) – anatolyg Feb 09 '16 at 20:08

1 Answer


This is perfectly normal. The quotes inside the field (inside your csv file) are escaped with another quote to generate valid csv.

Consider this csv data:

123,"monitor 27"", Samsung",456

Since the second field contains a , it needs to be quoted. But because there are quotes inside the field, those need to be escaped with another quote.

So it is not the reading that adds the extra quotes; they are already inside your csv file (a csv viewer will show only one quote after parsing).

If you are outputting this string to another csv you can (need to) leave the double quotes, just make sure the whole field is surrounded by quotes too.


Update (after posting the code):

First, I'll assume that the second string you posted was also surrounded with quotes like this:

"<a href=""http://www.google.com"" target=""_blank"">google</a>"

Otherwise you would have invalid csv data.

To parse csv, we cannot just split on each , because there could be one inside a field.

Let's say we have the following fields:

123
monitor 27", Samsung
456

To write those to a valid csv row, the second field has to be surrounded with quotes because there is a comma inside. If there are quotes inside a quoted field, those need to be escaped with another quote. So we get this:

123,"monitor 27"", Samsung",456

Without the second quote after 27" the csv would be invalid and unparsable.
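The writing rule described above could be sketched in C++ roughly like this. The function name quoteCsvField is made up for illustration, and the sketch assumes the usual csv convention that a field is quoted when it contains a comma, a quote, or a line break:

```cpp
#include <string>

// Hypothetical helper (not from the question's code): returns `field`
// in a form that can be written as a single csv cell.
std::string quoteCsvField(const std::string& field)
{
    // Fields without a comma, quote, or line break can be written unchanged.
    if (field.find_first_of(",\"\r\n") == std::string::npos)
        return field;

    std::string out = "\"";          // opening outer quote
    for (char c : field)
    {
        if (c == '"')
            out += '"';              // escape an embedded quote by doubling it
        out += c;
    }
    out += '"';                      // closing outer quote
    return out;
}
```

Joining quoteCsvField("123"), quoteCsvField("monitor 27\", Samsung") and quoteCsvField("456") with commas should reproduce the row shown above.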

To correctly scan a csv row, you need to check every byte. Here's some pseudo code which will also make clear why there have to be 2 quotes (assuming there are no multiline fields):

read a line

bool bInsideQuotes = false

loop over chars
  if character == '"'
    bInsideQuotes = !bInsideQuotes
  if character == ',' and !bInsideQuotes
    found a field separator

That way you skip the , inside a field. Now it's also easy to see why quotes inside a field need to be escaped with an extra quote: bInsideQuotes becomes false at 27", and the second quote (27"") forces bInsideQuotes to become true again (we're still inside a field).
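As a rough C++ version of the pseudo code above, something like the following might do. The name splitCsvLine is an assumption for illustration, and like the pseudo code it assumes there are no multiline fields. The fields come back raw, with the outer and doubled quotes still in place:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one csv line into raw fields.
// Quotes are kept exactly as they appear in the file.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string current;
    bool insideQuotes = false;

    for (char c : line)
    {
        if (c == '"')
            insideQuotes = !insideQuotes;   // a doubled quote toggles twice, so the state ends up unchanged

        if (c == ',' && !insideQuotes)
        {
            fields.push_back(current);      // separator outside quotes: the field is complete
            current.clear();
        }
        else
        {
            current += c;                   // everything else, including commas inside quotes
        }
    }
    fields.push_back(current);              // the last field has no trailing comma
    return fields;
}
```

In the question's code, this would take the place of the inner getline(line, cell, ',') loop, which splits on every comma regardless of quoting.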

Now, to write back that original string you don't have to change a thing. Just write it to the second file as you read it from the original file, and your csv will remain valid.

To use the string, remove the 2 outer quotes and replace every 2 quotes with 1 quote.
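That last step could look roughly like this in C++. The name unquoteCsvField is hypothetical, and the sketch assumes the input is a single raw field that is either unquoted or correctly quoted:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turns a raw csv field into its actual value.
std::string unquoteCsvField(const std::string& raw)
{
    // Fields that are not wrapped in quotes are returned as-is.
    if (raw.size() < 2 || raw.front() != '"' || raw.back() != '"')
        return raw;

    std::string out;
    // Walk only the text between the two outer quotes.
    for (std::size_t i = 1; i + 1 < raw.size(); ++i)
    {
        out += raw[i];
        // A doubled quote stands for one quote: keep the first, skip the second.
        if (raw[i] == '"' && i + 2 < raw.size() && raw[i + 1] == '"')
            ++i;
    }
    return out;
}
```

Applied to the quoted link field from the question, this should give back the original HTML with single quotes.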

Danny_ds
  • I output that to a csv file and it still has 2 double quotes? – andyz Feb 09 '16 at 19:56
  • @andyz - Yes, if the fields are surrounded with quotes, the quotes inside need to be doubled: `1253,"google",456`. But don't forget the outside quotes too. (perhaps take a look at your original csv file in notepad to see how everything is quoted) – Danny_ds Feb 09 '16 at 19:59
  • Then how can I delete the quotes I don't need? ie same format as the input – andyz Feb 09 '16 at 20:03
  • @andyz - If you write that string back to another csv, you need those quotes there too. Also, when reading, make sure you don't split on `,` inside quoted fields. I'll update my answer in 15min (got called away here). – Danny_ds Feb 09 '16 at 20:07