How to extract data from a line which has fields separated by '|' character in C++?

Question

I have data in the following format in a text file named empdata.txt. Note that there are no blank lines between the records.

Sl|EmployeeID|Name|Department|Band|Location
1|327427|Brock Mcneil|Research and Development|U2|Pune
2|310456|Acton Golden|Advertising|P3|Hyderabad
3|305540|Hollee Camacho|Payroll|U3|Bangalore
4|218801|Simone Myers|Public Relations|U3|Pune
5|144051|Eaton Benson|Advertising|P1|Chennai

I have a class like this:

class empdata
{
public:
    int sl, empNO;
    char name[20], department[20], band[3], location[20];
};

I created an array of objects of class empdata. How do I read the data from a file that has n lines in the format above and store it in that array of objects?

This is my code:

#include <fstream>
#include <string>
using namespace std;

int main () {
    string line;
    ifstream myfile ("empdata.txt");
    for(int i=0;i<10;i++) //processing only first 10 lines of the file
    {
        getline (myfile,line);
        //What should I do with this "line" so that I can extract data
        //from this line and store it in the class object?
    }

    return 0;
}

So basically my question is: how do I extract data from a string whose fields are separated by the '|' character and store each field in a separate variable?

I need an idea of how to proceed. I have been trying to figure this out for hours. Please help — Anish Kumar, Jul 27 '15 at 10:48
It seems you are working in C; I posted an example in Java, so I need to give another example — Asraful, Jul 27 '15 at 11:05
@AnishKumar do you also have to consider missing data or we are talking about a perfect data set here? — rbaleksandar, Jul 27 '15 at 14:48

DannyK · Answer 1 · 2017-03-07T21:45:31.827

I prefer to use the String Toolkit, which takes care of converting the numbers as it parses.

Here is how I would solve it.

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <strtk.hpp>   // http://www.partow.net/programming/strtk

// using strings instead of character arrays
class Employee
{
    public:
    int index;
    int employee_number;
    std::string name;
    std::string department;
    std::string band;
    std::string location;
};

int main()
{
    std::string filename("empdata.txt");

    // assuming the file is text
    std::fstream fs;
    fs.open(filename.c_str(), std::ios::in);

    if(fs.fail())  return 1;

    const char *whitespace    = " \t\r\n\f";

    const char *delimiter    = "|";

    std::vector<Employee> employee_data;
    std::string line;

    // process each line in turn
    while( std::getline(fs, line ) )
    {
        // removing leading and trailing whitespace
        // can prevent parsing problems from different line endings.
        strtk::remove_leading_trailing(whitespace, line);

        // strtk::parse combines multiple delimiters in these cases
        Employee e;

        // the header line should fail to parse (its first field is not a number) and be skipped
        if( strtk::parse(line, delimiter, e.index, e.employee_number, e.name, e.department, e.band, e.location) )
        {
            std::cout << "succeed" << std::endl;
            employee_data.push_back( e );
        }
    }

    return 0;
}

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

AFAIK, there is nothing in the standard library that does this out of the box, but you have all the tools to build it yourself.

The C way

Read each line into a char buffer (with the stream's getline() member, e.g. myfile.getline(buf, size)) and then use strtok and strcpy.

The getline way

std::getline accepts an optional third parameter that specifies the delimiter. You can use it to split a line held in an istringstream. Something like:

#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// empdata is the class from the question

int main() {
    std::string line, temp;
    std::ifstream myfile("file.txt");
    std::getline(myfile, line);              // skip the header line
    while (std::getline(myfile, line)) {     // also handles a last line without '\n'
        empdata data;
        std::istringstream istr(line);
        std::getline(istr, temp, '|');
        data.sl = std::strtol(temp.c_str(), NULL, 10);
        std::getline(istr, temp, '|');
        data.empNO = std::strtol(temp.c_str(), NULL, 10);
        // caution: these calls fail if a field does not fit in its array (see below)
        istr.getline(data.name, sizeof(data.name), '|');
        istr.getline(data.department, sizeof(data.department), '|');
        istr.getline(data.band, sizeof(data.band), '|');
        istr.getline(data.location, sizeof(data.location), '|');
        // do something with data ...
    }
    return 0;
}

This is the C++ counterpart of the C way.

The find way

Read each line into a std::string (as you currently do), use string::find(char sep, size_t pos) to find the next occurrence of the separator, and copy the characters (from string::c_str()) between the start of the field and the separator into your fields.

The manual way

Iterate over the string character by character. If the character is a separator, terminate the current field with '\0' and move on to the next field. Otherwise, write the character at the current position of the current field.

Which to choose?

If you are more used to one of them, stick to it.

The following is just my opinion.

The getline way will be the simplest to code and to maintain.

The find way is mid-level: it is still rather high-level, and it avoids using istringstream.

The manual way will be really low-level, so you should structure it to keep it maintainable. For example, you could describe a line explicitly as an array of fields, each with a maximum size and a current position. And since you have both int and char[] fields, it will be tricky. But you can easily configure it the way you want. For example, your class only allows 20 characters for the department field (19 plus the terminating '\0'), whereas "Research and Development" in line 2 is 24 characters long. Without special processing, the getline way will leave the istringstream in a failed state and will not read anything more. And even if you clear the state, you will be badly positioned. So you should first read into a std::string and then copy the beginning into the char array field.

Here is a working manual implementation:

#include <cstddef>
#include <fstream>
#include <string>

// empdata is the class from the question

class Field {
public:
    virtual ~Field() {}   // needed so that deleting through a Field* is well-defined
    virtual void reset() = 0;
    virtual void add(empdata& data, char c) = 0;
};

// Integer field: accumulates decimal digits and stops at the first non-digit.
class IField: public Field {
private:
    int (empdata::*data_field);
    bool ok;

public:
    IField(int (empdata::*field)): data_field(field), ok(true) {}
    void reset() { ok = true; }
    void add(empdata& data, char c);
};

void IField::add(empdata& data, char c) {
    if (ok) {
        if ((c >= '0') && (c <= '9')) {
            data.*data_field = data.*data_field * 10  + (c - '0');
        }
        else {
            ok = false;
        }
    }
}

// Character-array field of N bytes: keeps at most N - 1 characters and
// always leaves room for the terminating '\0'. Using a pointer to the
// array member avoids the reinterpret_cast and the separate size argument.
template <std::size_t N>
class CField: public Field {
private:
    char (empdata::*data_field)[N];
    std::size_t current_pos;

public:
    CField(char (empdata::*field)[N]): data_field(field), current_pos(0) {}
    void reset() { current_pos = 0; }
    void add(empdata& data, char c) {
        // characters that do not fit are silently dropped
        if (current_pos < N - 1) {
            (data.*data_field)[current_pos++] = c;
            (data.*data_field)[current_pos] = '\0';
        }
    }
};

// Deduces N from the member's array type, so the sizes cannot get out of sync.
template <std::size_t N>
Field* make_cfield(char (empdata::*field)[N]) {
    return new CField<N>(field);
}

int main() {
    std::string line;
    std::ifstream myfile("file.txt");
    Field* fields[] = {
        new IField(&empdata::sl),
        new IField(&empdata::empNO),
        make_cfield(&empdata::name),
        make_cfield(&empdata::department),
        make_cfield(&empdata::band),
        make_cfield(&empdata::location),
        NULL
    };
    std::getline(myfile, line);              // skip the header line
    while (std::getline(myfile, line)) {     // also handles a last line without '\n'
        Field** f = fields;
        empdata data = {0};
        (*f)->reset();                       // the first field is never reset by a '|'
        for (std::string::const_iterator it = line.begin(); it != line.end(); ++it) {
            char c = *it;
            if (c == '|') {
                f += 1;
                if (*f == NULL) {
                    break;                   // more separators than fields: ignore the rest
                }
                (*f)->reset();
            }
            else {
                (*f)->add(data, c);
            }
        }
        // do something with data ...
    }
    for (Field** f = fields; *f != NULL; f++) {
        delete *f;                           // allocated with new, so delete (not free)
    }
    return 0;
}

It is robust, efficient and maintainable: adding a field is easy, and it tolerates errors in the input file. But it is much longer than the other approaches and would need far more testing, so I would not advise using it without a special reason (the need to accept multiple separators, optional fields, a dynamic field order, ...).

score 0 · Answer 3 · edited Jul 28 '15 at 08:15

Try this simple code segment. It reads the file line by line and prints each line; later you can process each line as you need.

Data: as provided by you, in a file named data.txt.

package com.demo;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Demo {

    public static void main(String[] args) {
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader bufferReader = new BufferedReader(new FileReader("data.txt"))) {
            String data;

            while ((data = bufferReader.readLine()) != null) {
                System.out.println(data);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

In the console you will get output like this:

Sl|EmployeeID|Name|Department|Band|Location
1|327427|Brock Mcneil|Research and Development|U2|Pune
2|310456|Acton Golden|Advertising|P3|Hyderabad
3|305540|Hollee Camacho|Payroll|U3|Bangalore
4|218801|Simone Myers|Public Relations|U3|Pune
5|144051|Eaton Benson|Advertising|P1|Chennai

This is just a simple idea; adapt it to what you need.

Basically my question is how to extract data from a string whose fields are separated by the '|' character and store each field in a separate variable. I'm doing it in C — Anish Kumar, Jul 27 '15 at 11:13
@Forhad: Since the OP has now posted code, it becomes apparent that he's doing C++. So you should probably adapt your answer, or delete it, because a Java answer to a C++ question looks a bit out of place. ;-) — DevSolar, Jul 27 '15 at 14:46

score 0 · Answer 4 · edited Jul 28 '15 at 08:15

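In C++ you can imbue the stream with a locale whose ctype facet classifies '|' as whitespace, so that operator>> splits on it. Note that the table below replaces the default classification rather than extending it: only '|' and '\n' count as separators, which is why a name such as "Brock Mcneil" is read as a single word: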

#include <locale>
#include <iostream>
#include <string>

struct pipe_is_space : std::ctype<char> {
  pipe_is_space() : std::ctype<char>(get_table()) {}
  static mask const* get_table()
  {
    static mask rc[table_size];
    rc['|'] = std::ctype_base::space;
    rc['\n'] = std::ctype_base::space;
    return &rc[0];
  }
};

int main() {
  using std::string;
  using std::cin;
  using std::locale;

  cin.imbue(locale(cin.getloc(), new pipe_is_space));

  string word;
  while(cin >> word) {
    std::cout << word << "\n";
  }
}