5

I have comma delimited strings I need to pull values from. The problem is these strings will never be a fixed size. So I decided to iterate through the groups of commas and read what is in between. In order to do that I made a function that returns every occurrence's position in a sample string.

Is this a smart way to do it? Is this considered bad code?

#include <string>
#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

vector<int> findLocation(string sample, char findIt);

int main()
{
    string test = "19,,112456.0,a,34656";
    char findIt = ',';

    vector<int> results = findLocation(test,findIt);
    return 0;
}

vector<int> findLocation(string sample, char findIt)
{
    vector<int> characterLocations;
    for(int i =0; i < sample.size(); i++)
        if(sample[i] == findIt)
            characterLocations.push_back(sample[i]);

    return characterLocations;
}
John Dibling
  • 99,718
  • 31
  • 186
  • 324
lodkkx
  • 1,163
  • 2
  • 17
  • 29
  • 1
    To me this is perfect. Although there will be many C++ coders saying "why invent the wheel" and "use that function, don't write it yourself". Anyway, I don't care about them, I don't know about you. However, there is one minor problem with your code. `i` shouldn't reach `sample.length()`, so you should have `i < sample.length()` in your for loop condition – Shahbaz Oct 11 '11 at 13:30
  • Yup I just fixed that. Also, it needed to be .size() not length – lodkkx Oct 11 '11 at 13:32
  • If you are going to split the strings afterwards, you might want to take a look at [this question](http://stackoverflow.com/questions/236129/how-to-split-a-string-in-c). – Kurt Stutsman Oct 11 '11 at 13:33
  • @Shahbaz: You just proved the point about not "reinventing the wheel". Why risk introducing a bug into your system when a solution is available that has had bugs wrung out of it for years. – Michael Price Oct 11 '11 at 14:32

5 Answers5

14
vector<int> findLocation(string sample, char findIt)
{
    vector<int> characterLocations;
    for(int i =0; i < sample.size(); i++)
        if(sample[i] == findIt)
            characterLocations.push_back(sample[i]);

    return characterLocations;
}

As currently written, this will simply return a vector containing the int representations of the characters themselves, not their positions, which is what you really want, if I read your question correctly.

Replace this line:

characterLocations.push_back(sample[i]);

with this line:

characterLocations.push_back(i);

And that should give you the vector you want.

Shahbaz
  • 46,337
  • 19
  • 116
  • 182
jwfriese
  • 228
  • 2
  • 9
6

If I were reviewing this, I would see this and assume that what you're really trying to do is tokenize a string, and there's already good ways to do that.

Best way I've seen to do this is with boost::tokenizer. It lets you specify how the string is delimited and then gives you a nice iterator interface to iterate through each value.

using namespace boost;
string sample = "Hello,My,Name,Is,Doug";
escaped_list_seperator<char> sep("" /*escape char*/, ","/*seperator*/, "" /*quotes*/)

tokenizer<escaped_list_seperator<char> > myTokens(sample, sep)

//iterate through the contents
for (tokenizer<escaped_list_seperator<char>>::iterator iter = myTokens.begin();
     iter != myTokens.end();
     ++iter)
{
    std::cout << *iter << std::endl;
}

Output:

Hello
My
Name
Is
Doug

Edit If you don't want a dependency on boost, you can also use getline with an istringstream as in this answer. To copy somewhat from that answer:

std::string str = "Hello,My,Name,Is,Doug";
std::istringstream stream(str);
std::string tok1;

while (stream)
{
    std::getline(stream, tok1, ',');
    std::cout << tok1 << std::endl;
}

Output:

 Hello
 My
 Name
 Is
 Doug

This may not be directly what you're asking but I think it gets at your overall problem you're trying to solve.

Community
  • 1
  • 1
Doug T.
  • 64,223
  • 27
  • 138
  • 202
  • So is it better to use boost and other third party libraries when size constraints don't really matter? – lodkkx Oct 11 '11 at 13:39
  • `myTkoens` should be `myTokens`. `>>` should be `> >` (unless OP is using C++11). I can't test this code, but do you know how are two or more consecutive separator characters handled? – darioo Oct 11 '11 at 13:40
  • Basically I need two values out of a 12 item csv string. Would it not be easier to use my function and just assign the strings to the substring of the indexes my function returns? – lodkkx Oct 11 '11 at 13:43
  • @chronoz Maybe, depends if you might need more than just the two items eventually. Besides the code to solve the general problem is IMO simpler cause its already done for you. Also -- I added an example of using `std::getline` which is another idiomatic way of tokenizing that doesn't rely on boost. – Doug T. Oct 11 '11 at 13:49
  • 1
    @DougT. You just look at those two pieces of code and dare tell me boost looks better!!!! – Shahbaz Oct 11 '11 at 14:02
  • 1
    When I run the code snippet with `getline()`, I am getting the last word twice. It prints these six words: Hello My Name Is Doug Doug. – Sadman Sakib Jul 13 '22 at 10:29
  • there is solution for the bug in this answer. the last word printed twice. it is because of getline reach to end of the file and tok is still the same. https://stackoverflow.com/questions/57458608/stdgetline-read-the-last-string-twice – Azzurro94 Feb 04 '23 at 20:37
0

Looks good to me too, one comment is with the naming of your variables and types. You call the vector you are going to return characterLocations which is of type int when really you are pushing back the character itself (which is type char) not its location. I am not sure what the greater application is for, but I think it would make more sense to pass back the locations. Or do a more cookie cutter string tokenize.

Jordan
  • 4,928
  • 4
  • 26
  • 39
0

Well if your purpose is to find the indices of occurrences the following code will be more efficient as in c++ giving objects as parameters causes the objects to be copied which is insecure and also less efficient. Especially returning a vector is the worst possible practice in this case that's why giving it as a argument reference will be much better.

#include <string>
#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

vector<int> findLocation(string sample, char findIt);

int main()
{

    string test = "19,,112456.0,a,34656";
    char findIt = ',';

    vector<int> results;
    findLocation(test,findIt, results);
    return 0;
}

void findLocation(const string& sample, const char findIt, vector<int>& resultList)
{
    const int sz = sample.size();

    for(int i =0; i < sz; i++)
    {
        if(sample[i] == findIt)
        {
            resultList.push_back(i);
        }
    }
}
hevi
  • 2,432
  • 1
  • 32
  • 51
  • Why is returning a vector bad? – lodkkx Oct 11 '11 at 13:52
  • 1
    @chronoz: Actually, in this case returning the vector is not that bad, because return-value optimization will optimize away the (potentially expensive) copy. However, the STL-way would be to write the results to an output-iterator whose type is a template-parameter. Thus, the user of the function can choose in which representation he wants the results. – Björn Pollex Oct 11 '11 at 14:16
  • 1
    Dear chronoz, in c++ local variables are initiated from stackspace. when a function ends, its stackspace is being destroyed (in ur case characterLocations) for that reason c++ returns a copy of the returned value so the variable will remain valid. In this case compiler may (or may not depending on compile options) copy all the vector which obviously slows your solution down. Dear Bjorn Pollex you are right (especially stl way is really a good way for that problem) but in my opinion it is better not to rely on compiler behavior but to your own code. – hevi Oct 11 '11 at 14:52
-1

How smart it is also depends on what you do with those subtstrings delimited with commas. In some cases it may be better (e.g. faster, with smaller memory requirements) to avoid searching and splitting and just parse and process the string at the same time, possibly using a state machine.

Alexey Frunze
  • 61,140
  • 12
  • 83
  • 180