0

i have one column. In that column is saved the fullname(firstname, surname) of a Person. Sometimes it is saved like

Michael, Myers

and sometimes the name is saved like

Michael Myers

without a comma between them.

If i load the column and save it to a variable it looks of course like: Michael, Myers or Michael Myers.

The Question is: If i load the column from the database, how can i save the the firstname and the surname independently of each other in different variables.

Paks
  • 1,460
  • 6
  • 26
  • 46

4 Answers4

2

This is a data nightmare. I would strongly suggest correcting this at the source (use two separate columns) and fixing the data once if that is at all possible.

Assuming it is not (or not right now):

You will get most cases right by just trying to look for a comma and using branch logic based on the result. However, there are many edge cases to consider. The outline of one approach would be

string[] parts = theColumn.Trim().Split(',');
if (parts.Length == 1)
{
    // Find the last occurrence of ' ' and split first/last name based on that
    // People may have middle names entered e.g. Michael M. Myers
}
else if (parts.Length == 2)
{
    firstName = parts[1];
    lastName = parts[0];
}
else
{
    // Dealing with a more complex case like Myers, Jr., Michael
    // You will have to develop logic for such special cases that may
    // be in your data.
}

You will run into problems with the single-comma case with names like Michael Myers, Jr. or Michael Meyers, DDS. More complete logic would test for such cases.

You may run into cases of 2 (or more) commas in the Last, First format when there is some sort of name suffix. You will have to decide how much time to spend on sanitizing logic there. Personally I would tend to log all cases where that happened for a few months to evolve my logic.

I used a product several years back that did a good job pulling names out of free-form fields and sorting out what was the first name, last name, etc. Might be worth having a look at.

Eric J.
  • 147,927
  • 63
  • 340
  • 553
  • Very thorough answer, but make sure to remove the spaces. You could use Trim(). Oh, also include some violence directed at the DB designer. – Nick Vaccaro Nov 19 '12 at 22:21
  • @Norla: Good point to trim left/right spaces. This is just an outline. There are products dedicated to getting this kind of thing right. The logic is non-trivial. – Eric J. Nov 19 '12 at 22:22
  • @Norla: Heh. Show me a RegEx that can completely validate valid and invalid emails according to the RFC, and still be somewhat human readable, and I'll agree with you :-) (Not that the question calls for that... but it's telling that a RegEx cannot do that). Otherwise, I'll argue that a rules engine approach is probably better for a full, highly-accurate implementation of this type of logic (you have to consider cultural issues, name suffixes, name titles, etc). http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address – Eric J. Nov 19 '12 at 22:25
  • I'd say that if you had time to work out and prove the regex's or write a rules engine, it would be better spent of sorting the crap data out myself. – Tony Hopkinson Nov 19 '12 at 22:42
1

What are the possible values of fullname? Given your example above, there's no good, consistent way to separate the two, other than something lame like:

string firstName;
string lastName;
if(fullname.Contains(","))
{
    string[] splitNames = fullName.Split(",");
    lastName = splitNames[0];
    firstName = splitNames[1];
}
else if(splitNames.Contains(" "))
{
    string[] splitNames = fullName.Split(" ");
    firstName = splitNames[0];
    lastName = splitNames[1];
}
else
{
    //Some other logic.
}
Dave Zych
  • 21,581
  • 7
  • 51
  • 66
  • 1
    It's an if...else if. It won't work on "Michael, Myers" either. – Nick Vaccaro Nov 19 '12 at 22:22
  • Right... that's why I said _Given your example above..._ – Dave Zych Nov 19 '12 at 22:25
  • Hi thanks for your answer. Possible Value of fullname is firstname and surname like in my example. Other examples would be "Kristin, Reinhardt" and so on. I will try that code and will give a feedback. THANKS – Paks Nov 19 '12 at 22:41
  • I tried your example and it works for these two possible values! – Paks Nov 20 '12 at 10:08
0

You can do a string split on ", ". Then another string split on " ". This will make sure you've covered both cases.

BUT, what you really want to do is hit your DBA with a large wooden object for stuffing two columns into one.

Nick Vaccaro
  • 5,428
  • 6
  • 38
  • 60
  • No got to disagree gain. Large metal object, preferably a mace, though a good flail can be fun. – Tony Hopkinson Nov 19 '12 at 22:44
  • yeah you are right, that is not the best way to save data. Iam new in that company and they have a huge data structure. If i would adjust that i have to change so many things in code and that would be a huge effort. – Paks Nov 19 '12 at 22:44
0

As others have said splitting on a comma is trivial. The real problem is you don't know how many more flavours are going to turn up. If it's only these two then

Following assumes sql server is your db, something like

Select 
Case 
 When CharIndex(',',[DaftNameField]) = 0 
 Then Substring([SomeNameField],CharIndex(',',[SomeNameField]),255)
else  Substring([SomeNameField],1, CharIndex(' ',[SomeNameField]) - 1)
end As Forename,
Case 
 When CharIndex(',',[SomeNameField]) = 0 
 Then Substring([SomeNameField],1 CharIndex(',',[SomeNameField]) - 1)
else  Substring([SomeNameField], CharIndex(' ',[SomeNameField]) + 1, 255)
end As Surname
From SomeTable

Off the top of my head this, but it's basically right.

As you can see already painful, add in middle names, honorifics and such, gets worse.

Oh and please tell me there's some sort of PersonID in this table.

Tony Hopkinson
  • 20,172
  • 3
  • 31
  • 39