I'm trying to read a text file full of Twitter screen names and store them in a database. Screen names can't be longer than 15 characters, so one of my checks rejects any name longer than that.
I've found something really strange going on when I try to upload AmericanExpress.
These are the contents of my text file:
americanexpress
AmericanExpress
AMERICANEXPRESS
And this is my code:
var names = new List<string>();
var badNames = new List<string>();
using (StreamReader reader = new StreamReader(file.InputStream, Encoding.UTF8))
{
    string line;
    while (!reader.EndOfStream)
    {
        line = reader.ReadLine();
        var name = line.ToLower().Trim();
        Debug.WriteLine(line + " " + line.Length + " " + name + " " + name.Length);
        if (name.Length > 15 || string.IsNullOrWhiteSpace(name))
        {
            badNames.Add(name);
            continue;
        }
        if (names.Contains(name))
        {
            continue;
        }
        names.Add(name);
    }
}
The first americanexpress passes the 15-character length check, the second fails, and the third passes. When I debug the code and hover over name during the second iteration (AmericanExpress), the debugger reports a length of 16.
And this is the Debug output:
americanexpress 15 americanexpress 15
AmericanExpress 16 americanexpress 16
AMERICANEXPRESS 15 americanexpress 15
I've counted the characters in AmericanExpress at least 10 times, and I'm pretty sure it's only 15 characters.
Does anyone have any idea why Visual Studio is telling me americanexpress.Length = 16?
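One way to see what the debugger is counting is to print the code point of every character in the line, since an invisible character would show up there even though it doesn't render. This is only a minimal sketch: the NameInspector class and DumpCodePoints method are hypothetical names, and the zero-width space (U+200B) in the usage note is an assumed stand-in, because the post doesn't say which character is actually in the file.

```csharp
using System;
using System.Linq;

public static class NameInspector
{
    // Returns one "U+XXXX" entry per UTF-16 code unit in the string,
    // so characters that don't render on screen still appear in the output.
    public static string DumpCodePoints(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

For example, NameInspector.DumpCodePoints("\u200BAb") returns "U+200B U+0041 U+0062". Any entry above U+007F in a line that looks like plain ASCII is a likely culprit.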
SOLUTION
name = Regex.Replace(name, @"[^\u0000-\u007F]", string.Empty);
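The regex removes every character outside the ASCII range (U+0000 to U+007F), which takes the stray invisible character with it. As a rough sketch of how that could fit into the normalization step: ScreenNameCleaner and Clean are hypothetical names, the zero-width space in the usage note is an assumed example of the hidden character, and ToLowerInvariant is swapped in for the original ToLower to avoid culture-specific casing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScreenNameCleaner
{
    // Strips non-ASCII characters first, then trims and lower-cases,
    // so the length check sees only the visible ASCII characters.
    public static string Clean(string line)
    {
        string ascii = Regex.Replace(line, @"[^\u0000-\u007F]", string.Empty);
        return ascii.Trim().ToLowerInvariant();
    }
}
```

With this, ScreenNameCleaner.Clean("\u200BAmericanExpress") gives "americanexpress" with a Length of 15. Note that this approach also drops any legitimate non-ASCII letters, which should be acceptable here only because screen names are expected to be plain ASCII.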