0

So I'm using C# and Visual Studio. I am reading a file of students and their information. The number of students is variable, but I want to grab their information. At the moment I just want to segment each student's information based on the string "Student ID", because each student's section starts with Student ID. I'm using ReadAllText to read the file into a string and then feeding that string to my function splittingStrings. The file will look like this:

student ID 1
//bunch of info

student ID 2 
//bunch of info

student ID 3 
//bunch of info
.
.
.

I want to split each segment into a list, since the number of students will be unknown and the information for each student will vary. So I looked into both regular string splitting and Regex splitting. For regular strings I tried this:

        public static List<string> StartParse = new List<string>(); 

        public static void splittingStrings(string v)
        {
            string[] DiagDelimiters = new string[] {"Student ID "};

            StartParse.Add(v.Split(DiagDelimiters, StringSplitOptions.None);   
        }

And this is what I tried with Regex:

StartParse.Add(Regex.Split("Student ID ");

I haven't used Lists before, but from what I've read they are dynamic and easy to use. The only trouble I'm having is that all the examples I see with split use an array, so syntactically I'm not sure how to split a string and insert the pieces into a list. For output, my goal is to have the student segments divided so that I can call a particular segment later if I need to.


Let me clarify that I'm after the whole batch of information, not the IDs alone. A lot of the questions seem to be focused on that, so I felt I needed to clarify it.

To those suggesting other storage bodies:

example of what list will hold:

position 0 will hold [<id> //bunch of info] 
position 1 will hold [<anotherID> //bunch of info]
.
.
.

So I'm just using the List so I can do multiple operations on it to get the information that I need. The information will be FAR more manageable if I can segment it into the list as shown above. I'm aware of dictionaries, but I have to store this information either in SQL tables or inside text files, depending on the contents of the segments. For example, if one segment was really funky, I would send an error report that one student's information is bad; otherwise I would insert the necessary information into a SQL table. But I have to work with multiple things from each segment, so I felt the List was the best way to go, since I'll also have to go back and forth within a segment to cross-check bits of information against earlier things I found in that segment.

user3003304
  • `DiagDelimiters` is just 1 string, you don't need to make an array for it. Then `foreach (var student in v.Split(DiagDelimiters)) { StartParse.Add(student) }` – Ilan Apr 11 '14 at 16:06
  • he actually [does need the array](http://stackoverflow.com/questions/2245442/c-sharp-split-a-string-by-another-string) – Jonesopolis Apr 11 '14 at 16:07
  • Is your end goal to simply dump all the text for each `Student` into a `List`? Do you want to maintain the Student ID's? – Jonesopolis Apr 11 '14 at 16:13
  • @Jonesy I'm wanting the ID's and the //bunch of info after that. I don't need the string "Student ID" so I figured I could let the splitter eat it since I don't think there's a way to preserve the character(s) given as the split on part of the string or regex split. – user3003304 Apr 11 '14 at 16:33
  • @LLan321 I got compiler errors saying that it needed a string array. – user3003304 Apr 11 '14 at 16:34
  • I'd go with a Dictionary so you can easily reference the `StudentID` by key – Jonesopolis Apr 11 '14 at 18:20
  • I'm eventually going to take the information I find and insert it into a sql table, but I need to get information from the segments. There's stuff near the beginning and the end of the segments that correlate, and that's why I want to store them into a list first. The "bunch of information part" They're similar, but not the same to the point where I can say "oh this bit of information will always be there" cause some students may have some information that others don't. – user3003304 Apr 11 '14 at 19:21

4 Answers

1

There is no need to use RegEx here and I would recommend against it. Simply splitting on white space will do the trick. Let's pretend you have a list which contains each of those lines (student ID 1, student ID 2, etc.); you can get a list of the IDs very simply like so:

  List<string> ids = students.Select(x => x.Split(' ')[2]).ToList();

The statement above essentially says: for each string in students, split the string and return the third token (index 2 because it's 0-indexed). I then call ToList because Select by default returns an IEnumerable<T>, but I wouldn't worry about those details just yet. If you don't have a list with each of the lines you showed, the idea stays much the same, only you would add the items to your ids list one by one as you split the string. For any given string in the form student ID x, I would get x on its own with myString.Split(' ')[2]; that is the basis of the expression I pass into Select.

Based on the OP's comment here is a way to get all of the data without the Student Id part of each batch.

string[] batches = input.Split(new string[] { "student ID " }, StringSplitOptions.RemoveEmptyEntries);

If you really need a list, you can just call ToList() on the result and change the type of batches to List<string>, but that would probably just be a waste of CPU cycles.
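As a minimal sketch of that split, assuming the file uses the exact casing `student ID ` (the string overload of `Split` is case-sensitive). The class name `SegmentSplitter` and the method name `SplitIntoSegments` are made up for illustration and are not part of the question's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SegmentSplitter
{
    // Hypothetical helper: returns one entry per student. Each entry starts
    // with the ID and is followed by that student's info; the "student ID "
    // text itself is consumed by the split.
    public static List<string> SplitIntoSegments(string input)
    {
        string[] delimiters = { "student ID " };

        return input
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .Select(segment => segment.Trim())      // drop surrounding blank lines
            .Where(segment => segment.Length > 0)   // skip whitespace-only pieces
            .ToList();
    }
}
```

Something like `SegmentSplitter.SplitIntoSegments(File.ReadAllText(path))` would then give a list where position 0 holds the first ID plus its info, position 1 the second, and so on. Note that any text appearing before the first `student ID ` would come back as its own leading entry.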

evanmcdonnal
  • I'm wanting the batch information. I denoted it with //bunch of info in my question since how much or how little of it there is is unknown. If I split on "Student ID" then the actual ID will still be in that batch. I'm just losing the string "Student ID". But I don't need to create a list of just the Student IDs. So I don't believe splitting on the spaces would help my case. – user3003304 Apr 11 '14 at 16:23
  • @user3003304 Ok, I think you're generally going about this the wrong way but I will edit with a solution that will give you the id and the data beneath it which you excluded. – evanmcdonnal Apr 11 '14 at 16:28
0

Here's a rough sketch of what I'd do:

using System;
using System.Collections.Generic;

static class Program
{
    static readonly List<int> ids = new List<int>();

    static void ParseStudentId(string str)
    {
        string[] spl = str.Split(' ');
        ids.Add(int.Parse(spl[spl.Length - 1])); // this will fetch 1 from "Student Id 1"
    }

    static void Main()
    {
        ParseStudentId("Student Id 1");
        ParseStudentId("Student Id 2");
        ParseStudentId("Student Id 3");

        foreach (int id in ids)
            Console.WriteLine(id); // will result in:
                                   // 1
                                   // 2
                                   // 3
    }
}

Forgive me if the style looks off; I'm a Java programmer :)

ddavison
0

Try this one:

StartParse = new List<string>(Regex.Split(v, @"(?<!^)(?=student ID \d+)"));

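The pattern (?<!^)(?=student ID \d+) splits the string at every position that is immediately followed by student ID and a number, except at the very beginning of the string. Because the match is a lookahead, nothing is consumed, so each segment keeps its student ID header.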
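To illustrate how a lookahead split behaves, here is a small sketch under the same assumption that headers look like `student ID <digits>`. The names `LookaheadSplitter` and `SplitKeepingHeader` are hypothetical, and the trimming step is an addition for tidiness rather than part of the one-liner above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadSplitter
{
    // Hypothetical helper: splits before each "student ID <digits>" header.
    // The lookahead is zero-width, so the header stays in its segment.
    public static List<string> SplitKeepingHeader(string input)
    {
        return Regex.Split(input, @"(?<!^)(?=student ID \d+)")
            .Select(segment => segment.Trim())      // remove trailing blank lines
            .Where(segment => segment.Length > 0)   // skip empty pieces
            .ToList();
    }
}
```

Unlike a plain `Split` on the delimiter text, each entry here still begins with `student ID`, which may or may not be what you want if you only care about the ID and the info after it.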

Sabuj Hassan
0

Check this code

    // Requires: using System.Collections.Generic; using System.IO; using System.Text;
    public List<string> GetStudents(string filename)
    {
        List<string> students = new List<string>();
        StringBuilder builder = new StringBuilder();
        using (StreamReader reader = new StreamReader(filename))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // A new "student ID" header closes the segment collected so far.
                if (line.StartsWith("student ID") && builder.Length > 0)
                {
                    students.Add(builder.ToString());
                    builder.Clear();
                }

                // AppendLine restores the line break that ReadLine strips,
                // so lines within a segment do not run together.
                builder.AppendLine(line);
            }

            // Flush the last student's segment.
            if (builder.Length > 0)
                students.Add(builder.ToString());
        }

        return students;
    }
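
The same line-by-line grouping can be sketched against any sequence of lines, which makes it easier to try without a file on disk. This is only an illustration: `LineGrouper` and `GroupByHeader` are made-up names, and it assumes every student section starts with a line beginning `student ID`:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LineGrouper
{
    // Hypothetical helper: collects lines into one string per student,
    // starting a new segment whenever a "student ID" header line appears.
    public static List<string> GroupByHeader(IEnumerable<string> lines)
    {
        var segments = new List<string>();
        var current = new StringBuilder();

        foreach (string line in lines)
        {
            bool isHeader = line.StartsWith("student ID", StringComparison.Ordinal);
            if (isHeader && current.Length > 0)
            {
                // Close the previous student's segment.
                segments.Add(current.ToString().TrimEnd());
                current.Clear();
            }

            // Keep a line break between the lines of a segment.
            current.Append(line).Append('\n');
        }

        // Flush whatever is left for the last student.
        if (current.Length > 0)
            segments.Add(current.ToString().TrimEnd());

        return segments;
    }
}
```

For a real file, passing `File.ReadLines(path)` should work, since it returns the lines lazily as an `IEnumerable<string>`.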
yazan