3

I've got a file made up of blocks of strings, each of which ends with a certain keyword. I currently have a StreamReader set up that adds each line of the file to a list until it reaches the end of the current block (a line containing the keyword that marks the end of the block).

listName.Add(lineFromFile);

Each block contains information, e.g. Book bookName, Author AuthorName, Journal JournalName etc., so each block is effectively a single item (book, journal, conference etc.).

Now, with around 50 or so blocks of information (items), I need some way to store the information so I can manipulate it, keep each item's author(s), title, pages etc., and know which information goes with which item.

While typing this I've come up with the idea of storing each item as an object of a class called 'Item'. However, with potentially several authors per item, I'm not sure how to achieve this. I was thinking of maybe using a counter to name a variable, e.g.

int i = 0;
String Author[i] = "blahblah";
i++;

But as far as I know that's not allowed? So my question is basically: what would be the simplest way to store each item so that I can manipulate its strings and use them later?

@yamen here's an example of the file:

Author Bond, james
Author Smith John A
Year 1994
Title For beginners
Book Accounting
Editor Smith Joe
Editor Doe John
Publisher The University of Chicago Press
City Florida, USA
Pages 15-23
End

Author Faux, M
Author Sedge, M
Author McDreamy, L
Author Simbha, D
Year 2000
Title Medical advances in the modern world
Journal Canadian Journal of medicine
Volume 25
Pages 1-26
Issue 2
End


Author McFadden, B
Author Goodrem, G
Title Shape shifting dinosaurs
Conference Ted Vancouver
City Vancouver, Canada
Year 2012
Pages 2-6
End
ShaneL

8 Answers

4

Update in light of your sample

How to parse the string is beyond the scope of this answer - you might want to have a go at that yourself, and then ask another SO question (I suggest reading the golden rules of SO: https://meta.stackexchange.com/questions/128548/what-stack-overflow-is-not).

So I'll present the solution assuming that you have a single string representing the full block of book/journal information (this data looks like citations). The main change from my original answer is that you have multiple authors. Also you might want to consider whether you want to transform the authors' names back to [first name/initial] [middle names] [surname].

I present two solutions - one using Dictionary and one using Linq. The Linq solution is a one-liner.

Define an Info class to store the item:

public class Info
{
   public string Title { get; private set; }
   public string BookOrJournal { get; private set; }
   public IEnumerable<string> Authors { get; private set; }
   //more members of pages, year etc.
   public Info(string stringFromFile)
   {
     Title = null;                  // TODO: read the title from stringFromFile
     BookOrJournal = null;          // TODO: read the book or journal name from stringFromFile
     Authors = new List<string>();  // TODO: read the authors from stringFromFile
   }
}

Note that the stringFromFile should be one block, including newlines, of citation information.

Now a dictionary to store each info by author:

Dictionary<string, List<Info>> infoByAuthor = 
  new Dictionary<string, List<Info>>(StringComparer.OrdinalIgnoreCase);

Note the OrdinalIgnoreCase comparer - to handle situations where an author's name is printed in a different case.

Given a List<string> that you're adding to as per your listName.Add, this simple loop will do the trick:

List<Info> tempList;
Info tempInfo;
foreach(var line in listName)
{
  if(string.IsNullOrWhiteSpace(line))
    continue;
  tempInfo = new Info(line);
  foreach(var author in tempInfo.Authors)
  {
    if(!infoByAuthor.TryGetValue(author, out tempList))
      infoByAuthor[author] = tempList = new List<Info>();
    tempList.Add(tempInfo);
  }
}

Now you can iterate through the dictionary, and each KeyValuePair<string, List<Info>> will have a Key equal to the author name and the Value will be the list of Info objects that have that author. Note that the casing of the AuthorName will be preserved from the file even though you're grouping case-insensitively such that two items with "jon skeet" and "Jon Skeet" will be grouped into the same list, but their original cases will be preserved on the Info.
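To make the iteration concrete, here is a minimal sketch of building and walking such a dictionary. It is only an illustration: ParsedInfo is a hypothetical stand-in for Info that takes already-parsed values instead of a raw block of text, and the second citation ("A second title") is made-up sample data used to show the case-insensitive grouping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Info: takes parsed values rather than raw text.
public class ParsedInfo
{
    public string Title { get; private set; }
    public IEnumerable<string> Authors { get; private set; }

    public ParsedInfo(string title, params string[] authors)
    {
        Title = title;
        Authors = authors;
    }
}

public static class AuthorGroupingDemo
{
    // Builds an author -> citations map using the TryGetValue pattern.
    public static Dictionary<string, List<ParsedInfo>> GroupByAuthor(
        IEnumerable<ParsedInfo> infos)
    {
        var infoByAuthor =
            new Dictionary<string, List<ParsedInfo>>(StringComparer.OrdinalIgnoreCase);
        foreach (var info in infos)
        {
            foreach (var author in info.Authors)
            {
                List<ParsedInfo> list;
                if (!infoByAuthor.TryGetValue(author, out list))
                    infoByAuthor[author] = list = new List<ParsedInfo>();
                list.Add(info);
            }
        }
        return infoByAuthor;
    }

    public static void Main()
    {
        var infos = new List<ParsedInfo>
        {
            new ParsedInfo("For beginners", "Bond, james", "Smith John A"),
            new ParsedInfo("A second title", "BOND, JAMES") // made-up sample data
        };

        // Each pair holds one author and every citation that lists that author.
        foreach (KeyValuePair<string, List<ParsedInfo>> pair in GroupByAuthor(infos))
        {
            Console.WriteLine(pair.Key);
            foreach (ParsedInfo info in pair.Value)
                Console.WriteLine("    " + info.Title);
        }
    }
}
```

Both spellings of the first author end up under a single key, and that key keeps whichever casing was seen first.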

Also the code is written to ensure that only one Info instance is created per citation, this is preferable for many reasons (memory, centralised updates etc).

Alternatively, with Linq, you can simply do this:

var grouped = listName.Where(s => !string.IsNullOrWhiteSpace(s))
  .Select(s => new Info(s))
  .SelectMany(i => 
    i.Authors.Select(ia => new KeyValuePair<string, Info>(ia, i)))
  .GroupBy(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase);

Now you have an enumerable of groups, where the Key is the author name and the inner enumerable is all the Info objects with that author name. The same case-preserving behaviour regarding 'the two Skeets' will be observed here, too.
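As a rough illustration of consuming those groups, the sketch below runs the same SelectMany/GroupBy shape over a couple of in-memory items. Citation is a hypothetical stand-in for Info, and the "Follow-up study" entry is invented sample data, so treat this as a sketch rather than the definitive version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Info, with values already parsed.
public class Citation
{
    public string Title { get; set; }
    public List<string> Authors { get; set; }
}

public static class LinqGroupingDemo
{
    // Flattens each citation into (author, citation) pairs, then groups the
    // pairs by author name, ignoring case.
    public static IEnumerable<IGrouping<string, Citation>> GroupByAuthor(
        IEnumerable<Citation> citations)
    {
        return citations
            .SelectMany(c => c.Authors.Select(
                a => new KeyValuePair<string, Citation>(a, c)))
            .GroupBy(kvp => kvp.Key, kvp => kvp.Value,
                     StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var citations = new List<Citation>
        {
            new Citation
            {
                Title = "Medical advances in the modern world",
                Authors = new List<string> { "Faux, M", "Sedge, M" }
            },
            new Citation
            {
                Title = "Follow-up study", // made-up sample data
                Authors = new List<string> { "faux, m" }
            }
        };

        // group.Key is the author; enumerating the group yields the citations.
        foreach (IGrouping<string, Citation> group in GroupByAuthor(citations))
        {
            Console.WriteLine(group.Key + ": " + group.Count() + " item(s)");
            foreach (Citation c in group)
                Console.WriteLine("    " + c.Title);
        }
    }
}
```

If you need repeated lookups by author afterwards, calling ToList() or ToLookup() on the result avoids re-running the query each time it is enumerated.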

Andras Zoltan
  • I added a sample file to the main post for more clarification. I'm now just researching dictionaries to understand what you're saying, as I've never heard of data dictionaries and have never used a foreach loop for that matter... ;). You definitely seem to know your stuff though – ShaneL May 15 '12 at 06:48
  • @ShaneL okay have updated my answer - note I'm not going to tell you how to parse the string; that's too much like writing your whole program for you :) – Andras Zoltan May 15 '12 at 07:50
  • Thanks heaps, yeah I was only originally expecting an answer like: "Have a look at data dictionaries and/or you should create a class of info like you have shown". Greatly appreciate the nudge in this direction, hopefully it helps me get past my 3 day programming block of trying to work out the answer to my problem :) – ShaneL May 16 '12 at 00:37
2

You can use a class with simple attributes like these:

class Book {
    string Title;
    int PageCount;
}

You can either initialize an array, Book[] lines = new Book[myFile.LineCount];, or maintain a List<Book>, but an array makes it easy to reach individual line numbers (lines[34] is the book at index 34, i.e. the 35th line).

But basically a System.Data.DataTable may be better suited, because you have rows that contain multiple columns. With DataTable, you can access individual rows and access their columns by name.

Example:

DataTable dt = new DataTable();
dt.Columns.Add("bookName");

DataRow dr = dt.NewRow();
dr["bookName"] = "The Lost Island";
dt.Rows.Add(dr);

// You can access the last row this way:
var lastBookName = dt.Rows[dt.Rows.Count - 1]["bookName"];

One more good thing about a DataTable is that you can use grouping and summing on its rows like on an ordinary SQL table.
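As a hedged sketch of what that can look like, the example below filters rows with DataTable.Select and sums a column with DataTable.Compute. This is filter-plus-aggregate rather than a full SQL GROUP BY, and the author and pageCount columns and all row values are made up for the illustration.

```csharp
using System;
using System.Data;

public static class DataTableDemo
{
    // Builds a small table; column names and rows are illustrative only.
    public static DataTable BuildTable()
    {
        var dt = new DataTable("books");
        dt.Columns.Add("bookName", typeof(string));
        dt.Columns.Add("author", typeof(string));
        dt.Columns.Add("pageCount", typeof(int));

        dt.Rows.Add("The Lost Island", "Smith", 120);
        dt.Rows.Add("Return to the Island", "Smith", 80);
        dt.Rows.Add("Accounting", "Bond", 200);
        return dt;
    }

    public static void Main()
    {
        DataTable dt = BuildTable();

        // Filter rows with an expression, much like a WHERE clause.
        DataRow[] smithRows = dt.Select("author = 'Smith'");

        // Aggregate over the rows matching the same filter.
        object total = dt.Compute("SUM(pageCount)", "author = 'Smith'");

        Console.WriteLine(smithRows.Length + " rows, "
            + Convert.ToInt64(total) + " pages in total");
    }
}
```

Compute returns object, so the result is converted explicitly instead of being cast to a specific numeric type.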

Edit: Initially my answer used structs but, as @AndrasZoltan pointed out, it may be better to use classes when you're not sure what the application will evolve into.

Luka Ramishvili
  • A `struct` has no real application here. Also a List<> offers the same indexing as an array. – Andras Zoltan May 15 '12 at 05:37
  • Can you point out why not? Because it *has*. Array of structs. Is that hard to swallow? – Luka Ramishvili May 15 '12 at 05:38
  • You can use *both* classes and structs because OP doesn't need methods. – Luka Ramishvili May 15 '12 at 05:41
  • Sorry? A struct can have methods. Are you aware of what a [C# struct actually is](http://msdn.microsoft.com/en-us/library/ah19swz4(v=vs.100).aspx)? A struct declares a value type and is best suited to types that will contain nought but other value types. In this case everything is a string, so you're just creating a struct to hold a bunch of references. There is no point in doing that; a class is nearly always the correct approach except in rare circumstances. – Andras Zoltan May 15 '12 at 05:43
  • Great point; but nevertheless structs can be used with no problem. However, in the long term, class will be a better approach if the application grows. Before that, it's always tempting to bloat class code (and of course, it depends on the programmer). – Luka Ramishvili May 15 '12 at 05:49
  • 'structs can be used no problem' - what about if [it's mutable](http://stackoverflow.com/questions/441309/why-are-mutable-structs-evil) or needs to be passed around a lot? A decision to use a struct doesn't hinge on how much code goes into it (Microsoft suggests that in MSDN to make the decision simpler for people who don't want to know the full details); it's all about whether it's more efficient to keep the data contiguous in memory. – Andras Zoltan May 15 '12 at 05:53
  • I considered OP's situation when saying that structs can be used. In the situations you mentioned, or if the OP encounters a similar situation, that's right, classes are a better approach. And he may save himself a headache and choose it right away. – Luka Ramishvili May 15 '12 at 06:04
  • @Luka Ramishvili Hmm seeing a bunch of new terms that i've never seen before. e.g. DataTables. I'll have to research a bit to see if it's suitable for what i need. Basically need to manipulate the input file to create an intext and full bibliographical reference and display them in a listbox and a textbox respectively (along with some other features like clicking on each in-text reference displays it's full-text reference in the textbox etc) – ShaneL May 15 '12 at 06:54
  • @ShaneL I think your best bet is to write down your program structure on paper (or in a text document) and then build the best structure based on it. I think you should go with classes, and build a tiny API (like `public Book[] FindBooksOfAuthor(string AuthorName){...}` and `public Author AuthorOfBook(string BookName){..}`. I also suggest to move to a relational database (like MySQL) to store data, it will ease development. – Luka Ramishvili May 15 '12 at 07:55
  • @Luka Ramishvili Thanks for the help and advice, much appreciated – ShaneL May 16 '12 at 00:32
  • @ShaneL No problem, I'm happy if it helped even a little. – Luka Ramishvili May 16 '12 at 05:42
  • @AndrasZoltan (you edited fourth comment, answering to it) I guess not, I suspect I'm more used to C structs and talked from that perspective. I remember I've read that paper about C# structs earlier but apparently I didn't remember what it said :)) – Luka Ramishvili May 16 '12 at 05:53
  • @AndrasZoltan Probably, to express my point better, I didn't mean methods directly, but OO approach. But, as we agreed, in the long run classes are better and more extensible. – Luka Ramishvili May 16 '12 at 05:54
2

You should create a class Book

public class Book
 {
    public string Name { get; set; }
    public string Author { get; set; }
    public string Journal { get; set; }

 }

and maintain a List<Book>

var books = new List<Book>();
books.Add(new Book { Name = "BookName", Author = "Some Author", Journal = "Journal" });
Asif Mushtaq
  • Your answer is semantically more elegant, but when some lines are blank, or lack some values, you gotta keep increasing constructor logic and adding `null`-s instead of blank lines. – Luka Ramishvili May 15 '12 at 05:33
2

I would use a multi value dictionary for this:

public struct BookInfo
    {
        public string Title;
        public string Journal;
    }

Then create a dictionary object:

var dict = new Dictionary<Author, BookInfo>();

This way, if you do run into multiple authors, the data will be keyed by author, which makes writing future code to work with this data easy. Printing out a list of all books under some author will be dead easy and not require a cumbersome search process.

  • This way, you restrict each author to only one book, which doesn't give you the ability to list *all books* because there's only one book. – Luka Ramishvili May 15 '12 at 05:37
2

You are well on your way to inventing the relational database. Conveniently, these are already available. In addition to solving the problem of storing relationships between entities, they also handle concurrency issues and are supported by modelling techniques founded in provable mathematics.


Parsers are a subject unto themselves. Since SQL is out of the question, this being a contrived university assignment, I do have some observations.

  • The easy way is with a regex. However this is extremely inefficient and a poor solution for large input files.
  • In the absence of regexes, String.IndexOf() and String.Split() are your friends.
  • If your assessor can't cope with SQL then LINQ is going to be quite a shock, but I really really like Zoltan's LINQ solution, it's just plain elegant.
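For what it's worth, here is a small sketch of the String.IndexOf() route for one line of the sample file. It assumes every line has the shape "Keyword rest of the line", as in the sample in the question; the TrySplitLine helper name is hypothetical, not something from the question or the other answers.

```csharp
using System;

public static class LineParser
{
    // Splits "Author Bond, james" into keyword "Author" and data "Bond, james".
    // Returns false for blank lines; data is empty for keyword-only lines
    // such as "End".
    public static bool TrySplitLine(string line, out string keyword, out string data)
    {
        keyword = null;
        data = null;
        if (string.IsNullOrWhiteSpace(line))
            return false;

        string trimmed = line.Trim();
        int space = trimmed.IndexOf(' ');
        if (space < 0)
        {
            // No space at all: the whole line is the keyword.
            keyword = trimmed;
            data = string.Empty;
            return true;
        }

        keyword = trimmed.Substring(0, space);
        data = trimmed.Substring(space + 1).Trim();
        return true;
    }

    public static void Main()
    {
        string keyword, data;
        if (TrySplitLine("Author Bond, james", out keyword, out data))
            Console.WriteLine(keyword + " -> " + data);
    }
}
```

Splitting only on the first space keeps commas and further spaces inside the data, which matters for values like "Florida, USA".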
Peter Wone
  • I thought about answering in a similar fashion: 'use SQL Express' or something like that. There's a good chance this file can just be pushed through an SSIS package. – Andras Zoltan May 15 '12 at 05:55
  • @Peter Wone, I think my wording of the question may have been a bit unclear, basically I am just asked to take the input file and create an in-text and full references. however the actual reference file isn't as nice as the one i provided and is full of errors etc. – ShaneL May 15 '12 at 06:41
  • So you are starting with a file created by some other system, and attempting to construct an object model from it? – Peter Wone May 15 '12 at 07:02
  • No basically just creating an application in which the user opens a file in the same format as the one above, and then from that the application turns each entry into an in-text style bibliographical reference to be displayed in a listbox and a full-reference to be displayed in a text-box. Then by using navigation arrows the application will cycle through the in-text references in the list box and display the corresponding full reference in the textbox. Hope it's a bit clearer. – ShaneL May 15 '12 at 07:09
  • If it's a greenfields development, why is the file format completely and externally defined? Is this a university assignment? – Peter Wone May 15 '12 at 22:47
  • @Peter Wone Yes this is a university assignment, however the teacher has noted before that he understands very little SQL, so I'd hate to burden him with trying to understand my code, in case my comments don't explain it clearly. I was only after a slight nudge in the direction which would be easier e.g. DataDictionaries along with an INFO class, there definitely is a lot of great information on here. – ShaneL May 16 '12 at 00:35
2

Here is the complete code for this problem. It is written with a simple, straightforward approach. It can be improved: there's no error checking, and the AddData method could be written more generically using reflection. But it does the job in an elegant way.

using System;
using System.Collections.Generic;
using System.IO;

namespace MutiItemDict
{
    class MultiDict<TKey, TValue>  // no (collection) base class
    {
        private Dictionary<TKey, List<TValue>> _data = new Dictionary<TKey, List<TValue>>();

        public void Add(TKey k, TValue v)
        {
            // can be optimized a little with TryGetValue, this is for clarity
            if (_data.ContainsKey(k))
                _data[k].Add(v);
            else
                _data.Add(k, new List<TValue>() { v });
        }

        public List<TValue> GetValues(TKey key)
        {
            if (_data.ContainsKey(key))
                return _data[key];
            else
                return new List<TValue>();
        }
    }

    class BookItem
    {
        public BookItem()
        {
            Authors = new List<string>();
            Editors = new List<string>();
        }

        public int? Year { get; set; }
        public string Title { get; set; }
        public string Book { get; set; }
        public List<string> Authors { get; private set; }
        public List<string> Editors { get; private set; }
        public string Publisher { get; set; }
        public string City { get; set; }
        public int? StartPage { get; set; }
        public int? EndPage { get; set; }
        public int? Issue { get; set; }
        public string Conference { get; set; }
        public string Journal { get; set; }
        public int? Volume { get; set; }

        internal void AddPropertyByText(string line)
        {
            string keyword = GetKeyWord(line);
            string data = GetData(line);
            AddData(keyword, data);
        }

        private void AddData(string keyword, string data)
        {
            if (keyword == null)
                return;

            // Map the Keywords to the properties (can be done in a more generic way by reflection)
            switch (keyword)
            {
                case "Year":
                    this.Year = int.Parse(data);
                    break;
                case "Title":
                    this.Title = data;
                    break;
                case "Book":
                    this.Book = data;
                    break;
                case "Author":
                    this.Authors.Add(data);
                    break;
                case "Editor":
                    this.Editors.Add(data);
                    break;
                case "Publisher":
                    this.Publisher = data;
                    break;
                case "City":
                    this.City = data;
                    break;
                case "Journal":
                    this.Journal = data;
                    break;
                case "Volume":
                    this.Volume = int.Parse(data);
                    break;
                case "Pages":
                    this.StartPage = GetStartPage(data);
                    this.EndPage = GetEndPage(data);
                    break;
                case "Issue":
                    this.Issue = int.Parse(data);
                    break;
                case "Conference":
                    this.Conference = data;
                    break;
            }
        }

        private int GetStartPage(string data)
        {
            string[] pages = data.Split('-');
            return int.Parse(pages[0]);
        }

        private int GetEndPage(string data)
        {
            string[] pages = data.Split('-');
            return int.Parse(pages[1]);
        }

        private string GetKeyWord(string line)
        {
            string[] words = line.Split(' ');
            if (words.Length == 0)
                return null;
            else
                return words[0];
        }

        private string GetData(string line)
        {
            string[] words = line.Split(' ');
            if (words.Length < 2)
                return null;
            else
                return line.Substring(words[0].Length+1);
        }
    }

    class Program
    {
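        public static BookItem ReadBookItem(StreamReader streamReader)
        {
            // Skip blank lines between blocks; null means end of file.
            string line = streamReader.ReadLine();
            while (line != null && line.Trim().Length == 0)
                line = streamReader.ReadLine();
            if (line == null)
                return null;

            BookItem book = new BookItem();
            // Stop at the "End" keyword, or at end of file if the last
            // block is not terminated (avoids passing null to the parser).
            while (line != null && line.Trim() != "End")
            {
                book.AddPropertyByText(line);
                line = streamReader.ReadLine();
            }
            return book;
        }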

        public static List<BookItem> ReadBooks(string fileName)
        {
            List<BookItem> books = new List<BookItem>();
            using (StreamReader streamReader = new StreamReader(fileName))
            {
                BookItem book;
                while ((book = ReadBookItem(streamReader)) != null)
                {
                    books.Add(book);
                }
            }
            return books;
        }

        static void Main(string[] args)
        {
            string fileName = "../../Data.txt";
            List<BookItem> bookList = ReadBooks(fileName);

            MultiDict<string, BookItem> booksByAutor = new MultiDict<string, BookItem>();
            bookList.ForEach(bk =>
                    bk.Authors.ForEach(autor => booksByAutor.Add(autor, bk))
                );

            string author = "Bond, james";
            Console.WriteLine("Books by: " + author);
            foreach (BookItem book in booksByAutor.GetValues(author))
            {
                Console.WriteLine("    Title : " + book.Title);
            }

            Console.WriteLine("");
            Console.WriteLine("Click to continue");
            Console.ReadKey();
        }
    }
}
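The reflection-based variant of AddData mentioned at the top of this answer could look roughly like the sketch below. It is an assumption-laden illustration: SimpleItem is a hypothetical, trimmed-down stand-in for BookItem, and TrySetProperty only covers simple scalar properties whose names match the keyword. List-valued properties (Authors, Editors) and "Pages" would still need their own handling.

```csharp
using System;
using System.Globalization;
using System.Reflection;

// Hypothetical, trimmed-down stand-in for BookItem.
public class SimpleItem
{
    public int? Year { get; set; }
    public string Title { get; set; }
    public string Publisher { get; set; }
}

public static class ReflectionSetter
{
    // Sets the public property whose name equals the keyword. Returns false
    // when no writable property with that name exists (e.g. for "End").
    public static bool TrySetProperty(object item, string keyword, string data)
    {
        if (item == null || string.IsNullOrEmpty(keyword))
            return false;

        PropertyInfo prop = item.GetType().GetProperty(keyword);
        if (prop == null || !prop.CanWrite)
            return false;

        // For int? properties, convert to the underlying type (int).
        Type target = Nullable.GetUnderlyingType(prop.PropertyType)
                      ?? prop.PropertyType;
        object value = Convert.ChangeType(data, target, CultureInfo.InvariantCulture);
        prop.SetValue(item, value, null);
        return true;
    }

    public static void Main()
    {
        var item = new SimpleItem();
        TrySetProperty(item, "Year", "1994");
        TrySetProperty(item, "Title", "For beginners");
        Console.WriteLine(item.Year + ": " + item.Title);
    }
}
```

The switch statement is easier to read and debug; reflection mainly pays off when the set of keywords keeps growing.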

And I also want to mention that all the parsing stuff can be avoided if you represent the Data in XML. The Data then looks like:

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfBookItem >
  <BookItem>
    <Year>1994</Year>
    <Title>For beginners</Title>
    <Book>Accounting</Book>
    <Authors>
      <string>Bond, james</string>
      <string>Smith John A</string>
    </Authors>
    <Editors>
      <string>Smith Joe</string>
      <string>Doe John</string>
    </Editors>
    <Publisher>The University of Chicago Press</Publisher>
    <City>Florida, USA</City>
    <StartPage>15</StartPage>
    <EndPage>23</EndPage>
  </BookItem>
  <BookItem>
    <Year>2000</Year>
    <Title>Medical advances in the modern world</Title>
    <Authors>
      <string>Faux, M</string>
      <string>Sedge, M</string>
      <string>McDreamy, L</string>
      <string>Simbha, D</string>
    </Authors>
    <StartPage>1</StartPage>
    <EndPage>26</EndPage>
    <Issue>2</Issue>
    <Journal>Canadian Journal of medicine</Journal>
    <Volume>25</Volume>
  </BookItem>
  <BookItem>
    <Year>2012</Year>
    <Title>Shape shifting dinosaurs</Title>
    <Authors>
      <string>McFadden, B</string>
      <string>Goodrem, G</string>
    </Authors>
    <City>Vancouver, Canada</City>
    <StartPage>2</StartPage>
    <EndPage>6</EndPage>
    <Conference>Ted Vancouver</Conference>
  </BookItem>
</ArrayOfBookItem>

And the code for reading it:

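// Requires: using System.Xml.Serialization;
// XmlSerializer only works with public types, so BookItem must be declared public.
XmlSerializer serializer = new XmlSerializer(typeof(List<BookItem>));
using (FileStream stream =
    new FileStream(@"../../Data.xml", FileMode.Open,
        FileAccess.Read, FileShare.Read))
{
    List<BookItem> books1 = (List<BookItem>)serializer.Deserialize(stream);
}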
Tal Segal
  • This is for a university problem, so I will only reference the code you've given me to get an idea of the method you've used and try to re-write it in a different way, but this is very, very helpful. I will vote this as the top answer, as it's written at a level I can understand and is rather simple yet does the job. Thanks heaps for your help! – ShaneL May 17 '12 at 08:45
  • 1
    @ShaneL - Thank you, That's what I tried to do and that's how I try to write most of my code - Simple and straight forward, this is best for maintenance. – Tal Segal May 17 '12 at 12:40
  • 1
    And your second example is a beautiful example of why people who invent custom file formats need to be beaten with a big stick and forced to use either XML or JSON. – Peter Wone May 18 '12 at 01:57
  • @TalSegal Hey buddy, just reading through your code and this part has me stumped, any chance you could interpret it for me? `MultiDict<string, BookItem> booksByAutor = new MultiDict<string, BookItem>(); bookList.ForEach(bk => bk.Authors.ForEach(autor => booksByAutor.Add(autor, bk)) );` – ShaneL May 18 '12 at 12:03
  • @ShaneL - This uses lambda expressions (the `=>` syntax that Linq also relies on) with the List ForEach method. I have embedded two loops here. You could replace it with: `foreach (var book in bookList) { foreach (var author in book.Authors) { booksByAutor.Add(author, book); } }` You can read about Linq here: http://msdn.microsoft.com/en-us/library/bb397933.aspx – Tal Segal May 18 '12 at 21:14
  • I basically loop on all the books and in each book I loop on all the authors and insert them into the dictionary. – Tal Segal May 18 '12 at 21:21
  • @TalSegal Ah k, yeah I spent a while last night researching about the lambda expression "=>", pretty tricky. Thanks for the Foreach example, makes it much easier to understand. Cheers. Shame there's no Pm'ing on here would've made it soo much easier – ShaneL May 19 '12 at 01:05
  • @ShaneL - you can write to my email if you want to - talseg7@gmail.com – Tal Segal May 19 '12 at 06:34
  • @TalSegal Alright cheers if i have any other questions I'll shoot you an email – ShaneL May 19 '12 at 11:47
  • @ShaneL - If it's something that more people can learn from (and all your other question were of this kind) - Then this is the proper place to ask and get answers and help other people also benefit from it. Anyway, it's always a pleasure for me to help. – Tal Segal May 19 '12 at 18:03
1

It's not quite clear what you need without a better example of the file or how you want to use the data, but it sounds like you need to parse the string and put it into an entity. The following is an example using the fields you mentioned above.

public IList<Entry> ParseEntryFile(string fileName)
{
    ...
    var entries = new List<Entry>();

    foreach(var line in file)
    {
        var entry = new Entry();
        ...
        entries.Add(entry);
    }
    return entries;
}


public class Entry
{
    public Book BookEntry { get; set; }
    public Author AuthorEntry { get; set; }
    public Journal JournalEntry { get; set; }
}

public class Book
{
    public string Name{ get; set; }
    ...
}

public class Author
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

...
Bronumski
  • Added sample input file to original post for further clarification. I just need to manipulate the data to form in-text and full references and display them in a listbox and textbox respectively. – ShaneL May 15 '12 at 06:42
1

You can create a class for each item:

class BookItem
        {
            public string Name { get; set; }
            public string Author { get; set; }
        }

Read the data from each line into an instance of this class and store them in a temporary list:

var books = new List<BookItem>();
while (NotEndOfFile())
{
    BookItem book = ReadBookItem(...);
    books.Add(book);
}

After you have this list you can create Multi Value Dictionaries and have quick access to any item by any key. For example to find a book by its author:

var booksByAuthor = new MultiDict<string, BookItem>();

add the items to the Dictionary:

books.ForEach(bk => booksByAuthor.Add(bk.Author, bk));

and then you can iterate on it:

string autorName = "autor1";
Console.WriteLine("Books by: " + autorName);
// MultiDict is not enumerable itself, so ask it for this author's books.
foreach (BookItem bk1 in booksByAuthor.GetValues(autorName))
{
    Console.WriteLine("Book: " + bk1.Name);
}

I got the basic Multi Item Dictionary from here:

Multi Value Dictionary?

This is my implementation:

class MultiDict<TKey, TValue>  // no (collection) base class
        {
            private Dictionary<TKey, List<TValue>> _data = new Dictionary<TKey, List<TValue>>();

            public void Add(TKey k, TValue v)
            {
                // can be optimized a little with TryGetValue, this is for clarity
                if (_data.ContainsKey(k))
                    _data[k].Add(v);
                else
                    _data.Add(k, new List<TValue>() { v });
            }

            // more members

            public List<TValue> GetValues(TKey key)
            {
                if (_data.ContainsKey(key))
                    return _data[key];
                else
                    return new List<TValue>();
            }

        }
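The TryGetValue optimization mentioned in the code comment could look something like this. It is a sketch of the same class under a different, hypothetical name (MultiDictTryGet) so it can sit next to the original; each call does one dictionary lookup instead of ContainsKey followed by the indexer.

```csharp
using System;
using System.Collections.Generic;

public class MultiDictTryGet<TKey, TValue>
{
    private readonly Dictionary<TKey, List<TValue>> _data =
        new Dictionary<TKey, List<TValue>>();

    public void Add(TKey key, TValue value)
    {
        // Single lookup: fetch the existing list or create and register one.
        List<TValue> list;
        if (!_data.TryGetValue(key, out list))
        {
            list = new List<TValue>();
            _data.Add(key, list);
        }
        list.Add(value);
    }

    public List<TValue> GetValues(TKey key)
    {
        // Unknown keys yield an empty list rather than an exception.
        List<TValue> list;
        return _data.TryGetValue(key, out list) ? list : new List<TValue>();
    }
}

public static class MultiDictDemo
{
    public static void Main()
    {
        var booksByAuthor = new MultiDictTryGet<string, string>();
        booksByAuthor.Add("Bond, james", "For beginners");
        Console.WriteLine(booksByAuthor.GetValues("Bond, james").Count);
        Console.WriteLine(booksByAuthor.GetValues("Nobody").Count);
    }
}
```

For 50 or so items the difference is negligible, so the ContainsKey version is perfectly fine if you find it clearer.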
Tal Segal
  • Bit hard for me to follow at this stage. However, with the class for each item, if a book had more than one author, wouldn't I run into problems with the following? `class BookItem { public string Name { get; set; } public string Author { get; set; } }` – ShaneL May 15 '12 at 07:00
  • I assumed that you had only one author per book. If you want several authors I would use this code: `class BookItem { public BookItem() { Authors = new List<string>(); } public string Name { get; set; } public List<string> Authors { get; private set; } } BookItem book = new BookItem { Name = "The Great Contraction", Authors = { "Milton Friedman", "Anna Jacobson Schwartz" } };` – Tal Segal May 16 '12 at 18:03
  • This seems like it does everything i need it to in the most simplified way, Thanks heaps. Appreciate the help! – ShaneL May 16 '12 at 23:36