
What is the proper way to save all lines from a text file to objects? I have a .txt file that looks something like this:

0001Marcus Aurelius          20021122160   21311
0002William  Shakespeare     19940822332   11092
0003Albert Camus             20010715180   01232

I know the position of each field in a line, and every line uses the same fixed format.

Line number is from 0 to 3
Book author is from 4 to 30
Publish date is from 31 to 37
Page num. is from 38 to 43
Book code is from 44 to 49

I made a class Data which holds the start position, end position, value, and error of a field.

Then I made a class Line that holds a list of Data and a list of all errors found in that line. After loading the data from a line into the Data objects, I loop through the fields and add their errors to lineError, because I need to save the errors from each line to the database.

My question: is this a proper way to load data from a file into objects and, after processing, save the same data to a database? Can you advise a better approach?

public class Data
{
    public int startPosition = 0;
    public int endPosition = 0;
    public object value = null;
    public string fieldName = "";
    public Error error = null;

    public Data(int start, int end, string name)
    {
        this.startPosition = start;
        this.endPosition = end;
        this.fieldName = name;
    }

    public void SetValueFromLine(string line)
    {
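        // Positions are inclusive (e.g. 0 to 3 is four characters), so the length is end - start + 1.
        string valueFromLine = line.Substring(this.startPosition, this.endPosition - this.startPosition + 1);
        // if/else statement that checks the validity of the data (length, empty value)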
        this.value = valueFromLine;
    }

}

public class Line
{
    public List<Data> lineData = new List<Data>();
    public List<Error> lineError = new List<Error>();

    public Line()
    {
        AddObjectDataToList();
    }

    public void AddObjectDataToList()
    {
        lineData.Add(new Data(0, 3, "lineNumber"));
        lineData.Add(new Data(4, 30, "bookAuthor"));
        lineData.Add(new Data(31, 37, "publishData"));
        lineData.Add(new Data(38, 43, "pageNumber"));
        lineData.Add(new Data(44, 49, "bookCode"));
    }

    public void LoadLineDataToObjects(string line)
    {
        foreach(Data s in lineData)
        {
            s.SetValueFromLine(line);
        }
    }

    public void GetAllErrorFromData()
    {
        foreach (Data s in lineData)
        {
            if(s.error != null)
            {
                lineError.Add(s.error);
            }

        }
    }

}


public class File
{
    public string fileName;
    public List<Line> lines = new List<Line>();
}
TJacken
  • You may want to research *serialization* - if it has been saved to a DB though why do you need the text form anymore? Please read How to Ask and take the tour – Ňɏssa Pøngjǣrdenlarp Jan 20 '18 at 18:39
  • So your question is actually how to parse the text file to a database? – Rafael Jan 20 '18 at 18:50
  • No, my question is what the best approach is for saving data from a file to objects. Once all lines from the file are saved to objects, I need to validate the data, and it is easier to loop through all data from a line and check, for example, whether the author, book code, etc. exist in my database. If a line contains data that is not in my database, I need to skip saving that line. I do not have a problem with saving data to the database; that works fine. I only need advice on whether this model is good for saving data from one line to objects and checking whether some of the data exists. – TJacken Jan 20 '18 at 19:00
  • Are you re-inventing https://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx – Mark Schultheiss Jan 20 '18 at 19:14
  • No, thanks for this, it seems interesting, but my file is not CSV. I do not have a delimiter. As you can see in my example above, the line number and author name are joined together. I only know the position of each field in the file, and that position is constant. – TJacken Jan 20 '18 at 19:21
  • It also allows fixed width...read on. https://stackoverflow.com/a/11365648/125981 – Mark Schultheiss Jan 20 '18 at 19:23
  • Note: if you have a LOT of different files (with differing widths) you might consider a custom attribute. That is a different approach and is likely best when you handle differing widths often. Example: https://stackoverflow.com/a/26099038/125981 See this code review also https://codereview.stackexchange.com/questions/27782/reading-fixed-width-data-fields-in-net – Mark Schultheiss Jan 20 '18 at 19:37
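
The fixed-width mode of `TextFieldParser` suggested in the comments could look roughly like the sketch below. It assumes the column widths 4, 27, 7, 6 and 6, derived from the inclusive positions listed in the question; the names `FixedWidthReader` and `ReadRows` are hypothetical and exist only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FixedWidthReader
{
    // Assumed widths for the inclusive ranges 0-3, 4-30, 31-37, 38-43, 44-49.
    private static readonly int[] FieldWidths = { 4, 27, 7, 6, 6 };

    // Reads every line from the reader and splits it into its fixed-width fields.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(FieldWidths);
            parser.TrimWhiteSpace = true; // drop the padding around each value

            while (!parser.EndOfData)
            {
                // A line that cannot be parsed with the declared widths
                // raises MalformedLineException.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Each returned array holds the line number, author, publish date, page number, and book code in that order, so the validation step can work on named fields instead of raw positions.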

1 Answer


I assume that the focus is on using OOP. I also assume that parsing is a secondary task, so I will not consider options for implementing it.

First of all, we need to determine the main acting object. Strange as it may seem, this is not a Book but the line itself (e.g. DataLine). Initially I wanted to create a Book from a string (through a separate constructor), but that would be a mistake.

What actions should DataLine be able to perform? In fact, only one: process. I see two acceptable options for this method:

  1. process returns Book or throws exceptions. (Book process())

  2. process returns nothing, but interacts with another object. (void process(IResults result))

The first option has the following drawbacks:

  • It is difficult to test (although this also applies to the second option). All validation is hidden inside DataLine.

  • It is impossible or difficult to return several errors.

  • The program is meant to work with incorrect data, so exceptions that are actually expected would be thrown often. This goes against the purpose of exceptions. There is also a small risk of slowing performance.

The second option does not have the last two drawbacks. IResults can contain the methods error(...), to return several errors, and success(Book book).

The testability of the process method can be significantly improved by adding IValidator. This object could be passed as a parameter to the DataLine constructor, but that is not entirely correct. First, it is an unnecessary use of memory, because storing it gives us no tangible benefit. Second, it does not match the essence of the DataLine class: DataLine represents only a line that can be processed in one particular way. Thus, a good solution is void process(IValidator validator, IResults result).

To summarize the above (may contain syntax errors):

interface IResults {
    void error (string message);
    void success (Book book);
}

interface IValidator {
    // just example
    bool checkBookCode (string bookCode);
}

class DataLine {
    private readonly string _rawData;

    public DataLine (string rawData) {
        _rawData = rawData;
    }

    public void process (IValidator validator, IResults result) {
        // parse _rawData; the book code occupies positions 44 to 49 in the question's layout
        // (assumes the line is at least 50 characters long)
        string bookCode = _rawData.Substring (44, 6);
        bool isValid = true; // just example! maybe better to add IResults.hasErrors ()
        if (! validator.checkBookCode (bookCode)) {
            result.error("Bad book code");
            isValid = false;
        }

        if (isValid) {
            result.success(new Book (...));
            // or even result.success (...); to avoid coupling with the Book
        }
    }
}
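
To illustrate why passing `IValidator` and `IResults` as parameters helps testing, here is a minimal sketch of two implementations that need no database. `CollectingResults`, `KnownCodesValidator`, and the one-property `Book` are hypothetical names introduced for this example; the interfaces are repeated only so that the block compiles on its own.

```csharp
using System.Collections.Generic;

// Hypothetical minimal Book; a real one would also carry author, date, and so on.
public class Book
{
    public string BookCode { get; }
    public Book(string bookCode) { BookCode = bookCode; }
}

// Same shape as the interfaces above, repeated to keep the sketch self-contained.
public interface IResults
{
    void error(string message);
    void success(Book book);
}

public interface IValidator
{
    bool checkBookCode(string bookCode);
}

// Collects every error and every valid book instead of throwing exceptions.
public class CollectingResults : IResults
{
    public List<string> Errors { get; } = new List<string>();
    public List<Book> Books { get; } = new List<Book>();

    public void error(string message) => Errors.Add(message);
    public void success(Book book) => Books.Add(book);
}

// Test double: accepts only the codes it was constructed with.
public class KnownCodesValidator : IValidator
{
    private readonly HashSet<string> _known;

    public KnownCodesValidator(IEnumerable<string> known)
    {
        _known = new HashSet<string>(known);
    }

    public bool checkBookCode(string bookCode) => _known.Contains(bookCode);
}
```

In a unit test, `process` can be called with these two objects, and the test can then assert directly on `Errors` and `Books`. A production `IValidator` would query the database instead, without any change to `DataLine`.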

The next step is to model the file with its lines. Again there are many options and nuances, but I would like to draw attention to IEnumerable<DataLine>. Ideally, we would create a DataLines class that implements IEnumerable<DataLine> and loads from a file or from IEnumerable<string>. However, this approach is relatively complex and redundant; it makes sense only in large projects. A much simpler version:

interface DataLinesProvider {
    IEnumerable<DataLine> Lines ();
}

class DataLinesFile : DataLinesProvider {
    private readonly string _fileName;

    public DataLinesFile (string fileName) {
        _fileName = fileName;
    }

    public IEnumerable<DataLine> Lines () {
        // requires using System.IO and System.Linq
        return File
            .ReadAllLines (_fileName)
            .Select (x => new DataLine (x));
    }
}
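
For comparison, the more elaborate `DataLines` class described in the paragraph before the simpler version might be sketched as follows. This is only one possible shape: the `FromFile` factory method is a hypothetical name, and the `DataLine` here is a reduced stand-in that exposes its raw text so the block compiles on its own.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Stand-in for the DataLine class above, reduced to what this sketch needs.
public class DataLine
{
    public string RawData { get; }
    public DataLine(string rawData) { RawData = rawData; }
}

// Hypothetical DataLines: enumerable over DataLine, built from a file or from strings.
public class DataLines : IEnumerable<DataLine>
{
    private readonly IEnumerable<string> _source;

    public DataLines(IEnumerable<string> source)
    {
        _source = source;
    }

    // File.ReadLines reads lazily, so the whole file is not held in memory at once.
    public static DataLines FromFile(string fileName)
    {
        return new DataLines(File.ReadLines(fileName));
    }

    // Wraps each raw line in a DataLine only when the caller asks for it.
    public IEnumerator<DataLine> GetEnumerator()
    {
        foreach (string line in _source)
        {
            yield return new DataLine(line);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the class accepts any `IEnumerable<string>`, tests can pass an in-memory array while production code calls `DataLines.FromFile`.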

You can improve the code endlessly and keep introducing new abstractions, but you should be guided by common sense and the specific problem.

P. S. sorry for "strange" English. Google not always correctly translate such complex topics.

Green_Wizard
  • Thanks for your advice. Yes, you are right, the focus is on using OOP. For validation I have two methods: one validates data before saving it in the object (it checks that the length of bookCode is correct and that the position actually holds data, since in some scenarios I can have only whitespace), and another runs once all data is written into objects and checks whether bookCode exists in the database. – TJacken Jan 21 '18 at 22:02
  • `all data written in objects` - that's not OOP. If you are creating a `Book`, then it is a valid book. If it is not known to be valid, then create a `PossibleBook`, which checks whether it is valid and only then creates a `Book`. Besides, why can't you check bookCode without creating a `Book`? – Green_Wizard Jan 21 '18 at 22:21
  • Because I thought it would slow performance if I read data from the file and checked for records in my database at the same time. As you said, this may slow performance; sometimes I get a text file with more than 10,000 lines. – TJacken Jan 21 '18 at 22:30
  • I said that unnecessary throwing of exceptions may cause performance issues. If you want to load all data from the file and validate it later, then my code already does this (`DataLine` just stores a single line and validates it only on demand). If you are working with really big files, then you will need to break them into chunks, and maybe even add bulk validation requests to the DB. – Green_Wizard Jan 21 '18 at 22:45
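
The `PossibleBook` idea from the comments could be sketched roughly as below. Everything here is an assumption made for illustration: the `TryCreate` method, the one-property `Book`, and the `knownCodes` set, which stands in for book codes fetched from the database in a single bulk query.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Book: it can only be constructed with a non-empty code.
public class Book
{
    public string BookCode { get; }

    public Book(string bookCode)
    {
        if (string.IsNullOrWhiteSpace(bookCode))
        {
            throw new ArgumentException("Book code is required.", nameof(bookCode));
        }
        BookCode = bookCode;
    }
}

// Hypothetical PossibleBook: holds unchecked input and creates a Book only when it is valid.
public class PossibleBook
{
    private readonly string _bookCode;

    public PossibleBook(string bookCode)
    {
        _bookCode = (bookCode ?? "").Trim();
    }

    // knownCodes stands in for codes loaded from the database in one bulk request.
    public bool TryCreate(ISet<string> knownCodes, out Book book, out string error)
    {
        book = null;
        error = null;

        if (_bookCode.Length == 0)
        {
            error = "Book code is empty";
            return false;
        }
        if (!knownCodes.Contains(_bookCode))
        {
            error = "Unknown book code: " + _bookCode;
            return false;
        }

        book = new Book(_bookCode);
        return true;
    }
}
```

With this shape, an invalid line never produces a `Book`, and expected validation failures are reported through a return value rather than an exception.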