
I need help developing a Windows application using C#.NET in VS2010. The functionality is very simple: the user supplies a text file, and my program is supposed to extract the relevant data from it and write it out as CSV, plain text, or some other format.

My biggest problem whenever I deal with text files is the format. When I open the input file in Notepad or WordPad it looks perfect, layout and all. But once I start processing it in code, I realize that what I see is not how the data is actually stored in the file. I have read many articles on Unicode, UTF encodings and so on, but I still don't have a reliable way to determine exactly what format my file is in. As a result, I end up getting many exceptions.

In Unix shell scripting this used to be simple. There are good commands like less, which is similar to more but also displays any formatting characters in the file. There are also useful commands like unix2dos and dos2unix for converting line endings.

Nevertheless, is there some program, code, or professional method that can determine the exact format of my input file and then convert it to "plain text", so that the data extraction becomes easy and bug-free?
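To make the goal concrete, here is a rough sketch of the kind of conversion I have in mind, not a definitive solution. It assumes that a byte-order mark, when present, identifies the encoding, and that a file without one uses a fallback encoding chosen by the caller (which may well be a wrong guess). `PlainTextNormalizer`, `ReadAsText`, `NormalizeNewlines` and `Convert` are names I made up for illustration; the newline step is meant to mimic unix2dos.

```csharp
using System;
using System.IO;
using System.Text;

public static class PlainTextNormalizer
{
    // Decode the stream to a string. StreamReader inspects the byte-order mark
    // (third argument = true) and uses `fallback` only when no mark is present.
    public static string ReadAsText(Stream input, Encoding fallback)
    {
        using (var reader = new StreamReader(input, fallback, true))
        {
            return reader.ReadToEnd();
        }
    }

    // Turn every line ending (CRLF, lone CR, lone LF) into CRLF, like unix2dos.
    public static string NormalizeNewlines(string text)
    {
        return text.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "\r\n");
    }

    // Assumption: "plain text" here means UTF-8 without a byte-order mark.
    public static void Convert(string inputPath, string outputPath, Encoding fallback)
    {
        string text;
        using (var input = File.OpenRead(inputPath))
        {
            text = NormalizeNewlines(ReadAsText(input, fallback));
        }
        File.WriteAllText(outputPath, text, new UTF8Encoding(false));
    }
}
```

I imagine calling it as `PlainTextNormalizer.Convert("input.txt", "clean.txt", Encoding.Default)` before running the extraction on the cleaned file, where both file names are placeholders.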

Thanks

  • this: http://stackoverflow.com/questions/4520184/how-to-detect-the-character-encoding-of-a-text-file and this: http://stackoverflow.com/questions/3825390/effective-way-to-find-any-files-encoding should help – jparram Mar 11 '14 at 17:46
  • OK, what should be the next step? Let's say I use one of the suggested solutions and it returns the encoding type; how should I handle the return value? Where in my code should I specify the returned encoding, and what should I do with it? – half-baked prgrmr Mar 11 '14 at 19:09
  • Have you tried reading the file in using BinaryReader and then reading in some number of bytes using ReadBytes? Doing this, you can look at the exact contents of your buffer to see what's in there that's tripping you up. – FodderZone Mar 11 '14 at 21:36
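
The suggestions in the comments above could be combined roughly as follows. This is only a sketch under assumptions: `FileInspector`, `HexPreview` and `ReadLines` are hypothetical names, and the encoding passed to `ReadLines` is whatever a detection step (or an inspection of the raw bytes) suggested. The point is that the encoding is supplied at the place where the file is opened for reading.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileInspector
{
    // Return the first `count` bytes of the file as space-separated hex,
    // so that a byte-order mark or stray control bytes become visible.
    public static string HexPreview(string path, int count)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            byte[] buffer = reader.ReadBytes(count); // fewer bytes if the file is shorter
            return BitConverter.ToString(buffer).Replace("-", " ");
        }
    }

    // Read the file as text using the encoding that was detected or guessed.
    public static string[] ReadLines(string path, Encoding encoding)
    {
        return File.ReadAllLines(path, encoding);
    }
}
```

For example, a preview that starts with `EF BB BF` suggests UTF-8 with a byte-order mark, in which case `FileInspector.ReadLines(path, Encoding.UTF8)` would be a reasonable next call; a preview starting with `FF FE` suggests `Encoding.Unicode` (UTF-16 little-endian).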
