Hi everyone. I'm a beginner with C# and I'm working on a homework assignment. My problem is reading a specific part of a .pdb (Protein Data Bank) file and splitting those lines into an array or list, which I will then use in a Windows Forms app.

The contents of the .pdb file look like this:

HEADER    ANTIFREEZE 17-SEP-97   7MSI              
TITLE     TYPE III ANTIFREEZE PROTEIN ISOFORM HPLC 12                           
COMPND    MOL_ID: 1;                                                            
COMPND   2 MOLECULE: TYPE III ANTIFREEZE PROTEIN ISOFORM HPLC 12;       
SOURCE    MOL_ID: 1;                                                            
SOURCE   2 ORGANISM_SCIENTIFIC: MACROZOARCES AMERICANUS;

ATOM      1  N   MET A   0      18.112  24.345  32.146  1.00 51.10           N  
ATOM      2  CA  MET A   0      18.302  23.436  31.020  1.00 49.06           C  
ATOM      3  C   MET A   0      18.079  24.312  29.799  1.00 46.75           C  
ATOM      4  O   MET A   0      16.928  24.678  29.560  1.00 48.24           O  
ATOM      5  CB  MET A   0      17.257  22.311  31.008  1.00 48.14           C  
ATOM      6  N   ALA A   1      19.106  24.757  29.076  1.00 43.47           N

HETATM  491  O   HOH A 101      23.505  19.335  23.451  1.00 35.56           O  
HETATM  492  O   HOH A 102      19.193  19.013  25.418  1.00 12.73           O  
HETATM  493  O   HOH A 103       7.781  12.538  12.927  1.00 80.11           O

... and it goes on like this.

I only need to read the lines that start with the "ATOM" keyword. Then I want to split the information on each line into variables and store them in an array or list. After that I want to print the maximum value of the X coordinate to a label.

For example:

ATOM     1  N   MET A   0      18.112  24.345  32.146  1.00 51.10           N

1 stands for atom number

N stands for atom name

MET stands for amino acid name

18.112 stands for X coordinate etc.

WHAT I DID

I used the code from a similar question that was asked here before, but I couldn't adapt it to my project. First I created a class for the variables:

class Atom
{
    public int atom_no;
    public string atom_name;
    public string amino_name;
    public char chain;
    public int amino_no;
    public float x_coordinate;
    public float y_coordinate;
    public float z_coordinate;
    public float ratio;
    public float temperature;
}

Then the main class. NOTE: the fields are not separated by a single space. For example, between "MET" and "A" there are 3 or 4 extra spaces. I've tried to remove them while reading the file, but I don't know whether that worked.

private void button1_Click(object sender, EventArgs e)
{
    string filePath = @"path_of_file";
    string stringToSearch = @"ATOM";

    List<Atom> Atoms = new List<Atom>();
    using (StreamReader sr = new StreamReader(filePath))
    {
        string[] lines = File.ReadAllLines(filePath);

        foreach (string line in lines)
        {
            if (line.Contains(stringToSearch))   // I tried to read only the lines that start with ATOM
            {
                while (sr.Peek() >= 0)   // this while loop is from the question asked before
                {
                    string[] strArray;
                    string line1 = sr.ReadLine();   // I added these 2 lines to remove the extra whitespace
                    var lineParts = line1.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

                    strArray = line1.Split(' ');
                    Atom currentAtom = new Atom();
                    currentAtom.atom_no = int.Parse(strArray[0]);
                    currentAtom.atom_name = strArray[1];
                    currentAtom.amino_name = strArray[2];
                    currentAtom.chain = char.Parse(strArray[3]);
                    currentAtom.amino_no = int.Parse(strArray[4]);
                    currentAtom.x_coordinate = float.Parse(strArray[5]);
                    currentAtom.y_coordinate = float.Parse(strArray[6]);
                    currentAtom.z_coordinate = float.Parse(strArray[7]);
                    currentAtom.ratio = float.Parse(strArray[8]);
                    currentAtom.temperature = float.Parse(strArray[9]);

                    Atoms.Add(currentAtom);
                }
            }
        }
    }
    listBox1.DataSource = Atoms;
    listBox1.ValueMember = "atom_no";
    listBox1.DisplayMember = "atom_name";
}

I haven't yet added the part that prints the max value of the X coordinate to a label; at this point I'm testing with a list box. When I run the code and press the button, I get an "Input string was not in a correct format" error at the line currentAtom.atom_no = int.Parse(strArray[0]);.

I know that my code looks like a mess, and sorry if I've wasted your time with this. I would much appreciate it if you could help me with this Forms app for my homework. If not, thank you for reading anyway. Have a nice and healthy day.

Sir Rufo
  • Why are you using a stream reader AND `File.ReadAllLines`? `File.ReadAllLines` returns you all the lines from the file in an array. You can then filter them and split them after that. – Rufus L May 19 '20 at 22:10
  • `File.ReadAllLines(filePath).Where(line => line.Contains(stringToSearch))` Will return all the lines that contain your search term. Then you can use `Split` to split on whitespace and do what you want with the different parts of each line. – Rufus L May 19 '20 at 22:13
  • I searched for how to get only the lines that start with the ATOM keyword, found something that doesn't use a stream reader, and tried to mix the two pieces of code. I don't know how to get the ATOM lines using StreamReader, and I don't know whether that worked. – Ronnie O'Sullivan May 19 '20 at 22:14
  • You can do it either way, but it doesn't really make sense to use *both at the same time*. `File.ReadAllLines` is simpler. – Rufus L May 19 '20 at 22:16
  • But there are multiple spaces between the fields. Will the Split method ignore the extra whitespace? – Ronnie O'Sullivan May 19 '20 at 22:16
  • Read [the documentation](https://learn.microsoft.com/en-us/dotnet/api/system.string.split?view=netcore-3.1), try it and see. – Rufus L May 19 '20 at 22:16
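
The whitespace question in the comments above can be checked directly. The following is a minimal sketch, not part of the original thread; the class and method names (`SplitDemo`, `SplitKeepingEmpties`, `SplitDroppingEmpties`) are made up for illustration. It compares a plain `Split(' ')` with the `StringSplitOptions.RemoveEmptyEntries` overload on the first ATOM line from the question.

```csharp
using System;

public static class SplitDemo
{
    // Splits on every single space. A run of N spaces produces N - 1 empty
    // strings, so the array is full of "" entries between the real fields.
    public static string[] SplitKeepingEmpties(string line) =>
        line.Split(' ');

    // Splits on spaces and drops the empty entries, so only the
    // non-blank tokens remain.
    public static string[] SplitDroppingEmpties(string line) =>
        line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
```

For that sample line, `SplitKeepingEmpties(line)[1]` is an empty string, while `SplitDroppingEmpties(line)` yields 12 tokens. In both cases index 0 is the text "ATOM", which is why `int.Parse(strArray[0])` fails with "Input string was not in a correct format": the atom number sits at index 1 only after the empty entries are removed.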

1 Answer

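One way to do this is to use File.ReadAllLines to read the file, filter out any lines that don't start with the stringToSearch text (using StartsWith together with the System.Linq method Where), and then Select a new Atom from each line using the Split method with empty entries removed, as you were doing. Note that after the split, element 0 of the array is the "ATOM" keyword itself, so the atom number is at index 1. Finally, lines that are too short are dropped and the result is returned with ToList: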

List<Atom> Atoms = File.ReadAllLines(filePath)       // Read all the lines
    .Where(line => line.StartsWith(stringToSearch))  // Filter on our string
    .Select(line =>
    {
        // Split the line on the space character into an array;
        // strArray[0] is the "ATOM" keyword, the data fields start at index 1
        var strArray = line.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);

        // Return a new Atom for each line based on the array contents
        return strArray.Length < 11  // Ensure we have the keyword plus ten fields
            ? null                   // If we don't have 11, return 'null'
            : new Atom               // Otherwise return a new atom from the array
            {
                atom_no = int.Parse(strArray[1]),
                atom_name = strArray[2],
                amino_name = strArray[3],
                chain = char.Parse(strArray[4]),
                amino_no = int.Parse(strArray[5]),
                x_coordinate = float.Parse(strArray[6]),
                y_coordinate = float.Parse(strArray[7]),
                z_coordinate = float.Parse(strArray[8]),
                ratio = float.Parse(strArray[9]),
                temperature = float.Parse(strArray[10])
            };
    })
    .Where(atom => atom != null)     // Drop the lines that were too short
    .ToList();                       // Return the items as a list
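
A caveat that the thread does not cover: PDB ATOM records are laid out in fixed columns, so whitespace splitting can break when two fields touch (for example, large negative coordinates). The sketch below is an alternative under stated assumptions, not the answer's method. The column ranges follow my reading of the wwPDB format description and are worth checking against the spec; `PdbColumns`, `Field` and `ParseAtomLine` are hypothetical names, and the `Atom` class is repeated from the question so the block compiles on its own. It also parses with `CultureInfo.InvariantCulture`, since `float.Parse` otherwise depends on the machine's decimal separator.

```csharp
using System;
using System.Globalization;

// Same fields as the Atom class in the question, repeated for self-containment.
public class Atom
{
    public int atom_no;
    public string atom_name;
    public string amino_name;
    public char chain;
    public int amino_no;
    public float x_coordinate;
    public float y_coordinate;
    public float z_coordinate;
    public float ratio;
    public float temperature;
}

public static class PdbColumns
{
    // Cuts the 1-based, inclusive column range [first, last] and trims it.
    private static string Field(string line, int first, int last) =>
        line.Substring(first - 1, last - first + 1).Trim();

    // Returns null for non-ATOM lines or lines too short to hold column 66.
    public static Atom ParseAtomLine(string line)
    {
        if (!line.StartsWith("ATOM", StringComparison.Ordinal) || line.Length < 66)
            return null;

        var inv = CultureInfo.InvariantCulture; // always '.' as decimal separator
        return new Atom
        {
            atom_no = int.Parse(Field(line, 7, 11), inv),         // serial number
            atom_name = Field(line, 13, 16),
            amino_name = Field(line, 18, 20),
            chain = line[21],                                     // column 22
            amino_no = int.Parse(Field(line, 23, 26), inv),
            x_coordinate = float.Parse(Field(line, 31, 38), inv),
            y_coordinate = float.Parse(Field(line, 39, 46), inv),
            z_coordinate = float.Parse(Field(line, 47, 54), inv),
            ratio = float.Parse(Field(line, 55, 60), inv),        // occupancy
            temperature = float.Parse(Field(line, 61, 66), inv)   // B-factor
        };
    }
}
```

It could be plugged into the same pipeline as `File.ReadAllLines(filePath).Select(PdbColumns.ParseAtomLine).Where(a => a != null).ToList()`. This only works if the file keeps its original column alignment; if the spacing has been altered, the split-based approach is the safer choice.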
Rufus L
  • Thank you so much for your help and interest. It didn't give me any errors, but I couldn't get the max value of x_coordinate. I'm not familiar with LINQ, so can you please show me how to get the max value of x_coordinate and print it out? – Ronnie O'Sullivan May 19 '20 at 23:59
  • `float maxX = Atoms.Max(atom => atom.x_coordinate);` – Rufus L May 20 '20 at 00:45
  • Thank you sooo much for your help. Have a nice day Mr. Rufus.. – Ronnie O'Sullivan May 20 '20 at 01:28
  • I have one more question, if you have time. We have the maxX variable, which holds the max value of the X coordinate. Is there a way to get information about the line that the maxX value comes from? I mean, which line has the maxX, or what is the atom_name of that line? – Ronnie O'Sullivan May 20 '20 at 14:38
  • Just search this site: https://stackoverflow.com/q/3188693/2052655 – Rufus L May 20 '20 at 15:02
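
For the follow-up question about which atom the maximum comes from, one possible approach is to sort by the X coordinate and take the first item, which gives the whole `Atom` and not only the value. This is a rough sketch and not taken from the linked question; `AtomQueries` and `WithMaxX` are hypothetical names, and the `Atom` class here is trimmed to the two fields the example needs.

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed copy of the question's Atom class (only the fields used here).
public class Atom
{
    public string atom_name;
    public float x_coordinate;
}

public static class AtomQueries
{
    // Returns the atom with the largest x_coordinate,
    // or null when the sequence has no (non-null) atoms.
    public static Atom WithMaxX(IEnumerable<Atom> atoms) =>
        atoms.Where(a => a != null)
             .OrderByDescending(a => a.x_coordinate)
             .FirstOrDefault();
}
```

With the list from the answer, `Atom top = AtomQueries.WithMaxX(Atoms);` gives access to both `top.x_coordinate` and `top.atom_name`, which could then be written to a label (for instance `label1.Text = top.atom_name + ": " + top.x_coordinate;`, where `label1` is an assumed control name). Sorting is O(n log n); for very large files a single pass with `Aggregate` would avoid the sort.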