2

Using BioJava, I need to calculate the number and percentage of polar/non-polar and aliphatic/aromatic/heterocyclic amino acids in a protein sequence that I got from UniProt.

I found out from the BioJava tutorial how to read FASTA files and implemented the code below, but I have no idea how to approach the counting itself.

If you have any ideas, please help me. Maybe there are some sources where I can look this up.

This is the code:

package biojava.biojava_project;

import java.net.URL;

import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.io.FastaReaderHelper;

public class BioSeq {
    // Download the sequence for the given accession from UniProt
    public static ProteinSequence getSequenceForId(String uniProtId) throws Exception {
        URL uniprotFasta = new URL(String.format("https://rest.uniprot.org/uniprotkb/%s.fasta", uniProtId));
        ProteinSequence seq = FastaReaderHelper.readFastaProteinSequence(uniprotFasta.openStream()).get(uniProtId);
        System.out.printf("id : %s %s%s%s", uniProtId, seq, System.getProperty("line.separator"), seq.getOriginalHeader());
        System.out.println();
        return seq;
    }
    public static void main(String[] args) {
        try {
            System.out.println(getSequenceForId("P31574"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Stanislav Bashkyrtsev

1 Answer

0

I don't know whether BioJava stores these properties anywhere, but it's easy to list the amino acids that have each property manually, then iterate over the sequence and count the residues that match. Here's an example for polarity:

import java.io.InputStream;
import java.net.URL;
import java.util.Set;

import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompound;
import org.biojava.nbio.core.sequence.io.FastaReaderHelper;

public class BioSeq {

    public static void main(String[] args) throws Exception {
        ProteinSequence seq = loadFromUniprot("P31574");

        int polarCount = numberOfOccurrences(seq, /*Polar AAs:*/ Set.of("Y", "S", "T", "N", "Q", "C"));
        // Multiply by 100 so the printed value is a percentage rather than a fraction
        System.out.println("% of polar AAs: " + 100.0 * polarCount / seq.getLength());
    }

    public static ProteinSequence loadFromUniprot(String uniProtId) throws Exception {
        URL uniprotFasta = new URL(String.format("https://rest.uniprot.org/uniprotkb/%s.fasta", uniProtId));
        try (InputStream is = uniprotFasta.openStream()) {
            return FastaReaderHelper.readFastaProteinSequence(is).get(uniProtId);
        }
    }

    private static int numberOfOccurrences(ProteinSequence seq, Set<String> bases) {
        int count = 0;
        for (AminoAcidCompound aminoAcid : seq)
            if(bases.contains(aminoAcid.getBase()))
                count++;
        return count;
    }
}

PS: don't forget to close IO streams after you have used them. In the example above I used the try-with-resources syntax, which closes the InputStream automatically.
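
The same idea extends to the other categories: keep one set of one-letter codes per property and count each property independently. Below is a rough sketch in plain Java, with no BioJava calls, so it works on the sequence as a String (with BioJava you could pass `seq.getSequenceAsString()`). The class name `AminoAcidClasses`, its methods, and the test sequence are made up for illustration. The aliphatic, aromatic and heterocyclic sets are an assumption based on one common textbook grouping; references differ (for example on whether histidine counts as aromatic), so adjust the sets to whatever your course or source uses. The categories overlap (tryptophan is both aromatic and heterocyclic), so the percentages need not add up to 100.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class AminoAcidClasses {

    // ASSUMPTION: one possible grouping; edit these sets to match your reference.
    static final Map<String, Set<Character>> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("polar", setOf("YSTNQC"));     // same set as in the polarity example
        CLASSES.put("aliphatic", setOf("GAVLI"));  // assumed grouping
        CLASSES.put("aromatic", setOf("FWY"));     // assumed grouping
        CLASSES.put("heterocyclic", setOf("HWP")); // assumed grouping
    }

    // Turns a string of one-letter codes into a set (works on Java 8, no Set.of needed).
    private static Set<Character> setOf(String letters) {
        Set<Character> set = new HashSet<>();
        for (char c : letters.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    // Counts, for every class, how many residues of the sequence belong to it.
    public static Map<String, Integer> countByClass(String sequence) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : CLASSES.keySet()) {
            counts.put(name, 0);
        }
        for (char residue : sequence.toCharArray()) {
            for (Map.Entry<String, Set<Character>> entry : CLASSES.entrySet()) {
                if (entry.getValue().contains(residue)) {
                    counts.merge(entry.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Converts a count into a percentage of the sequence length.
    public static double percent(int count, int length) {
        return length == 0 ? 0.0 : 100.0 * count / length;
    }

    public static void main(String[] args) {
        String sequence = "MAVFWHPSTY"; // made-up test sequence, not a real protein
        Map<String, Integer> counts = countByClass(sequence);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.printf("%s: %d (%.1f%%)%n", entry.getKey(), entry.getValue(),
                    percent(entry.getValue(), sequence.length()));
        }
    }
}
```

To count non-polar residues as well, you could add another entry to `CLASSES` with the residues your reference lists as non-polar, rather than subtracting the polar count from the length, because the polar set used here leaves out the charged residues.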

Stanislav Bashkyrtsev
  • Hello! Thank you for your answer! When I run the code from your example, I get this error: The method of(String, String, String, String, String, String) is undefined for the type Set. There seems to be a problem with the method numberOfOccurrences and with Set.of(""). Do you have any recommendations? – arteagavskiy May 09 '22 at 17:16
  • This is the issue with the method: The method numberOfOccurrences(ProteinSequence, Set) from the type BioSeqq is never used locally – arteagavskiy May 09 '22 at 17:29
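  • Your first problem means you are using an older version of Java (`Set.of()` was added in Java 9). You can replace `Set.of()` with `new HashSet<>(Arrays.asList("Y", "S", "T", "N", "Q", "C"))`, which also needs imports for `java.util.HashSet` and `java.util.Arrays` – Stanislav Bashkyrtsev May 09 '22 at 17:31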
  • Yes, now it prints the % of Polar AAs. Thank you! – arteagavskiy May 09 '22 at 17:36