Atomium Culture

The Permanent Platform of Atomium Culture brings together some of the most authoritative universities, newspapers and businesses in Europe to increase the movement of knowledge: across borders, across sectors and to the public at large.

Protein Classification by Artificial Intelligence Techniques

January 7, 2013


By Alessandra Lumini, University of Bologna

Advances in genome sequencing technology have made a large amount of biological data available in public databases. These data are very useful in bioinformatics, in particular for research on protein classification.

Proteins are composed of amino acids linked in linear chains by peptide bonds. They perform several functions within an organism, and the different types of proteins are grouped into families according to their biological function. Because proteins are responsible for key functions in an organism, their accurate analysis and classification is of fundamental importance. Determining the family or subfamily of a protein gives researchers access to detailed information about the specific target the protein acts on; it also reveals the protein's catalytic process and biological function.

Many proteins have similar primary structures because they share a common evolutionary origin. Protein classification is nevertheless a difficult problem, because unrelated families can also have similar structures.

Researchers can now perform this identification task automatically using several approaches. For example, a computer program for protein classification can compare an unidentified amino acid sequence with the known sequences of proteins and return the classification of the target protein. Although the results are quite encouraging, there is room for further investigation. Of particular interest is the exploration of new methods for extracting features from a protein that enhance classification for a given problem. Extracting features from a protein means transforming the input data into a reduced representation made of features (descriptors) that capture the most relevant information in the data.
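The sequence-comparison idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity score comes from the standard library's generic `difflib.SequenceMatcher`, which is a stand-in for a proper biological alignment score, and the sequences and family names are hypothetical toy data, not real database entries.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Generic string similarity in [0, 1]; a stand-in for an alignment score."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def classify(query, labelled_sequences):
    """Return the family of the most similar known sequence and its score."""
    best_family, best_score = None, -1.0
    for sequence, family in labelled_sequences:
        score = similarity(query, sequence)
        if score > best_score:
            best_family, best_score = family, score
    return best_family, best_score


# Hypothetical toy data: (amino acid sequence, family label).
KNOWN = [
    ("MKTAYIAKQRQISFVKSHFSRQ", "family_A"),
    ("GSSGSSGLVPRGSHMASMTGGQ", "family_B"),
]

# The query differs from the first known sequence by a single residue.
family, score = classify("MKTAYIAKQRQLSFVKSHFSRQ", KNOWN)
print(family, round(score, 3))
```

A nearest-neighbour rule like this one is the simplest possible classifier; it is shown here only to make the comparison step concrete.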

The purpose of our research within the Department of Electronics, Computer Science and Systems at the University of Bologna is the study of artificial intelligence techniques to classify proteins. We start with the 3D tertiary structure of a given protein and combine this information with the primary sequence of the protein (that is, the sequence of the different amino acids).

Finding effective feature extraction methods is still one of the most important open issues in protein classification. There are two general views on how extraction should be accomplished. The first, a popular approach known as the indirect representation of protein spatial structure, extracts features from the amino acid sequence to perform classification. The second is more direct: a program analyses the protein's spatial structure to extract the features. Direct representations can be grouped into three general types: one based on the spatial distribution of atoms, a second on the topological structure of the protein, and a third on the geometrical shape of the protein.
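As a minimal example of extracting features from a sequence, the sketch below computes the amino acid composition: the fraction of each of the 20 standard amino acids in the chain. This is one of the simplest sequence-derived descriptors and is chosen here purely for illustration; the article does not say which sequence descriptors the authors use, and the function name is hypothetical.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element feature vector of amino acid frequencies."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = amino_acid_composition("MKTAYIAKQRQISFVKSHFSRQ")
print(len(features))
```

Whatever the length of the protein, the descriptor has a fixed size, which is what lets a standard classifier compare proteins of different lengths.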

Generally, the indirect representation is lower in computational cost but yields more complex information (i.e., longer feature vectors), whereas the direct representation is higher in computational cost. The lower cost of the indirect approach is desirable; its higher-dimensional representation carries more information, but exploiting it requires the most advanced techniques in artificial intelligence.

A valuable result of our research is a method for representing a protein that is based on the analysis of its basic structure, the protein backbone. Protein molecules differ in the number, type and physicochemical properties of their amino acid residues, as well as in the distribution of those residues along the polypeptide chain. These distinctions produce the diversity of protein spatial structures. Instead of considering all atoms, many researchers use the protein backbone to characterize the whole protein structure. The backbone consists of repeated units of nitrogen (N), alpha-carbon (Cα) and carbonyl carbon (C) atoms, which are a small subset of all the atoms of the protein but large enough to reflect its topology and folding. An effective representation of the backbone is the distance matrix (that is, the matrix that records the Euclidean distances between atoms on the backbone); this matrix contains sufficient information about the protein's structure because the original 3D backbone structure can be reconstructed from it using distance geometry methods.
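The backbone distance matrix is simple to compute once the 3D coordinates of the backbone atoms are known. The sketch below assumes the coordinates are already available as (x, y, z) tuples; the three points are made-up values chosen so the distances are easy to check by hand, not real atomic positions.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between backbone atoms.

    coords is a list of (x, y, z) tuples; the result is a symmetric
    n x n list of lists with zeros on the diagonal.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical coordinates, not taken from a real structure.
backbone = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
D = distance_matrix(backbone)
for row in D:
    print(row)
```

Because the matrix depends only on distances between atoms, it does not change when the whole protein is rotated or translated, which makes it a convenient starting point for a descriptor.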

Our study shows that representing the distance matrix as an image yields a good descriptor that can be used alone to develop a reliable protein classification system; it can also be combined with other information extracted from the primary sequence to improve the performance of the system. Our preliminary experiments show that the proposed combined system outperforms other approaches that are based only on primary sequence information. We hope that this idea will be employed to improve the performance of current protein classifiers and hence to speed the discovery of useful drugs and vaccines.
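To make the "distance matrix as an image" idea concrete, here is a rough sketch. The article does not specify which image descriptor is extracted or how the two sources of information are combined, so both steps here are assumptions: the matrix is rescaled to grey levels in [0, 1] and sampled onto a fixed-size grid, and the result is simply concatenated with a sequence-derived feature vector. All function names and values are hypothetical.

```python
def matrix_to_image(matrix, size=4):
    """Sample a square distance matrix onto a size x size grey-level grid.

    Values are divided by the largest distance so pixels lie in [0, 1].
    Nearest-neighbour sampling is an assumed, deliberately simple choice.
    """
    n = len(matrix)
    peak = max(max(row) for row in matrix) or 1.0
    image = []
    for r in range(size):
        i = r * n // size  # source row for this pixel row
        image.append([matrix[i][c * n // size] / peak for c in range(size)])
    return image


def combined_descriptor(matrix, sequence_features, size=4):
    """Flatten the image and append the sequence-derived features."""
    image = matrix_to_image(matrix, size)
    pixels = [value for row in image for value in row]
    return pixels + list(sequence_features)


# Toy 3 x 3 distance matrix and a made-up sequence feature vector.
toy_matrix = [[0.0, 5.0, 13.0], [5.0, 0.0, 12.0], [13.0, 12.0, 0.0]]
descriptor = combined_descriptor(toy_matrix, [0.25, 0.75], size=2)
print(len(descriptor))
```

The fixed grid size gives every protein a descriptor of the same length regardless of how many backbone atoms it has, so the output can be fed to any standard classifier.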

Alessandra Lumini
University of Bologna

About us

Leading young European researchers have been selected by European research universities and the Scientific and Editorial Committees of AC to write an article about their work and its potential impact.

El País

EDICIONES EL PAIS, S.L. - Miguel Yuste 40 – 28037 – Madrid [Spain]