Feature Extraction from Protein Sequence (FEPS)

FEPS is a web-based tool for extracting the most widely used sequence-derived features from protein sequences. First released in 2016, it organizes these features into 7 major groups comprising 48 feature extraction methods. Altogether, 2765 unique descriptors can be computed through FEPS (Ismail et al., 2022).

The extracted features can be used with both traditional machine learning algorithms, such as Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN), and modern deep learning approaches. These techniques support a wide range of bioinformatics classification tasks, including:

  • Protein function prediction
  • Protein classification
  • Protein structure prediction
  • Subcellular localization prediction
  • And many more

👉 Click here to download and read the full FEPS article.

👉 Click here to access the FEPS repository on GitHub.

Input: Protein fasta-formatted sequence file(s)

The input to the web server is a FASTA-formatted protein sequence file. In a typical classification scenario, you may have protein sequences for several groups (download the tutorial). The sequences belonging to the same group are saved together in a single multi-sequence FASTA file. The input sequences must meet the following guidelines:

  • The sequences must be valid protein sequences
  • The sequences must be in fasta format
  • The sequences of the same group are saved in one file
  • The file name can represent the group name
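
The guidelines above can be checked locally before uploading. The following is a minimal sketch, not part of FEPS: the function names `parse_fasta` and `invalid_residues` are hypothetical, and it assumes that only the 20 standard one-letter amino acid codes count as valid (the server may treat ambiguity codes such as X or B differently).

```python
from typing import Dict, Iterable, Set

# Assumption: only the 20 standard amino acids are treated as valid here.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {header: sequence}; raise ValueError if malformed."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header or header in records:
                raise ValueError(f"empty or duplicate header: {line!r}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may span several lines; join them in upper case.
            records[header] += line.upper()
    return records


def invalid_residues(sequence: str) -> Set[str]:
    """Return the characters that are not standard amino acid codes."""
    return set(sequence) - STANDARD_AA


example = """>seq1
MKTAYIAKQR
QISFVKSHFS
>seq2
MKXB
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    bad = invalid_residues(seq)
    print(name, "OK" if not bad else f"invalid residues: {sorted(bad)}")
```

In a classification setting, one such file per group would be checked, with the file name serving as the group name.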

Protein sequence file list

Feature types
The features are divided into 7 types, each containing several feature extraction methods. Select a feature type, then select the corresponding options from the drop-down menu.

  • Amino acid composition
  • Composition, transition and distribution
  • Autocorrelation descriptors
  • Pseudo amino acid composition
  • Quasi-sequence-order and sequence-order-coupling
  • Shannon entropy descriptors
  • Other descriptors
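
To give a sense of what the simplest of these feature types computes, here is a minimal sketch of amino acid composition using its standard definition: the fraction of each of the 20 standard residues in a sequence. The function name `amino_acid_composition` is hypothetical, and this is an illustration rather than the FEPS implementation.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # One descriptor per standard residue, so the vector always has 20 entries.
    return {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


aac = amino_acid_composition("MKKA")
print(aac["K"])  # 0.5, since K makes up two of the four residues
```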

Feature type options

Some feature types have options (see the supporting document). You can keep the default values or set your own. Whenever 'ID Number' is an option, you can either select one of the 544 amino acid physicochemical properties from the drop-down menu or enter its ID number directly.

The option fields are:

  • Lambda
  • Weight
  • Maximum lag
  • Distance matrix (select one)
  • Amino acid physicochemical property (544 available): select a property or enter its amino acid index ID number
  • Amino acid property (AAP): enter your own or select from the list
  • K-space
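
As an illustration of how a maximum lag and a chosen physicochemical property enter an autocorrelation descriptor, the sketch below computes a Moreau-Broto style autocorrelation averaged over residue pairs. It rests on several assumptions: the function name `moreau_broto` is hypothetical, the Kyte-Doolittle hydropathy scale stands in for whichever property ID is selected, the property values are not standardized, and the exact formula and normalization used by FEPS may differ (see the supporting document).

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used here only as an example property.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moreau_broto(sequence: str, prop: Dict[str, float], max_lag: int) -> List[float]:
    """Return one autocorrelation value per lag d = 1..max_lag.

    For each lag d, the products prop[i] * prop[i + d] are averaged over
    all residue pairs that are d positions apart.
    """
    values = [prop[aa] for aa in sequence.upper()]
    n = len(values)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must be between 1 and len(sequence) - 1")
    return [
        sum(values[i] * values[i + d] for i in range(n - d)) / (n - d)
        for d in range(1, max_lag + 1)
    ]


descriptors = moreau_broto("MKTAYIAKQR", KYTE_DOOLITTLE, max_lag=3)
print(len(descriptors))  # one value per lag
```

Under this definition, the maximum lag sets the number of descriptors produced per property, and it must be smaller than the length of the shortest input sequence.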

Output file format

You can choose one or more file formats. The following are the feature file formats most commonly accepted by machine learning packages (e.g. Weka, SVM-light). Whenever the input file contains the sequences of a protein group, the last column of the output file holds the class labels.

  • Comma-separated values (CSV) file
  • SVM-light file
  • Weka format file
  • Tab-delimited text file
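
To show what two of these formats look like, the sketch below writes a small feature matrix as CSV, with the class label in the last column, and in the standard SVM-light layout, where the label comes first and features are written as 1-based `index:value` pairs. The helper names `to_csv` and `to_svmlight`, the column names, and the example values are hypothetical, and the exact layout of the files FEPS produces may differ.

```python
import csv
import io
from typing import List, Sequence


def to_csv(rows: Sequence[Sequence[float]], labels: Sequence[str],
           names: Sequence[str]) -> str:
    """Write features as CSV with the class label as the last column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(list(names) + ["class"])
    for features, label in zip(rows, labels):
        writer.writerow(list(features) + [label])
    return buf.getvalue()


def to_svmlight(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> str:
    """Write features in SVM-light layout: '<label> 1:<v1> 2:<v2> ...'."""
    lines: List[str] = []
    for features, label in zip(rows, labels):
        pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
        lines.append(f"{label} {pairs}")
    return "\n".join(lines) + "\n"


rows = [[0.25, 0.5], [0.1, 0.0]]
csv_text = to_csv(rows, ["kinase", "other"], ["AAC_A", "AAC_K"])
svm_text = to_svmlight(rows, [1, -1])
print(csv_text)
print(svm_text)
```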
Please cite: