deepFEPS: Deep Learning-Oriented Feature Extraction for Biological Sequences

deepFEPS is a high-performance bioinformatics platform for extracting advanced sequence-based features from DNA, RNA, and protein data. It integrates modern machine learning and deep learning techniques to transform raw biological sequences into rich numerical representations suitable for classification, clustering, and predictive modeling.

Each feature extractor below represents biological sequences in a different way, ranging from sequence embedding models such as Word2Vec, FastText, and Doc2Vec to Transformer-based architectures, autoencoder-derived features, and graph-based embeddings. These deep learning and graph representation techniques can capture sequence patterns and relationships that simple k-mer counts miss, such as the context in which k-mers occur, which supports functional annotation, motif discovery, and predictive modeling.
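Several of the extractors below (the autoencoder's k-mer BoW input, Doc2Vec, graph embeddings, and Word2Vec/FastText) operate on k-mers. As a minimal sketch, assuming overlapping k-mers with stride 1 and a DNA alphabet, the simple k-mer count representation mentioned above can be computed as follows. The function names are illustrative and are not part of deepFEPS.

```python
from collections import Counter
from itertools import product


def kmers(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_count_vector(seq: str, k: int = 3, alphabet: str = "ACGT") -> list[int]:
    """Count every possible k-mer over `alphabet`, in fixed lexicographic order."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(kmers(seq, k))
    return [counts[km] for km in vocab]


if __name__ == "__main__":
    print(kmers("ACGTAC", k=3))            # ['ACG', 'CGT', 'GTA', 'TAC']
    vec = kmer_count_vector("ACGTAC", k=2)
    print(len(vec), sum(vec))              # 16 5
```

A 3-mer vocabulary over ACGT has 4^3 = 64 entries; a protein alphabet of 20 residues gives 20^3 = 8000, which is one reason learned, denser representations become attractive as k grows.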

Simply select the method that best fits your research goals, upload your sequences, configure the parameters, and download your processed features.

Autoencoder features

Learned compressed representations from autoencoders trained on k-mer bag-of-words (BoW) vectors or fixed-length one-hot encodings.

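A minimal sketch of the BoW variant, assuming k-mer frequency vectors as input, a single tanh hidden layer as the compressed code, and plain full-batch gradient descent in NumPy. deepFEPS's actual architecture, loss, and training setup are not specified here, so `kmer_bow` and `train_autoencoder` are hypothetical helpers; the one-hot variant would feed flattened, fixed-length one-hot matrices instead of frequencies.

```python
import numpy as np
from collections import Counter
from itertools import product


def kmer_bow(seqs, k=3, alphabet="ACGT"):
    """Stack k-mer frequency vectors: one row per sequence, len(alphabet)**k columns."""
    vocab = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    X = np.zeros((len(seqs), len(vocab)))
    for r, seq in enumerate(seqs):
        seq = seq.upper()
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        for km, c in counts.items():
            if km in vocab:  # skip k-mers containing ambiguous bases such as N
                X[r, vocab[km]] = c
    # Convert counts to frequencies; the maximum() guards against empty rows
    return X / np.maximum(X.sum(axis=1, keepdims=True), 1)


def train_autoencoder(X, latent_dim=8, epochs=300, lr=0.1, seed=0):
    """Tanh encoder + linear decoder, trained on per-sample squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.1, (d, latent_dim)), np.zeros(latent_dim)
    W2, b2 = rng.normal(0, 0.1, (latent_dim, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # encoder: the compressed representation
        err = (H @ W2 + b2) - X    # decoder output minus input
        losses.append(float((err ** 2).sum() / n))
        # Backpropagation (gradients computed before any weight is updated)
        g_out = 2 * err / n
        g_h = (g_out @ W2.T) * (1 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)

    def encode(Z):
        """Map BoW rows to latent feature vectors with the trained encoder."""
        return np.tanh(Z @ W1 + b1)

    return encode, losses
```

After training, `encode(X)` returns one `latent_dim`-sized row per sequence, which serves as the feature vector. In practice a deep-learning framework such as PyTorch would replace the hand-written backpropagation.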
Doc2Vec embeddings

Sequence-as-document embeddings (PV-DM / PV-DBOW) over k-mers, giving one vector per sequence that captures its global context.

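The sketch below illustrates PV-DBOW, one of the two modes named above, under stated assumptions: each sequence is a document of overlapping 3-mers, negatives are drawn uniformly (word2vec itself uses a smoothed unigram distribution), and training is plain SGD in NumPy. It is not deepFEPS's implementation; a library such as gensim's `Doc2Vec` (`dm=0` for PV-DBOW, `dm=1` for PV-DM) would normally be used. `train_pv_dbow` and `infer_vector` are hypothetical names.

```python
import numpy as np


def kmers(seq, k=3):
    """Overlapping k-mers: each sequence becomes a 'document' of k-mer 'words'."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_pv_dbow(docs, dim=16, epochs=50, lr=0.05, negative=5, seed=0):
    """PV-DBOW with negative sampling: each document vector predicts its own k-mers."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    D = rng.uniform(-0.5, 0.5, (len(docs), dim)) / dim  # one vector per sequence
    W = np.zeros((len(vocab), dim))                     # output weights per k-mer
    for _ in range(epochs):
        for d in rng.permutation(len(docs)):
            for w in docs[d]:
                # One true k-mer plus `negative` uniformly sampled k-mers
                targets = np.concatenate(([idx[w]], rng.integers(0, len(vocab), negative)))
                labels = np.zeros(len(targets))
                labels[0] = 1.0
                g = lr * (labels - sigmoid(W[targets] @ D[d]))
                grad_d = g @ W[targets]
                np.add.at(W, targets, np.outer(g, D[d]))  # handles repeated indices
                D[d] += grad_d
    return D, vocab, W


def infer_vector(doc, W, vocab, epochs=50, lr=0.05, negative=5, seed=0):
    """Fit a vector for an unseen sequence while keeping the k-mer weights W frozen."""
    rng = np.random.default_rng(seed)
    idx = {w: i for i, w in enumerate(vocab)}
    dim = W.shape[1]
    d = rng.uniform(-0.5, 0.5, dim) / dim
    for _ in range(epochs):
        for w in doc:
            if w not in idx:  # k-mers never seen in training carry no signal
                continue
            targets = np.concatenate(([idx[w]], rng.integers(0, len(vocab), negative)))
            labels = np.zeros(len(targets))
            labels[0] = 1.0
            d += lr * (labels - sigmoid(W[targets] @ d)) @ W[targets]
    return d
```

Rows of `D` are the training sequences' feature vectors; `infer_vector` embeds a new sequence by fitting only its document vector, the usual way Doc2Vec models handle unseen documents. PV-DM would instead combine the document vector with neighbouring k-mer vectors to predict the centre k-mer.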
Graph embeddings

k-mer graph embeddings (DeepWalk/Node2Vec/Graph2Vec) pooled to fixed-size features.

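A simplified sketch of a DeepWalk-style pipeline, with several assumptions: one shared, undirected k-mer graph in which consecutive overlapping k-mers are linked; uniform random walks, as in DeepWalk; and, in place of DeepWalk's skip-gram training, a truncated SVD of log-scaled walk co-occurrence counts. Node2Vec would bias the walks with its return and in-out parameters (p, q), and Graph2Vec embeds whole graphs rather than nodes; neither is shown. All function names are hypothetical.

```python
import numpy as np


def kmers(seq, k=3):
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_graph(seqs, k=3):
    """Undirected graph: nodes are k-mers, edges join k-mers that follow each other."""
    adj = {}
    for seq in seqs:
        toks = kmers(seq, k)
        for t in toks:
            adj.setdefault(t, set())
        for a, b in zip(toks, toks[1:]):
            if a != b:  # ignore self-loops such as AAA -> AAA
                adj[a].add(b)
                adj[b].add(a)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def random_walks(adj, walk_length=10, walks_per_node=5, seed=0):
    """Uniform (DeepWalk-style) random walks starting from every node."""
    rng = np.random.default_rng(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                nbrs = adj[walk[-1]]
                walk.append(nbrs[rng.integers(len(nbrs))])
            walks.append(walk)
    return walks


def embed_nodes(walks, nodes, dim=8, window=2):
    """Node vectors from a truncated SVD of log co-occurrence counts within the walks."""
    idx = {n: i for i, n in enumerate(nodes)}
    C = np.zeros((len(nodes), len(nodes)))
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    C[idx[u], idx[walk[j]]] += 1
    U, S, _ = np.linalg.svd(np.log1p(C))
    dim = min(dim, len(nodes))
    return {n: U[idx[n], :dim] * np.sqrt(S[:dim]) for n in nodes}


def graph_features(seqs, k=3, dim=8, seed=0):
    """Mean-pool each sequence's k-mer node vectors into one fixed-size row."""
    adj = build_kmer_graph(seqs, k)
    emb = embed_nodes(random_walks(adj, seed=seed), sorted(adj), dim=dim)
    return np.vstack([np.mean([emb[t] for t in kmers(s, k)], axis=0) for s in seqs])
```

Mean pooling makes the output width independent of sequence length, which is what turns node embeddings into fixed-size features. The sketch assumes every sequence is at least k long.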
Transformer embeddings

ProtBERT / ESM2 / DNABERT embeddings with configurable pooling and compute device (e.g., CPU or GPU).

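Pooling is the step that turns per-token transformer outputs into one vector per sequence. A minimal NumPy sketch of three common strategies (mean, max, first token), assuming the model's hidden states and attention mask are already available as arrays. The commented lines show one way to obtain them with Hugging Face `transformers` for an ESM2 checkpoint; that model choice is illustrative and not necessarily how deepFEPS loads models. `pool_hidden_states` is a hypothetical helper.

```python
import numpy as np


def pool_hidden_states(hidden, mask, strategy="mean"):
    """Reduce per-token embeddings of shape (batch, tokens, dim) to (batch, dim).

    hidden: token embeddings, e.g. a model's last_hidden_state as a NumPy array
    mask:   (batch, tokens) attention mask, 1 for real tokens and 0 for padding
    """
    hidden = np.asarray(hidden, dtype=float)
    mask = np.asarray(mask, dtype=float)[..., None]  # (batch, tokens, 1)
    if strategy == "cls":
        return hidden[:, 0, :]  # first (special) token
    if strategy == "mean":
        return (hidden * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1.0)
    if strategy == "max":
        return np.where(mask > 0, hidden, -np.inf).max(axis=1)
    raise ValueError(f"unknown pooling strategy: {strategy!r}")


# Possible source of `hidden` with Hugging Face transformers (not executed here):
#   import torch
#   from transformers import AutoModel, AutoTokenizer
#   device = "cuda" if torch.cuda.is_available() else "cpu"
#   name = "facebook/esm2_t6_8M_UR50D"
#   tok = AutoTokenizer.from_pretrained(name)
#   model = AutoModel.from_pretrained(name).to(device).eval()
#   batch = tok(["MKTAYIAK", "MSV"], return_tensors="pt", padding=True).to(device)
#   with torch.no_grad():
#       out = model(**batch)
#   feats = pool_hidden_states(out.last_hidden_state.cpu().numpy(),
#                              batch["attention_mask"].cpu().numpy(), "mean")
```

ESM2 and ProtBERT tokenizers add special start and end tokens, which mean and max pooling include when the attention mask is used as-is; whether to exclude them is a design choice worth making explicit. Masking matters because padded positions would otherwise leak into the pooled vector.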
Word2Vec / FastText

Word2Vec or FastText on k-mers with flexible pooling strategies.

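A minimal NumPy sketch of the Word2Vec side, assuming skip-gram with negative sampling over overlapping 3-mers, uniform negative sampling, and three illustrative pooling options (`mean`, `max`, `mean_max`). Libraries such as gensim provide production `Word2Vec` and `FastText` classes (`sg=1` selects skip-gram). FastText additionally builds each k-mer's vector from character n-grams, so it can embed k-mers unseen in training, which this sketch cannot. `train_skipgram` and `sequence_vector` are hypothetical names.

```python
import numpy as np


def kmers(seq, k=3):
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train_skipgram(corpus, dim=16, window=2, negative=5, epochs=20, lr=0.05, seed=0):
    """Skip-gram with negative sampling: each k-mer predicts its neighbouring k-mers."""
    rng = np.random.default_rng(seed)
    vocab = sorted({t for sent in corpus for t in sent})
    idx = {t: i for i, t in enumerate(vocab)}
    W_in = rng.uniform(-0.5, 0.5, (len(vocab), dim)) / dim  # k-mer vectors (kept)
    W_out = np.zeros((len(vocab), dim))                     # context vectors (discarded)
    for _ in range(epochs):
        for sent in corpus:
            ids = [idx[t] for t in sent]
            for i, center in enumerate(ids):
                for j in range(max(0, i - window), min(len(ids), i + window + 1)):
                    if j == i:
                        continue
                    # One true context k-mer plus `negative` uniformly sampled k-mers
                    targets = np.concatenate(([ids[j]], rng.integers(0, len(vocab), negative)))
                    labels = np.zeros(len(targets))
                    labels[0] = 1.0
                    scores = 1.0 / (1.0 + np.exp(-(W_out[targets] @ W_in[center])))
                    g = lr * (labels - scores)
                    grad_in = g @ W_out[targets]
                    np.add.at(W_out, targets, np.outer(g, W_in[center]))
                    W_in[center] += grad_in
    return {t: W_in[idx[t]] for t in vocab}


def sequence_vector(seq, kmer_vecs, k=3, strategy="mean"):
    """Pool the vectors of a sequence's k-mers; k-mers without a vector are skipped."""
    V = np.vstack([kmer_vecs[t] for t in kmers(seq, k) if t in kmer_vecs])
    if strategy == "mean":
        return V.mean(axis=0)
    if strategy == "max":
        return V.max(axis=0)
    if strategy == "mean_max":
        return np.concatenate([V.mean(axis=0), V.max(axis=0)])
    raise ValueError(f"unknown pooling strategy: {strategy!r}")
```

Because unknown k-mers are skipped during pooling, a sequence whose k-mers were all unseen in training would need special handling (for example, a zero vector or a FastText model).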