Search & Compare

Tools for exact pattern search and fast sequence comparison.

How it Works

BWT Search (Burrows-Wheeler Transform)

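The Burrows-Wheeler Transform (BWT) is a reversible rearrangement of a sequence that is widely used for compressing and indexing genomic data. An index built on the BWT lets us count and locate short patterns (substrings) in a very large reference sequence without scanning it from start to end; in a typical BWT-based index, counting matches takes time proportional to the pattern length, not the reference length.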

  • Input: A reference sequence (e.g. a genome or gene).
  • Pattern: The subsequence you want to find.
  • Output: Number of occurrences and their 0-indexed positions.
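
As an illustration of the input, pattern, and output described above, here is a minimal Python sketch of BWT backward search. It is not this tool's implementation: the function names `build_index` and `bwt_search`, the "$" sentinel, and the naive suffix-array construction are all assumptions chosen for readability, and a real index would use far more compact data structures.

```python
from collections import Counter


def build_index(reference):
    """Build a naive suffix array, first-column offsets, and occurrence table.

    Illustrative only: sorting whole suffixes is slow and memory-hungry,
    so this is suitable for short demo sequences, not real genomes.
    """
    text = reference + "$"  # assumed sentinel; sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (text[-1] is "$" when i == 0)
    bwt = "".join(text[i - 1] for i in suffix_array)

    # first_row_start[c] = number of characters in text smaller than c
    counts = Counter(text)
    first_row_start, total = {}, 0
    for c in sorted(counts):
        first_row_start[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return suffix_array, first_row_start, occ


def bwt_search(index, pattern):
    """Return (count, sorted 0-indexed positions) of pattern in the reference."""
    suffix_array, first_row_start, occ = index
    if not pattern or "$" in pattern:
        return 0, []
    lo, hi = 0, len(suffix_array)  # half-open range of matching rows
    for c in reversed(pattern):    # extend the match one character leftwards
        if c not in first_row_start:
            return 0, []
        lo = first_row_start[c] + occ[c][lo]
        hi = first_row_start[c] + occ[c][hi]
        if lo >= hi:
            return 0, []
    positions = sorted(suffix_array[lo:hi])
    return len(positions), positions


index = build_index("GATTACAGATTACA")
print(bwt_search(index, "ATTA"))  # (2, [1, 8])
```

The loop runs once per pattern character, which is why counting does not depend on the reference length; reading the positions out of the suffix array adds work proportional to the number of matches.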

Mash Comparison (MinHash)

Mash uses MinHash sketching to rapidly estimate the Jaccard similarity between two sequences, and a distance derived from it. Instead of aligning every base, it hashes each sequence's k-mers and compares only a small, fixed-size sketch made of the smallest hash values.

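  • Similarity: (0.0 to 1.0) Estimated fraction of k-mers shared between the two sequences (the Jaccard index); 1.0 means the sketches are identical.
  • Distance: (0.0 to 1.0) Falls as similarity rises, with 0.0 for identical sketches; useful for clustering.
  • Efficiency: Once sketches are built, comparison cost depends on the sketch size rather than the sequence length, so large sequences (like bacterial genomes) compare very quickly.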
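
The sketch-and-compare idea above can be illustrated with a short Python example. This is a simplified sketch under stated assumptions, not the tool's or Mash's actual code: the names `kmer_hashes`, `sketch`, and `mash_compare` are hypothetical, BLAKE2b stands in for whatever hash function is really used, reverse-complement (canonical) k-mers are ignored, and the distance is taken as `1 - similarity` purely for illustration. The original Mash tool reports a log-transformed distance instead.

```python
import hashlib


def kmer_hashes(sequence, k):
    """Hash every k-mer of the sequence to a 64-bit integer (assumed hash: BLAKE2b)."""
    seq = sequence.upper()
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def sketch(sequence, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    return sorted(kmer_hashes(sequence, k))[:size]


def mash_compare(seq_a, seq_b, k=21, size=1000):
    """Return (similarity, distance) estimated from two bottom-s sketches."""
    a, b = set(sketch(seq_a, k, size)), set(sketch(seq_b, k, size))
    # The smallest hashes of the merged sketches act as a sample of the union
    union_bottom = sorted(a | b)[:size]
    if not union_bottom:  # both sequences shorter than k
        return 0.0, 1.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    similarity = shared / len(union_bottom)
    distance = 1.0 - similarity  # assumption: simple complement, not Mash's formula
    return similarity, distance


seq = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCGATAGCTAGCTAGGCTA"
print(mash_compare(seq, seq, k=5, size=16))  # (1.0, 0.0)
```

Larger sketches give more stable estimates at the cost of more memory and comparison time; the `k` and `size` defaults here are illustrative choices, not settings documented for this tool.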