Active Research

Active Research Initiatives

Core research directions in flight and the current progress of each.

10M+ Data Points Generated
30+ Research Partners
200+ Published Papers

AlphaCell Integration

Generating interactome data at an exponential rate to power next-generation AI models that predict cell behavior.

Progress: 35%

Synthetic Biology Platforms

Novel SynBio technologies for rapid prototyping and testing of engineered biological systems.

Progress: 90%

Planet Engineering Research

Developing algae-based solutions for atmospheric processing and terraforming applications.

Progress: 15%

Core Technology

A Trinity Technology Platform

Merging artificial intelligence, digital logic, and life science to build the foundation for next-generation synthetic biology.

AI4Cell Molecular Interaction

Builds an AI-driven molecular interaction library that accumulates cell behavior data at an exponential rate. Large models predict protein folding and signaling pathways, accelerating design iteration for biological components.

Molecular Interaction Library · Exponential Data Growth

Digital Logic Biotech

Inspired by integrated circuit design, this approach abstracts cell signaling pathways as logic gate circuits and engineers cell factories modularly for predictable, reproducible biosynthesis workflows.

Logic Gate Circuits · Engineered Cell Factories
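To make the logic-gate abstraction concrete, here is a minimal sketch of a two-input AND gate as genetic circuits are often approximated: each input activates a promoter through a Hill function, and output expression is the product of the two activations. The Hill parameters (`k`, `n`), the input levels, and the 0.5 digitization threshold are illustrative assumptions, not SATORI's circuit models.

```python
def hill_activation(signal: float, k: float = 1.0, n: float = 2.0) -> float:
    """Fraction of maximal promoter activity for an activating input (Hill function)."""
    return signal**n / (k**n + signal**n)


def and_gate(signal_a: float, signal_b: float) -> float:
    """Output expression is high only when both inputs activate their promoters."""
    return hill_activation(signal_a) * hill_activation(signal_b)


def truth_table(low: float = 0.1, high: float = 10.0, threshold: float = 0.5) -> dict:
    """Digitize the analog circuit response into a Boolean truth table (assumed levels)."""
    levels = {0: low, 1: high}
    return {
        (a, b): int(and_gate(levels[a], levels[b]) > threshold)
        for a in (0, 1)
        for b in (0, 1)
    }


print(truth_table())  # {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```

Multiplying the two activations is the simplest way to encode "both inputs required"; OR or NOT gates follow the same pattern with different combining and repression functions.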

AlphaCell Development

A proprietary cancer cell phenotype analysis model that raises data analysis accuracy from 51% to 90%, providing high-confidence single-cell characterization for precision medicine and drug development.

Accuracy 51% → 90% · Single-Cell Analysis

Data Moat

Large-Scale Ground-Truth Data Foundation

We generate proprietary, inherently labeled ground-truth experimental data through NxN full-matrix wet-lab assays, a research data foundation that no in-silico model can simulate, and use it to train AlphaCell and improve AlphaFold protein structure prediction.

Data Generation

10M+ assays

Wet-lab NxN full-matrix assays produce ground-truth interactome data at unprecedented throughput — the data layer no in-silico model can simulate.

12.5× Cheaper
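An NxN full-matrix assay tests every component against every other, so N components produce N² readouts arranged as a matrix. The sketch below assembles such a matrix from a per-pair readout; `full_matrix_assay` and `toy_readout` are hypothetical names, and the readout is a placeholder for a real wet-lab measurement.

```python
import itertools

import numpy as np


def full_matrix_assay(components, measure):
    """Run every ordered pair (i, j) of N components and return the N x N readout matrix."""
    n = len(components)
    matrix = np.zeros((n, n))
    for i, j in itertools.product(range(n), repeat=2):
        # One well per (bait, prey) pair: N components -> N**2 wells
        matrix[i, j] = measure(components[i], components[j])
    return matrix


def toy_readout(bait: str, prey: str) -> float:
    """Hypothetical readout: 1.0 when names share a family prefix, else 0.0."""
    return 1.0 if bait[0] == prey[0] else 0.0


proteins = ["P1", "P2", "Q1"]  # placeholder component names
m = full_matrix_assay(proteins, toy_readout)
print(m.shape)  # (3, 3): 9 measurements from 3 components
```

The quadratic growth is what makes the format data-rich: at N ≈ 3,200 components the matrix already holds over 10 million readouts (3,200² = 10,240,000).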

Data Training

Self-Labeled

Ground-truth data is inherently labeled, so no annotation is needed. Starting from a small AlphaCell model keeps compute costs minimal.

Zero Annotation Cost
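Because each matrix cell is a measured outcome, turning an assay matrix into supervised training examples needs no annotation step: the readout is the label. A minimal sketch, assuming pairs are represented only by their index coordinates (a real model would use sequence or structure features instead); `matrix_to_examples` is a hypothetical helper.

```python
import numpy as np


def matrix_to_examples(readouts: np.ndarray):
    """Turn an N x N readout matrix into (features, labels) with no manual annotation."""
    n = readouts.shape[0]
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    features = np.stack([rows.ravel(), cols.ravel()], axis=1)  # (N*N, 2) pair indices
    labels = readouts.ravel()  # measured values serve directly as labels
    return features, labels


readouts = np.array([[0.9, 0.1], [0.2, 0.8]])
X, y = matrix_to_examples(readouts)
print(X.tolist(), y.tolist())
# [[0, 0], [0, 1], [1, 0], [1, 1]] [0.9, 0.1, 0.2, 0.8]
```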

Data Storage

NxN Compact

The structured NxN matrix format is more compact than unstructured data, significantly reducing storage and retrieval overhead.

Efficient Matrix Format
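A toy comparison illustrates why the matrix layout is compact: names are stored once and values sit in a dense block, whereas unstructured records repeat both identifiers with every measurement. Per-measurement JSON records are an assumed stand-in for "unstructured data", and the names and values are placeholders; real savings depend on the formats and compression in use.

```python
import json

import numpy as np

n = 100
names = [f"protein_{i:03d}" for i in range(n)]  # placeholder identifiers
rng = np.random.default_rng(0)
matrix = rng.random((n, n)).astype(np.float32)

# Structured: names stored once, values in a dense N x N float32 block
dense_bytes = matrix.nbytes + len(json.dumps(names).encode())

# Unstructured: one self-describing JSON record per measurement
records = [
    {"bait": names[i], "prey": names[j], "value": float(matrix[i, j])}
    for i in range(n)
    for j in range(n)
]
record_bytes = len(json.dumps(records).encode())

print(dense_bytes, record_bytes, round(record_bytes / dense_bytes, 1))
```

A dense layout also gives constant-time lookup by `(i, j)` index, which is where the retrieval savings come from.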

Ground Truth Data Trains Both Models

Every experiment produces inherently annotated ground-truth data that feeds two AI systems simultaneously.

Ground Truth Data: NxN full-matrix assays

AlphaCell: cancer cell phenotype (proprietary)
+
AlphaFold: protein structure prediction (improved)

Better Predictions → More Experiments → More Data

AlphaCell Algorithm: 3-Step Pipeline for Cell Regulatory Networks

Step 1: Spatiotemporal transcriptomics data serve as input.
Step 2: DMD (Dynamic Mode Decomposition, with FFT and phase analysis) produces eigen-clusters, decomposing each gene's expression into contributions from 100+ transcription factors.
Step 3: A diffusion model infers the full gene regulatory network (GRN).
DMD originated in fluid dynamics; SATORI reimplemented the legacy MATLAB package in Python/PyTorch.

Fluid Dynamics FNO + FFT Signal Processing · Diffusion Model + GRN Inference
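For readers new to DMD, the sketch below is the standard "exact DMD" algorithm from the fluid-dynamics literature, not SATORI's PyTorch reimplementation: it fits a linear operator mapping each snapshot to the next, and the operator's eigenvalues give each mode's growth rate and oscillation frequency. The FFT/phase-analysis and eigen-clustering steps are not shown, and the test data are a synthetic damped oscillation mixed into ten hypothetical "genes".

```python
import numpy as np


def exact_dmd(X: np.ndarray, rank: int, dt: float = 1.0):
    """Exact DMD of a snapshot matrix X with shape (n_genes, n_timepoints)."""
    X1, X2 = X[:, :-1], X[:, 1:]  # snapshot pairs x_k -> x_{k+1}
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U_r, s_r, V_r = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Reduced linear operator A_tilde = U_r^* X2 V_r S_r^{-1}
    A_tilde = U_r.conj().T @ X2 @ V_r / s_r
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ V_r / s_r @ W  # spatial modes, one column per eigenvalue
    freqs = np.angle(eigvals) / (2 * np.pi * dt)  # oscillation frequency per mode
    return eigvals, modes, freqs


# Synthetic check: damped rotation with eigenvalues 0.9 * exp(+/- 0.3i)
theta, r = 0.3, 0.9
A = r * np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
latent = [np.array([1.0, 0.0])]
for _ in range(29):
    latent.append(A @ latent[-1])
mixing = np.random.default_rng(0).normal(size=(10, 2))
X = mixing @ np.array(latent).T  # shape (10, 30): 10 "genes", 30 time points

eigvals, modes, freqs = exact_dmd(X, rank=2)
print(np.round(np.abs(eigvals), 6), np.round(np.sort(np.angle(eigvals)), 6))
```

Because the synthetic data are exactly linear and rank 2, DMD recovers the true eigenvalues; on real expression data, `rank` controls how many modes are kept.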

Meta-Learning Dual-Layer Architecture

Data Moat Core
Layer 1 · Inner Model (Analyze Biological Data)
FNO + Evo2 + Diffusion → GRN

FNO, Evo2, and a diffusion model jointly process spatiotemporal transcriptomics and genomic sequence data to infer complete gene regulatory networks (GRNs).
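The central operation of a Fourier Neural Operator is a spectral convolution: transform to the frequency domain, keep only the lowest `modes` frequencies, mix channels there with learned complex weights, and transform back. This NumPy sketch shows that single operation under simplifying assumptions; a full FNO layer also adds a pointwise linear term and a nonlinearity, and the weights would be learned rather than fixed. The `modes` argument is the "FFT truncation frequency" that the Meta Model tunes.

```python
import numpy as np


def spectral_conv1d(x: np.ndarray, weights: np.ndarray, modes: int) -> np.ndarray:
    """Core FNO operation on a 1-D signal.

    x: (in_channels, n) real signal, e.g. expression over time
    weights: complex (in_channels, out_channels, modes), learned in a real model
    """
    n = x.shape[-1]
    x_hat = np.fft.rfft(x, axis=-1)  # to the frequency domain
    out_hat = np.zeros((weights.shape[1], x_hat.shape[-1]), dtype=complex)
    # Keep only the lowest `modes` frequencies and mix channels there
    out_hat[:, :modes] = np.einsum("im,iom->om", x_hat[:, :modes], weights)
    return np.fft.irfft(out_hat, n=n, axis=-1)  # back to the time domain


# Identity weights on 5 modes act as a low-pass filter on a 64-point signal
n, modes = 64, 5
t = np.arange(n)
slow = np.sin(2 * np.pi * 2 * t / n)  # frequency index 2: kept
fast = np.sin(2 * np.pi * 10 * t / n)  # frequency index 10: truncated
x = (slow + fast)[None, :]  # one input channel
weights = np.ones((1, 1, modes), dtype=complex)
y = spectral_conv1d(x, weights, modes)
print(np.allclose(y[0], slow))  # True
```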

Layer 2 · Meta Model (Learn the Learning Process)
Network depth: auto-optimized
FFT truncation frequency: adaptive
Diffusion steps: dynamic search
Multimodal weights (FNO/Evo2): auto-balanced

Observes the Inner Model's training process and automatically searches for optimal hyperparameter combinations, continuously improving model performance.

Dual flywheel: More data → Better Inner Model → Smarter Meta Model → Better predictions → Better experiments → More data
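The outer loop over the four hyperparameters listed above can be sketched as a search that scores each configuration of the Inner Model. Exhaustive grid search is swapped in here as the simplest possible outer loop; a Meta Model that learns from training trajectories would replace it with something more sample-efficient. The search space values and `toy_validation_score` are hypothetical stand-ins for real training runs.

```python
import itertools

# Hypothetical search space over the four knobs the Meta Model tunes
SEARCH_SPACE = {
    "network_depth": [2, 4, 6, 8],
    "fft_modes": [8, 16, 32],  # FFT truncation frequency
    "diffusion_steps": [50, 100, 200],
    "fno_weight": [0.25, 0.5, 0.75],  # Evo2 weight = 1 - fno_weight
}


def toy_validation_score(cfg: dict) -> float:
    """Hypothetical stand-in for training the Inner Model and scoring it (higher is better)."""
    return -(
        abs(cfg["network_depth"] - 4)
        + abs(cfg["fft_modes"] - 16) / 8
        + abs(cfg["diffusion_steps"] - 100) / 100
        + abs(cfg["fno_weight"] - 0.5)
    )


def grid_search(score_fn, space: dict):
    """Evaluate every configuration and return the best one with its score."""
    keys = list(space)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = score_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


best, score = grid_search(toy_validation_score, SEARCH_SPACE)
print(best)
# {'network_depth': 4, 'fft_modes': 16, 'diffusion_steps': 100, 'fno_weight': 0.5}
```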

AutoResearch Autonomous Science Engine

Layer 3 · Simulating Scientific Thinking

Inspired by Karpathy's AutoResearch and Sakana AI's AI Scientist, AlphaCell goes beyond data analysis: it autonomously proposes hypotheses, designs experiments, evaluates results, and iterates on its discoveries. Each round takes 5 minutes, so it can run 100+ experiments overnight.

Step 1: Propose Hypothesis. Identify key regulatory relationships in the GRN and automatically generate research hypotheses.
Step 2: Design Experiment. Agentic tree search explores multiple experimental designs and selects the best one.
Step 3: Run Experiment. Fast 5-minute training/validation runs; code is modified and executed automatically.
Step 4: Evaluate Results. Automatically evaluate val_bpb/accuracy, keeping good results and discarding bad ones.
Step 5: Generate Knowledge. Output interpretable programs and discoveries and feed them back to the Inner Model.
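The five steps above form a keep-or-discard loop. A minimal sketch, assuming a greedy hill-climbing policy: "propose and design" perturbs one parameter, "run" evaluates a toy objective where lower is better (as with val_bpb), and only improvements are kept. All function names and the objective are hypothetical; a real system would edit and execute training code and use agentic tree search rather than random perturbation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ResearchLog:
    kept: list = field(default_factory=list)  # accepted findings (Step 5)
    discarded: list = field(default_factory=list)  # rejected experiments


def propose_and_design(params: dict, rng: random.Random) -> dict:
    """Steps 1-2 (hypothetical): perturb one parameter as the next hypothesis."""
    key = rng.choice(sorted(params))
    candidate = dict(params)
    candidate[key] += rng.choice([-1, 1])
    return candidate


def run_experiment(params: dict) -> float:
    """Step 3 (hypothetical stand-in for a 5-minute run): lower is better, like val_bpb."""
    return (params["a"] - 3) ** 2 + (params["b"] + 2) ** 2


def autoresearch(params: dict, rounds: int, seed: int = 0):
    rng = random.Random(seed)
    best = run_experiment(params)
    log = ResearchLog()
    for _ in range(rounds):
        candidate = propose_and_design(params, rng)
        score = run_experiment(candidate)
        if score < best:  # Step 4: keep improvements...
            params, best = candidate, score
            log.kept.append((candidate, score))  # Step 5: record the finding
        else:  # ...and discard the rest
            log.discarded.append((candidate, score))
    return params, best, log


params, best, log = autoresearch({"a": 0, "b": 0}, rounds=100)
print(params, best, len(log.kept), len(log.discarded))
```

At 5 minutes per round, 100 rounds take about 8.3 hours, which matches the overnight figure above.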

AlphaCell Three-Layer Architecture Overview
1. Inner Model: FNO + Evo2 + Diffusion → analyze biological data → GRN
2. Meta Model: observe Layer 1 learning → auto-optimize hyperparameters
3. AutoResearch: simulate scientific thinking → hypothesize → design experiments → validate → discover new knowledge

System Workflow

CELLOS End-to-End Workflow

Built on the Design-Build-Test-Learn (DBTL) closed loop, CELLOS integrates the AlphaCell end-to-end model and the ms-swift fine-tuning framework to automate the full chain from AI design to mass-production delivery.

📊 Data Collection → 🧠 AI Analysis & Design → 🧬 Genetic Engineering & Build → 🔬 Fermentation → QC & CoA → 📦 Delivery & Feedback

DBTL Loop (figure): 10K interaction data · AlphaCell v1 · 50 mg/L · 72% accuracy · baseline yield prediction.
Each DBTL round draws a larger ring with more particles and bigger nodes to depict data and model scaling: More Data → Bigger Models → Solve Bigger Problems.

🤖 AlphaCell Algorithm Architecture

Data-driven: FNO (Fourier Neural Operator)
Sequence-driven: Evo2 (genomic LLM)
Network inference: diffusion model → GRN
Zero-shot: Evo2 TF-DNA binding prediction
Downstream training: AlphaCell + AlphaFold

🧪 AI Training Data

Molecular interaction data: NxN full matrix
Protein structure data: PDB + proprietary data
Directed evolution data: 520+ rounds of iteration
Fermentation process data: real-time sensors
7-year data goal: 175 ZB

📈 Key Performance Metrics

Collagen yield: 50 → 500 mg/L
Cancer cell analysis accuracy: 51% → 90%
Triple helix integrity: 94.7%
Thermal stability (Tm): 38.2°C → 42.5°C
Production cost reduction: 99.25%
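For a quick sense of scale, the listed figures work out as follows; note that a 99.25% cost reduction means the new cost is 0.75% of the original.

```latex
\frac{500\,\text{mg/L}}{50\,\text{mg/L}} = 10\times \text{ yield}
\qquad
\frac{1}{1 - 0.9925} = \frac{1}{0.0075} \approx 133\times \text{ lower cost}
\qquad
\Delta T_m = 42.5^\circ\text{C} - 38.2^\circ\text{C} = 4.3^\circ\text{C}
```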

Four-Step Mission

The Complete SATORI Mission

From Decoding Life to Evolving Life — building the synthetic biology closed loop

01

Decode Life

  • Five specialized agents each analyze a different type of omics data
  • Each agent independently outputs edge probabilities, P(edge)
  • The joint probability guides the diffusion model to infer the GRN
  • Logic decoupling avoids multimodal overfitting
Multi-Agent Omics Architecture

02

Design Life

  • Convex analysis for metabolic network programming
  • S · v = 0 (stoichiometric matrix × flux vector)
  • Log-phase cells are treated as a steady state, giving a static mathematical solution space
  • SATORI rewrote the original MATLAB package in Python
Convex Optimization Metabolic Programming

03

Build Life

  • NexT nanotechnology-mediated gene editing
  • CRISPR precision editing of identified targets
  • Signal peptide AI design for secretion optimization
  • Multi-round iterative chassis cell engineering
CRISPR + NexT Nanotechnology

04

Evolve Life

  • AutoResearch autonomous iteration
  • QR code tracking for high-throughput screening
  • Directed evolution every 2 weeks
  • All data feeds back to Decode Life → continuous loop
AutoResearch Autonomous Iteration

Multi-Agent Omics Architecture

5 Specialized Agents · Independent Inference · Joint Decision

Each agent focuses on a single omics type and produces edge probabilities independently, avoiding the multimodal overfitting of a single combined model.

Epigenomics → P(edge)
Transcriptomics (FNO) → P(edge)
Genomics (Evo2) → P(edge)
Metabolomics → P(edge)
Proteomics → P(edge)
Joint Probability → Diffusion Model → GRN

Logic decoupling vs single-model multimodal overfitting
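The page does not say how the five P(edge) estimates are combined. One common choice, assumed here, treats the agents as conditionally independent evidence with a neutral 0.5 prior, so their log-odds add; this is a naive-Bayes-style pooling, not necessarily SATORI's rule. The agent names are hypothetical labels, and the diffusion model that would consume the joint matrix is not shown.

```python
import numpy as np

AGENTS = ["epigenomics", "transcriptomics_fno", "genomics_evo2", "metabolomics", "proteomics"]


def joint_edge_probability(agent_probs: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Combine per-agent P(edge) matrices of shape (n_agents, n_genes, n_genes).

    Assumes conditionally independent agents and a 0.5 prior, so log-odds add.
    """
    p = np.clip(agent_probs, eps, 1 - eps)
    log_odds = np.sum(np.log(p) - np.log1p(-p), axis=0)
    return 1.0 / (1.0 + np.exp(-log_odds))


# Two genes, one candidate edge (gene 0 -> gene 1); other entries uninformative (0.5)
probs = np.full((len(AGENTS), 2, 2), 0.5)
probs[:, 0, 1] = [0.8, 0.8, 0.5, 0.5, 0.5]  # two agents see evidence for the edge
joint = joint_edge_probability(probs)
print(round(float(joint[0, 1]), 4))  # 0.9412, i.e. 16/17
```

An agent reporting 0.5 contributes zero log-odds, so an uninformative omics layer cannot distort the joint estimate; this is one way logic decoupling limits overfitting to any single modality.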

Dynamic Phase vs Steady State

Dynamic phases → FNO (spatiotemporal dynamics)

Steady state (log phase) → Convex Analysis (metabolic programming)

Convex Analysis

S · v = 0

Stoichiometric matrix × flux vector = 0; optimizing an objective subject to this constraint gives the steady-state optimal flux distribution.
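The steady-state constraint S · v = 0 is usually put to work through flux balance analysis (FBA), a linear program that maximizes an objective flux subject to S · v = 0 and per-reaction bounds. FBA is swapped in here as the most common concrete instance of this convex formulation; the four-reaction network, its bounds, and the biomass objective are hypothetical, and the sketch uses SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical): v1 uptake -> A, v2 A -> B, v3 A -> byproduct, v4 B -> biomass
#              v1    v2    v3    v4
S = np.array([
    [1.0, -1.0, -1.0, 0.0],  # metabolite A: produced by v1, consumed by v2 and v3
    [0.0, 1.0, 0.0, -1.0],  # metabolite B: produced by v2, consumed by v4
])
bounds = [(0, 10), (0, 8), (0, 1000), (0, 1000)]  # flux limits per reaction
c = np.array([0.0, 0.0, 0.0, -1.0])  # linprog minimizes, so maximize v4 via -v4

# Steady state: S @ v = 0, subject to the bounds
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
print(res.status, -res.fun)  # optimal biomass flux is 8, capped by v2's capacity
```

Because the objective and constraints are linear, the feasible flux space is a convex polytope and the solver is guaranteed to find the global optimum when one exists.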