Active Research

Active Research Initiatives

Core research directions in flight, each shown with its current progress.

10M+

Data Points Generated

30+

Research Partners

200+

Published Papers

AlphaCell Integration

Generating interactome data at an exponentially growing rate to power next-generation AI models for cellular prediction.

35%

Synthetic Biology Platforms

Novel SynBio technologies for rapid prototyping and testing of engineered biological systems.

90%

Planet Engineering Research

Developing algae-based solutions for atmospheric processing and terraforming applications.

15%

Core Technology

A Trinity Technology Platform

Merging artificial intelligence, digital logic, and life science to build the next-generation synthetic biology foundation

AI4Cell Molecular Interaction

Build an AI-driven molecular interaction library, accumulating cell behavior data at exponential speed. Use large models to predict protein folding and signaling pathways, accelerating biological component design iteration.

Molecular Interaction Library · Exponential Data Growth

Digital Logic Biotech

Inspired by integrated circuit design, abstract cell signaling pathways as logic gate circuits. Modularly engineer cell factories for predictable, reproducible biosynthesis workflows.

Logic Gate Circuits · Engineered Cell Factories
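
To make the logic-gate abstraction concrete, here is a minimal sketch in Python. The pathway itself (`secretion_circuit`, an inducer and a repressor acting on one output) is a hypothetical example, not a circuit described on this page; it only shows how a signaling pathway could be written as composable Boolean gates and enumerated as a truth table.

```python
from itertools import product


def gate_not(a: bool) -> bool:
    """Inverter: output is on only when the input signal is absent."""
    return not a


def gate_and(a: bool, b: bool) -> bool:
    """AND gate: output is on only when both input signals are present."""
    return a and b


def secretion_circuit(inducer: bool, repressor: bool) -> bool:
    """Hypothetical pathway: the product is made only when the inducer
    is present AND the repressor is absent."""
    return gate_and(inducer, gate_not(repressor))


def truth_table(circuit, n_inputs: int) -> dict:
    """Enumerate every input combination and record the circuit output."""
    return {
        inputs: circuit(*inputs)
        for inputs in product([False, True], repeat=n_inputs)
    }


table = truth_table(secretion_circuit, 2)
for inputs, output in table.items():
    print(inputs, "->", output)
```

Because each gate is an ordinary function, larger circuits can be assembled by composition and checked exhaustively before any wet-lab build.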

AlphaCell Development

Proprietary cancer cell phenotype analysis model, improving data analysis accuracy from 51% to 90%. Provides high-confidence single-cell characterization for precision medicine and drug development.

Accuracy 51% → 90% · Single-Cell Analysis

Data Moat

Large-Scale Ground-Truth Data Foundation

We generate proprietary, inherently labeled ground-truth experimental data via NxN full-matrix wet-lab assays — the research data foundation no in-silico model can simulate — to train AlphaCell and improve AlphaFold protein structure prediction.

Data Generation

10M+ assays

Wet-lab NxN full-matrix assays produce ground-truth interactome data at unprecedented throughput — the data layer no in-silico model can simulate.

12.5× Cheaper

Data Training

Self-Labeled

Ground-truth data is inherently labeled, so no annotation is needed. Starting from a small AlphaCell model keeps compute costs low.

Zero Annotation Cost

Data Storage

NxN Compact

A structured NxN matrix format is more compact than unstructured data, significantly reducing storage and retrieval overhead.

Efficient Matrix Format
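
As a rough illustration of why a fixed NxN layout is compact, the sketch below stores interaction scores in one flat 32-bit float array and compares its size with the same data written as JSON records. The class name `InteractionMatrix`, the record fields, and the choice of 32-bit floats are assumptions made for this example; the actual storage format is not described here.

```python
import json
from array import array


class InteractionMatrix:
    """Dense NxN score matrix stored row-major in a flat float32 array."""

    def __init__(self, n: int):
        self.n = n
        self.scores = array("f", [0.0]) * (n * n)

    def set(self, i: int, j: int, score: float) -> None:
        self.scores[i * self.n + j] = score

    def get(self, i: int, j: int) -> float:
        # Position encodes the pair (i, j), so no keys are stored.
        return self.scores[i * self.n + j]

    def nbytes(self) -> int:
        return len(self.scores) * self.scores.itemsize


def records_nbytes(matrix: InteractionMatrix) -> int:
    """Size of the same data as a JSON list of per-pair records."""
    records = [
        {"a": i, "b": j, "score": matrix.get(i, j)}
        for i in range(matrix.n)
        for j in range(matrix.n)
    ]
    return len(json.dumps(records).encode("utf-8"))


m = InteractionMatrix(50)
m.set(3, 7, 0.75)
print("matrix bytes:", m.nbytes())
print("record bytes:", records_nbytes(m))
```

The matrix form also gives constant-time lookup of any pair, since the index is computed rather than searched.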

Ground Truth Data Trains Both Models

Every experiment produces inherently annotated ground truth data that feeds two AI systems simultaneously

Ground Truth Data: NxN full-matrix assays

AlphaCell

Cancer cell phenotype (proprietary)

AlphaFold

Protein structure prediction (improved)

Better Predictions

More Experiments

More Data

AlphaCell Algorithm: 3-Step Pipeline for Cell Regulatory Networks

Step 1: Spatiotemporal transcriptomics data is the input.
Step 2: DMD (Dynamic Mode Decomposition, using FFT and phase analysis) produces eigen-clusters, decomposing each gene's expression into the contributions of 100+ transcription factors.
Step 3: A diffusion model combined with GRN inference reconstructs the full gene regulatory network (GRN).
DMD originates in fluid dynamics; SATORI reimplemented the legacy MATLAB package in Python/PyTorch.

Fluid Dynamics FNO + FFT Signal Processing · Diffusion Model + GRN Network Inference
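
The sketch below illustrates only the "FFT + phase analysis" ingredient of Step 2, not DMD itself and not SATORI's PyTorch implementation. It runs a plain discrete Fourier transform on one gene's expression time series and reports the strongest oscillating component with its amplitude and phase. The function names and the synthetic signal are assumptions for this example.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real-valued series."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def dominant_mode(signal):
    """Return (frequency index, amplitude, phase) of the strongest
    oscillating component, ignoring the constant (k = 0) term and the
    Nyquist bin. Requires at least 4 samples."""
    n = len(signal)
    spectrum = dft(signal)
    k = max(range(1, n // 2), key=lambda i: abs(spectrum[i]))
    amplitude = 2 * abs(spectrum[k]) / n
    phase = cmath.phase(spectrum[k])
    return k, amplitude, phase


# Synthetic expression trace: baseline 5, one oscillation with
# 3 cycles per window, amplitude 2, phase offset 0.5 rad.
n_samples = 32
expression = [
    5 + 2 * math.cos(2 * math.pi * 3 * t / n_samples + 0.5)
    for t in range(n_samples)
]
k, amplitude, phase = dominant_mode(expression)
print(k, round(amplitude, 6), round(phase, 6))
```

Comparing the phase of the same frequency component across two genes gives a lead or lag between them, which is the kind of signal a phase-based clustering step could group on.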

Meta-Learning Dual-Layer Architecture

Data Moat Core

Layer 1 · Inner Model (Analyze Biological Data)

FNO + Evo2 + Diffusion → GRN

FNO + Evo2 + Diffusion Model synergistically process spatiotemporal transcriptomics and genomic sequence data to infer complete Gene Regulatory Networks (GRN)

Layer 2 · Meta Model (Learn the Learning Process)

Network Depth: Auto-optimized

FFT Truncation Freq: Adaptive

Diffusion Steps: Dynamic Search

Multimodal Weights (FNO/Evo2): Auto-balanced

Observes the Inner Model's training process and automatically discovers optimal hyperparameter combinations, continuously improving model performance.
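
One simple way to picture the Meta Model's job is a search over the four hyperparameters listed above. The sketch below uses an exhaustive grid search with a toy loss function standing in for an actual Inner Model training run. The candidate values, the `toy_validation_loss` function, and the use of grid search are all assumptions for illustration; the page does not state which search method the Meta Model uses.

```python
from itertools import product

# Hypothetical candidate values for the four tuned hyperparameters.
SEARCH_SPACE = {
    "network_depth": [2, 4, 8],
    "fft_truncation_freq": [8, 16, 32],
    "diffusion_steps": [50, 100, 200],
    "fno_weight": [0.25, 0.5, 0.75],  # Evo2 weight would be 1 - fno_weight
}


def toy_validation_loss(cfg: dict) -> float:
    """Stand-in for training the Inner Model and measuring validation
    loss. By construction the minimum is at depth 4, truncation 16,
    100 diffusion steps, and FNO weight 0.5."""
    return (
        (cfg["network_depth"] - 4) ** 2
        + ((cfg["fft_truncation_freq"] - 16) / 8) ** 2
        + ((cfg["diffusion_steps"] - 100) / 50) ** 2
        + (cfg["fno_weight"] - 0.5) ** 2
    )


def grid_search(space: dict, loss_fn):
    """Evaluate every combination and return the best (config, loss)."""
    keys = list(space)
    best_cfg, best_loss = None, float("inf")
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        loss = loss_fn(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss


best_cfg, best_loss = grid_search(SEARCH_SPACE, toy_validation_loss)
print(best_cfg, best_loss)
```

A learned meta model would replace the exhaustive loop with something that predicts promising configurations from earlier runs, but the interface (configuration in, validation score out) stays the same.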

Dual flywheel: More data → Better Inner Model → Smarter Meta Model → Better predictions → Better experiments → More data

AutoResearch Autonomous Science Engine

Layer 3 · Simulating Scientific Thinking

Inspired by Karpathy's AutoResearch and Sakana AI's AI Scientist, AlphaCell goes beyond data analysis: it autonomously proposes hypotheses, designs experiments, evaluates results, and iterates on its discoveries. Each round takes 5 minutes, so 100+ experiments can run overnight.

Step 1

Propose Hypothesis

Identify key regulatory relationships from GRN, auto-generate research hypotheses

Step 2

Design Experiment

Agentic Tree Search explores multiple experimental designs and selects the best one

Step 3

Run Experiment

5-min fast training/validation, auto-modify code and execute

Step 4

Evaluate Results

Auto-evaluate val_bpb/accuracy, keep good results, discard bad

Step 5

Generate Knowledge

Output interpretable programs and discoveries, feed back to Inner Model
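
The five steps above form a propose, run, evaluate, keep-or-discard loop. The sketch below shows that control flow with a greedy search in place of Agentic Tree Search and a toy function in place of a 5-minute training run. The parameter names (`lr`, `dropout`), `run_experiment`, and `propose` are hypothetical stand-ins, not part of AlphaCell.

```python
import random


def run_experiment(params: dict) -> float:
    """Stand-in for a short training/validation run.
    Returns a validation loss; lower is better."""
    return (params["lr"] - 0.3) ** 2 + (params["dropout"] - 0.1) ** 2


def propose(params: dict, rng: random.Random) -> dict:
    """Propose a new candidate by perturbing one parameter."""
    candidate = dict(params)
    key = rng.choice(sorted(candidate))
    candidate[key] = max(0.0, candidate[key] + rng.uniform(-0.1, 0.1))
    return candidate


def auto_research(initial: dict, rounds: int, seed: int = 0):
    """Greedy loop: keep a candidate only if it improves the loss."""
    rng = random.Random(seed)
    best, best_loss = dict(initial), run_experiment(initial)
    history = [best_loss]
    for _ in range(rounds):
        candidate = propose(best, rng)      # Steps 1-2: hypothesize and design
        loss = run_experiment(candidate)    # Step 3: run
        if loss < best_loss:                # Step 4: evaluate, keep or discard
            best, best_loss = candidate, loss
        history.append(best_loss)           # Step 5: record what was learned
    return best, best_loss, history


start = {"lr": 0.9, "dropout": 0.5}
best, best_loss, history = auto_research(start, rounds=100)
print(best, best_loss)
```

With 100 rounds this mirrors the "100+ experiments overnight" budget; a tree search would differ by keeping several candidate branches alive instead of a single best.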

AlphaCell Three-Layer Architecture Overview

1. Inner Model: FNO + Evo2 + Diffusion → Analyze biological data → GRN

2. Meta Model: Observe Layer 1 learning → Auto-optimize hyperparameters

3. AutoResearch: Simulate scientist thinking → Hypothesize → Design experiments → Validate → Discover new knowledge

System Workflow

CELLOS End-to-End Workflow

Built on the Design-Build-Test-Learn (DBTL) closed loop, CELLOS integrates the AlphaCell end-to-end model and the ms-swift fine-tuning framework to automate the full chain from AI design to mass-production delivery.

📊

Data Collection

🧠

AI Analysis & Design

🧬

Genetic Engineering & Build

🔬

Fermentation

✅

QC CoA

📦

Delivery & Feedback

DBTL Loop

10K Interaction Data · AlphaCell v1 · 72% Accuracy · Baseline Yield Prediction (mg/L)

Each DBTL round: larger ring, more particles, bigger nodes = Data & Model Scaling

More Data → Bigger Models → Solve Bigger Problems

🤖

AlphaCell Algorithm Architecture

Data-Driven: FNO (Fourier Neural Operator)

Sequence-Driven: Evo2 (Genomic LLM)

Network Inference: Diffusion Model → GRN

Zero-Shot: Evo2 TF-DNA Binding Prediction

Downstream Training: AlphaCell + AlphaFold

🧪

AI Training Data

Molecular Interaction Data: NxN Full Matrix

Protein Structure Data: PDB + Proprietary Data

Directed Evolution Data: 520+ Rounds of Iteration

Fermentation Process Data: Real-time Sensors

7-Year Data Goal: 175 ZB

📈

Key Performance Metrics

Collagen Yield: 50 → 500 mg/L

Cancer Cell Analysis Accuracy: 51% → 90%

Triple Helix Integrity: 94.7%

Thermal Stability (Tm): 38.2°C → 42.5°C

Production Cost Reduction: ↓ 99.25%

Four-Step Mission

The Complete SATORI Mission

From Decoding Life to Evolving Life — building the synthetic biology closed loop

Decode Life

5 specialized agents analyze different omics data
Each agent independently outputs P(edge) probabilities
Joint probability guides Diffusion Model → GRN
Logic decoupling avoids multimodal overfitting

Multi-Agent Omics Architecture

Design Life

Convex Analysis for metabolic network programming
S · v = 0 (stoichiometric matrix × flux vector)
Log-phase cells → static mathematical solution space
Original MATLAB package → SATORI Python rewrite

Convex Optimization Metabolic Programming

Build Life

NexT nanotechnology-mediated gene editing
CRISPR precision editing of identified targets
Signal peptide AI design for secretion optimization
Multi-round iterative chassis cell engineering

CRISPR + NexT Nanotechnology

Evolve Life

AutoResearch autonomous iteration
QR code tracking for high-throughput screening
Directed evolution every 2 weeks
All data feeds back to Decode Life → continuous loop

AutoResearch Autonomous Iteration

Multi-Agent Omics Architecture

5 Specialized Agents · Independent Inference · Joint Decision

Each agent focuses on a single omics type, independently producing edge probabilities — avoiding single-model multimodal overfitting

Epigenomics

→ P(edge)

Transcriptomics (FNO)

→ P(edge)

Genomics (Evo2)

→ P(edge)

Metabolomics

→ P(edge)

Proteomics

→ P(edge)

Joint Probability → Diffusion Model → GRN

Logic decoupling vs single-model multimodal overfitting
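
How the five P(edge) values are merged is not specified here. One common option, shown below as an assumption, is to treat the agents as independent and combine their probabilities as a product of odds under a uniform prior. The function name `joint_edge_probability` and the example numbers are hypothetical.

```python
import math


def joint_edge_probability(agent_probs, eps: float = 1e-9) -> float:
    """Combine independent per-agent edge probabilities into one value.

    Assumes conditional independence and a uniform prior, so the joint
    odds are the product of each agent's odds p / (1 - p)."""
    clipped = [min(max(p, eps), 1 - eps) for p in agent_probs]
    log_yes = sum(math.log(p) for p in clipped)
    log_no = sum(math.log(1 - p) for p in clipped)
    # Equivalent to a sigmoid of the summed log-odds.
    return 1.0 / (1.0 + math.exp(log_no - log_yes))


# Hypothetical scores for one candidate TF -> gene edge.
edge_scores = {
    "epigenomics": 0.80,
    "transcriptomics": 0.70,
    "genomics": 0.90,
    "metabolomics": 0.60,
    "proteomics": 0.75,
}
p_edge = joint_edge_probability(edge_scores.values())
print(round(p_edge, 4))
```

Because each agent only contributes a probability, an omics type can be retrained or swapped out without touching the others, which is the decoupling the architecture describes.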

Dynamic Phase vs Steady State

Dynamic phases → FNO (spatiotemporal dynamics)

Steady state (log phase) → Convex Analysis (metabolic programming)

Convex Analysis

S · v = 0

Stoichiometric matrix × flux vector = 0 · steady-state optimal flux distribution
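
A small worked example of the steady-state condition S · v = 0. The three-reaction pathway below is a made-up toy network, not one of SATORI's models: each row of S is a metabolite, each column a reaction, and a flux vector is at steady state when every metabolite is produced exactly as fast as it is consumed.

```python
def steady_state_residual(S, v):
    """Compute S . v: the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol: float = 1e-9) -> bool:
    """True when no metabolite accumulates or depletes (S . v = 0)."""
    return all(abs(r) <= tol for r in steady_state_residual(S, v))


# Toy pathway (hypothetical): uptake -> A, A -> B, B -> secreted.
# Columns: v1 (uptake), v2 (A -> B), v3 (secretion).
S = [
    [1, -1, 0],   # metabolite A: made by v1, consumed by v2
    [0, 1, -1],   # metabolite B: made by v2, consumed by v3
]

balanced = [2.0, 2.0, 2.0]     # same flux through every step
unbalanced = [2.0, 1.0, 1.0]   # A is produced faster than it is consumed

print(steady_state_residual(S, balanced))
print(steady_state_residual(S, unbalanced))
```

With lower and upper bounds added on each flux, the set of vectors satisfying S · v = 0 is a convex polytope, which is why choosing an optimal flux distribution can be posed as a convex optimization problem.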
