Open Science · Soil Microbiology · Functional Metagenomics

Identify soil communities that actually work

A computational pipeline that screens tens of thousands of public soil metagenomes to find microbial communities with verified potential for nitrogen fixation, carbon sequestration, and bioremediation — ranked, mapped, and ready to act on.

8+ Public Data Sources
4-Tier Screening Funnel
3 Target Applications
PolyForm-NC Open Source

Four-tier screening funnel

Each tier narrows the candidate pool using increasingly rigorous — and computationally expensive — methods. Only communities that pass all filters reach the ranked output.
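
The funnel logic can be pictured with a minimal sketch: cheap checks run first, so the expensive tiers only see what survives. The candidate fields (`qc`, `ml_score`, `flux`) and the cutoffs below are hypothetical stand-ins, not the pipeline's actual data model.

```python
def run_funnel(candidates, tiers):
    """Apply (name, predicate) tiers in order and count survivors after each."""
    survivors = list(candidates)
    counts = []
    for name, keep in tiers:
        survivors = [c for c in survivors if keep(c)]
        counts.append((name, len(survivors)))
    return survivors, counts


# Toy candidates with made-up scores (illustrative only).
candidates = [
    {"id": "A", "qc": True, "ml_score": 0.90, "flux": 0.7},
    {"id": "B", "qc": True, "ml_score": 0.80, "flux": 0.2},
    {"id": "C", "qc": True, "ml_score": 0.30, "flux": 0.9},
    {"id": "D", "qc": False, "ml_score": 0.95, "flux": 0.8},
]

# Tiers ordered from cheapest to most expensive; thresholds are assumptions.
tiers = [
    ("Tier 0", lambda c: c["qc"]),
    ("Tier 0.25", lambda c: c["ml_score"] >= 0.5),
    ("Tier 1", lambda c: c["flux"] >= 0.5),
]

survivors, counts = run_funnel(candidates, tiers)
print(counts)
print([c["id"] for c in survivors])
```

Ordering the predicates by cost means a candidate rejected on quality never reaches the metabolic modeling step.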

Tier 0
Data Intake & Quality Filter
Pulls sequencing data from SRA, MGnify, EMP, Qiita, and NEON. Filters on sequencing depth, soil pH range, functional gene presence (nifH, dsrAB, mcrA), and metadata completeness.
Fast · Millions of samples
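
As a rough illustration of the Tier 0 filter, the sketch below applies the four criteria named above to a few sample records. The record fields, the read-depth cutoff, and the required metadata keys are hypothetical placeholders, not the pipeline's actual schema or thresholds.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical cutoffs; the pipeline's real values are not stated here.
MIN_READS = 1_000_000
REQUIRED_METADATA = ("latitude", "longitude", "collection_date")


@dataclass
class Sample:
    sample_id: str
    read_count: int
    ph: Optional[float]
    genes: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)


def passes_tier0(sample, ph_range, target_genes):
    """Return True if a sample clears all four Tier 0 checks."""
    low, high = ph_range
    deep_enough = sample.read_count >= MIN_READS
    ph_ok = sample.ph is not None and low <= sample.ph <= high
    has_gene = bool(sample.genes & set(target_genes))
    complete = all(sample.metadata.get(key) for key in REQUIRED_METADATA)
    return deep_enough and ph_ok and has_gene and complete


full_meta = {"latitude": "36.1", "longitude": "-86.8", "collection_date": "2021-05-01"}
samples = [
    Sample("S1", 2_500_000, 6.4, {"nifH"}, dict(full_meta)),
    Sample("S2", 400_000, 6.8, {"nifH"}, dict(full_meta)),    # too shallow
    Sample("S3", 3_000_000, 8.9, {"mcrA"}, {"latitude": "40.0"}),  # pH, gene, metadata fail
]

kept = [s.sample_id for s in samples if passes_tier0(s, (5.5, 7.5), {"nifH"})]
print(kept)
```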
Tier 0.25
ML Functional Prediction
Random forest and gradient boosting models predict target function scores. Community similarity search finds nearest high-performing reference communities.
Minutes per community
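
The similarity search step can be sketched as a nearest-neighbor lookup over taxonomic profiles. The source does not name a distance metric; Bray-Curtis dissimilarity is used here as an assumption because it is a common choice for community data. The reference IDs and abundance values are invented for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon -> abundance dicts."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0


def nearest_references(query, references, k=1):
    """Return the k reference IDs with the lowest dissimilarity to the query."""
    ranked = sorted(references, key=lambda rid: bray_curtis(query, references[rid]))
    return ranked[:k]


# Toy relative-abundance profiles (hypothetical values).
references = {
    "ref_high_bnf": {"Bradyrhizobium": 0.5, "Azospirillum": 0.3, "Bacillus": 0.2},
    "ref_low_bnf": {"Bacillus": 0.7, "Streptomyces": 0.3},
}
query = {"Bradyrhizobium": 0.4, "Azospirillum": 0.3, "Streptomyces": 0.3}

closest = nearest_references(query, references)
print(closest)
```

A production search over many references would likely use an index rather than a full sort, but the ranking idea is the same.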
Tier 1
Metabolic Modeling
Community flux balance analysis (FBA) via genome-scale models. Predicts actual metabolic flux through target pathways. Identifies keystone taxa.
Hours per community
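
For readers new to FBA, the standard single-model formulation is a linear program over reaction fluxes at steady state. Community FBA extends this idea across several organisms; the exact community formulation and objective used at this tier are not specified here, so the block below shows only the textbook form.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (for example, flux through a target pathway), and the bounds encode uptake limits and reaction reversibility.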
Tier 2
Dynamics & Intervention
Dynamic FBA simulates community stability over time. Screens bioinoculant and amendment interventions. Predicts establishment probability and off-target effects.
Top candidates only

Three proven configurations

Each application is a pre-built configuration targeting a specific ecosystem service. The same pipeline core, different scientific objectives.

🌱
Nitrogen Cycling
Biological Nitrogen Fixation
Identifies communities with high BNF potential for dryland wheat and other commodity crops. Targets reduction of synthetic nitrogen fertilizer dependency — a direct input cost and emissions driver.
  • Target flux: ≥ 0.5 mmol N/g soil/day
  • Key functional gene: nifH (HGT-validated)
  • Soil context: pH 5.5–7.5, semi-arid
  • Relevant crop: Wheat, sorghum, corn
🌍
Climate Mitigation
Soil Carbon Sequestration
Screens for communities driving net soil organic carbon accumulation in grassland and regenerative agriculture systems. Includes fungal community analysis, lignin degradation, and suppression of GHG-producing pathways.
  • Target flux: ≥ 0.1 g C/kg soil/year
  • Key functional gene: laccase, mnp, lip
  • Soil context: Temperate grassland, regen-ag
  • Co-benefit: Suppresses methanogenesis
🏭
Contaminated Site Restoration
Hydrocarbon Bioremediation
Finds communities capable of in-situ degradation of petroleum hydrocarbons at contaminated industrial sites. Inverted context: sources communities from contaminated soils, not pristine environments.
  • Target flux: ≥ 0.01 mmol/g soil/day
  • Key functional genes: alkB, xylE, catA
  • Soil context: pH 5.5–8.5, contaminated
  • Heavy metal tolerance: Co-screened
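
One way to picture "same pipeline core, different scientific objectives" is a small configuration record per application. The sketch below copies the target fluxes, marker genes, and pH ranges from the three cards above; the class name, field names, and dictionary keys are hypothetical and do not reflect the project's actual configuration format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AppConfig:
    name: str
    min_flux: float
    flux_unit: str
    key_genes: Tuple[str, ...]
    ph_range: Optional[Tuple[float, float]]  # None where no pH range is listed


CONFIGS = {
    "nitrogen_fixation": AppConfig(
        "Biological Nitrogen Fixation", 0.5, "mmol N/g soil/day", ("nifH",), (5.5, 7.5)),
    "carbon_sequestration": AppConfig(
        "Soil Carbon Sequestration", 0.1, "g C/kg soil/year", ("laccase", "mnp", "lip"), None),
    "bioremediation": AppConfig(
        "Hydrocarbon Bioremediation", 0.01, "mmol/g soil/day", ("alkB", "xylE", "catA"), (5.5, 8.5)),
}


def meets_target(config_key, predicted_flux):
    """True if a predicted flux reaches the configuration's target flux."""
    return predicted_flux >= CONFIGS[config_key].min_flux


print(meets_target("nitrogen_fixation", 0.62))
print(meets_target("bioremediation", 0.004))
```

Note that the three target fluxes use different units, so a predicted value is only comparable against the configuration it was computed for.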

Who this is built for

The pipeline outputs — ranked community registries, intervention recommendations, and regional findings maps — are designed to be immediately actionable for these audiences.

🏛️
USDA / NRCS / State Ag Labs
Complement soil chemistry reports with microbial functional profiles. Support fertilizer reduction mandates with quantitative BNF community data.
🌿
Regenerative Agriculture NGOs
Quantitative microbial baselines for field trials. Identify which communities to inoculate and which amendments to pair with them for maximum establishment probability.
🔬
Soil Testing Laboratories
Add microbial functional prediction to existing chemistry reports. Differentiate from commodity lab services with community-level insights.
EPA / Environmental Consultants
Bioremediation pipeline output directly applicable to Superfund and brownfield site assessments. Rank candidate communities before expensive field trials.
🌐
International Research Programs
CGIAR centers and the FAO Global Soil Partnership, with a focus on dryland wheat, smallholder fertility, and degraded land restoration.
📊
Climate Funders & Foundations
Quantitative soil community data to underpin carbon credit methodologies and regenerative agriculture investment theses.

Built on public metagenomics data

The pipeline integrates eight data sources: seven public repositories and search tools, plus local BIOM import. No proprietary data is required to run a global screening analysis.

NCBI SRA
Primary source for shotgun metagenomes and 16S amplicon datasets. Millions of soil samples globally.
MGnify
EBI's metagenomics platform. Pre-analyzed functional profiles reduce compute requirements significantly.
Earth Microbiome Project
Standardized 16S data from 30,000+ samples across biomes. Comprehensive soil_non-saline subset.
Qiita
QIIME-processed amplicon data with rich metadata. Direct API integration for metadata-filtered queries.
NEON
National Ecological Observatory Network. Long-term soil monitoring across US climate zones.
redbiom
Cross-study metadata search. Enables targeted queries like "dryland wheat soils, pH 6.0–7.0, US Great Plains."
AGP
American Gut Project soil samples. Broad geographic coverage with standardized processing.
Local BIOM
Direct import of locally-generated OTU/ASV tables. Supports partner lab data ingestion alongside public sources.

Request a regional analysis

If you work in soil science, extension, ag research, or environmental consulting and want to see what the pipeline finds for your region or application — get in touch. Early collaborations are welcome.

We can work from publicly available data for any region, or integrate soil chemistry data you already have from local testing programs.

📍 Based in: Middle Tennessee
⚖️ License: PolyForm Noncommercial 1.0.0 — free for research & academia, commercial use requires agreement

No spam — just science. We'll reply within a few days.
