Crate metadata_gen

metadata-gen

A typed, audited frontmatter parser for Rust — YAML, TOML, and JSON extraction with HTML meta-tag emission for SEO, Open Graph, Twitter Cards, and Apple Web Apps.

§What it does

metadata-gen parses a content file’s frontmatter — the structured block at the top of a Markdown/HTML document — and turns it into a usable Rust value plus a set of SEO meta tag groups (primary, og, twitter, apple, ms).

It accepts three frontmatter formats out of the box:

| Format | Delimiters | Example header |
|--------|------------|----------------|
| YAML | --- / --- | title: Hello |
| TOML | +++ / +++ | title = "Hello" |
| JSON | { / } | {"title": "Hello"} |

Format detection is automatic. The detection order is YAML → TOML → JSON; the first format whose opening delimiter matches is used. Nested values are flattened with dot-separated keys (e.g. author.name), and sequences are serialized as [a, b, c] strings in the current map-based API.
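The detection order described above can be sketched with the standard library alone. This is an illustration, not the crate's implementation: the `Format` enum and `detect_format` function are hypothetical names, and skipping leading whitespace before the delimiter check is an assumption.

```rust
/// Frontmatter formats from the table above (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Format {
    Yaml,
    Toml,
    Json,
}

/// Hypothetical helper: returns the first format whose opening delimiter
/// matches, checking YAML, then TOML, then JSON.
pub fn detect_format(content: &str) -> Option<Format> {
    // Assumption: leading whitespace before the delimiter is ignored.
    let head = content.trim_start();
    if head.starts_with("---") {
        Some(Format::Yaml)
    } else if head.starts_with("+++") {
        Some(Format::Toml)
    } else if head.starts_with('{') {
        Some(Format::Json)
    } else {
        None
    }
}
```

Because the checks run in a fixed order and each delimiter is distinct, at most one branch can match for a given input.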

§When to use it

Choose metadata-gen when you want:

  • A single dependency that handles YAML, TOML, and JSON frontmatter without forcing you to pick a serializer first.
  • A built-in HTML meta tag generator (Open Graph, Twitter Cards, Apple Mobile, Microsoft Tiles, primary SEO) wired to the same metadata map.
  • A library with an explicit supply-chain posture: cargo-deny, cargo-audit, SBOM emission, and #![forbid(unsafe_code)] enforced.
  • A library actively heading toward typed extraction, zero-copy values, and a WASI 0.2 component (see the Roadmap).

Choose something else when you only need raw YAML parsing (use serde_yaml_ng or noyalib directly) or when you need typed extraction today (track issue #42 for v0.0.6).

§Install

cargo add metadata-gen

Or add to Cargo.toml:

[dependencies]
metadata-gen = "0.0.5"

Minimum Supported Rust Version: 1.88.0 — see the MSRV policy. Tested on Linux, macOS, and Windows on x86_64 and ARM64.

§Quick start

use metadata_gen::extract_and_prepare_metadata;

let content = "---\n\
title: Hello, world!\n\
description: A short greeting\n\
keywords: rust, frontmatter, seo\n\
---\n";

let (metadata, keywords, tags) =
    extract_and_prepare_metadata(content).expect("valid frontmatter");

assert_eq!(metadata.get("title"), Some(&"Hello, world!".to_string()));
assert_eq!(keywords, vec!["rust", "frontmatter", "seo"]);
assert!(tags.primary.contains("description"));
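The quick start turns the comma-separated `keywords` value into a list. A minimal sketch of that kind of splitting follows; `split_keywords` is a hypothetical helper, and trimming whitespace and dropping empty entries are assumptions rather than documented behaviour of the crate.

```rust
/// Hypothetical helper: splits a comma-separated `keywords` value into
/// trimmed, non-empty entries.
pub fn split_keywords(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|keyword| !keyword.is_empty())
        .map(str::to_string)
        .collect()
}
```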

§Examples

Run any example with cargo run --example <name>:

| Example | Demonstrates |
|---------|--------------|
| lib_example | High-level extract_and_prepare_metadata + meta tag flow |
| metadata_example | Per-format extraction (YAML, TOML, JSON) + nested mappings |
| metatags_example | Generating + extracting HTML <meta> tags |
| utils_example | HTML escape/unescape, async file extraction |
| error_example | Every MetadataError variant + recovery patterns |

§YAML frontmatter

use metadata_gen::metadata::extract_metadata;

let content = "---\n\
title: My Post\n\
date: 2026-06-28\n\
author:\n  name: Ada\n  handle: [email protected]\n\
tags:\n  - rust\n  - parsing\n---\n";

let meta = extract_metadata(content).unwrap();
assert_eq!(meta.get("title"),       Some(&"My Post".to_string()));
assert_eq!(meta.get("author.name"), Some(&"Ada".to_string()));
assert_eq!(meta.get("tags"),        Some(&"[rust, parsing]".to_string()));
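The flattening visible in these assertions (dot-separated keys for nested mappings, `[a, b]` strings for sequences) can be sketched as a small recursive walk. The `Value` enum and `flatten` function below are hypothetical stand-ins for a parsed document, not types exported by the crate.

```rust
use std::collections::HashMap;

/// Hypothetical value tree standing in for a parsed frontmatter document.
pub enum Value {
    Scalar(String),
    Sequence(Vec<String>),
    Mapping(Vec<(String, Value)>),
}

/// Flattens nested mappings into dot-separated keys and renders
/// sequences as a single "[a, b, c]" string.
pub fn flatten(prefix: &str, value: &Value, out: &mut HashMap<String, String>) {
    match value {
        Value::Scalar(text) => {
            out.insert(prefix.to_string(), text.clone());
        }
        Value::Sequence(items) => {
            out.insert(prefix.to_string(), format!("[{}]", items.join(", ")));
        }
        Value::Mapping(entries) => {
            for (key, child) in entries {
                // Top-level keys carry no prefix; nested keys are joined with '.'.
                let path = if prefix.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", prefix, key)
                };
                flatten(&path, child, out);
            }
        }
    }
}
```

Calling `flatten("", &document, &mut map)` on a tree with an `author` mapping and a `tags` sequence yields the keys `author.name` and `tags`, matching the shape of the assertions above.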

§TOML frontmatter

use metadata_gen::metadata::extract_metadata;

let content = "+++\n\
title = \"My Post\"\n\
date  = \"2026-06-28\"\n\
\n\
[author]\n\
name = \"Ada\"\n\
+++\n";

let meta = extract_metadata(content).unwrap();
assert_eq!(meta.get("author.name"), Some(&"Ada".to_string()));

§JSON frontmatter

use metadata_gen::metadata::extract_metadata;

let content = "{\
\"title\":\"My Post\",\
\"description\":\"Inline JSON header\"\
}\n# Body";

let meta = extract_metadata(content).unwrap();
assert_eq!(meta.get("title"), Some(&"My Post".to_string()));

Nested JSON objects are handled in the current API. See issue #26: the v0.0.5 fix uses serde_json::Deserializer to correctly handle balanced braces and arrays of objects.
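To show why nested braces make the end of a JSON header non-trivial to find, here is a hand-rolled, standard-library-only scanner. It is deliberately not the serde_json-based approach the crate uses; `json_header_end` is a hypothetical function that only tracks brace depth and string literals.

```rust
/// Returns the byte index one past the `}` that closes the JSON object
/// starting at the beginning of `content`, or `None` if there is no
/// leading object or it is unbalanced. Braces inside string literals
/// are ignored.
pub fn json_header_end(content: &str) -> Option<usize> {
    if !content.starts_with('{') {
        return None;
    }
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (index, byte) in content.bytes().enumerate() {
        if in_string {
            if escaped {
                escaped = false; // the escaped character is consumed
            } else if byte == b'\\' {
                escaped = true;
            } else if byte == b'"' {
                in_string = false;
            }
            continue;
        }
        match byte {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                // depth is at least 1 here because the input starts with '{'.
                depth -= 1;
                if depth == 0 {
                    return Some(index + 1);
                }
            }
            _ => {}
        }
    }
    None
}
```

A naive search for the first `}` would cut the header short as soon as a nested object or a brace inside a string appears; a real JSON deserializer avoids that class of bug and also validates the content.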

§HTML meta tag generation

use std::collections::HashMap;
use metadata_gen::metatags::generate_metatags;

let mut map = HashMap::new();
map.insert("description".to_string(), "About the page".to_string());
map.insert("og:title".to_string(),    "Page Title".to_string());
map.insert("twitter:card".to_string(),"summary_large_image".to_string());

let groups = generate_metatags(&map);
assert!(groups.primary.contains("description"));
assert!(groups.og.contains("og:title"));
assert!(groups.twitter.contains("twitter:card"));
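As a rough picture of what grouping by key prefix could look like, the sketch below sorts keys into three of the groups and renders one `<meta>` element per entry. `TagGroups`, `render_groups`, and `tag` are hypothetical names; the real `MetaTagGroups` also has `apple` and `ms` groups, and the exact markup the crate emits is not shown in this document. A `BTreeMap` is used only to keep the output order deterministic.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced stand-in for the crate's `MetaTagGroups`.
#[derive(Debug, Default)]
pub struct TagGroups {
    pub primary: String,
    pub og: String,
    pub twitter: String,
}

/// Renders one tag. Assumption: `value` has already been HTML-escaped.
fn tag(attribute: &str, key: &str, value: &str) -> String {
    format!("<meta {}=\"{}\" content=\"{}\">\n", attribute, key, value)
}

/// Routes each key to a group by prefix: `og:` and `twitter:` keys go to
/// their own groups, everything else is treated as primary.
pub fn render_groups(map: &BTreeMap<String, String>) -> TagGroups {
    let mut groups = TagGroups::default();
    for (key, value) in map {
        if key.starts_with("og:") {
            // Open Graph tags conventionally use the `property` attribute.
            groups.og.push_str(&tag("property", key, value));
        } else if key.starts_with("twitter:") {
            groups.twitter.push_str(&tag("name", key, value));
        } else {
            groups.primary.push_str(&tag("name", key, value));
        }
    }
    groups
}
```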

§Asynchronous file extraction

use metadata_gen::utils::async_extract_metadata_from_file;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (metadata, keywords, tags) =
        async_extract_metadata_from_file("post.md").await?;
    println!("title    = {:?}", metadata.get("title"));
    println!("keywords = {:?}", keywords);
    println!("og tags  =\n{}", tags.og);
    Ok(())
}

§Comparisons

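| Crate | YAML | TOML | JSON | Typed extraction | Meta-tag emit | no_std (planned) |
|-------|------|------|------|------------------|---------------|------------------|
| metadata-gen | yes | yes | yes | v0.0.6 roadmap | yes | v0.0.9 roadmap |
| gray_matter | | | | | | |
| yaml-front-matter | | | | | | |
| matter | | | | | | |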

gray_matter is the closest incumbent. metadata-gen differentiates on the bundled meta-tag emitter, the supply-chain posture, and the WASI/no_std roadmap. See the audit deck for the strategic context.

§Performance

Per-call latency on a 2024 reference laptop (M-class CPU, single thread):

| Target | Input size | Latency |
|--------|------------|---------|
| extract_metadata (YAML) | ~200 B | ~10 µs |
| process_metadata | ~200 B | ~1 µs |
| generate_metatags | ~200 B | ~1 µs |
| escape_html | ~80 B | ~0.3 µs |

Run the suite yourself:

cargo bench --bench metadata_benchmark

A 10–100× throughput improvement is planned for v0.0.7 via Cow<'a, str> values, LazyLock<Regex> statics, single-pass HTML escape, and memchr::memmem delimiter scanning. See v0.0.7 for details.

§Supply chain

metadata-gen enforces an explicit supply-chain posture:

  • cargo-deny runs on every PR (advisories, licenses, bans, sources); CI fails on any violation.
  • cargo-audit runs on every PR and on a daily schedule against the RUSTSEC database.
  • #![forbid(unsafe_code)] is enforced crate-wide.
  • First-party 0.0.x dependencies (noyalib, dtt) are pinned strictly in Cargo.toml so an upstream patch cannot break downstream consumers without a deliberate metadata-gen release.
  • SBOM emission (CycloneDX) and cosign signing land in v0.0.5 — see the Roadmap.

Documented advisory exemptions live in audit.toml; each entry carries a rationale referencing the upstream tracking issue.

§MSRV policy

Minimum Supported Rust Version: 1.88.0.

We treat the MSRV as part of the public API: increases are batched into minor (0.x.0) releases and called out in CHANGELOG.md. The current 1.88.0 floor is pinned transitively by dtt 0.0.10 → time 0.3.47 → time-core =0.1.8 (edition2024). Downgrading the floor would re-introduce a medium-severity stack-exhaustion advisory in time, so we hold the line.

If you need an older toolchain, please open an issue describing your constraint — we are happy to discuss MSRV-segmented branches.

§Roadmap

The post-v0.0.4 roadmap is split into six themed releases. Every milestone is tracked on GitHub Milestones with full user stories and acceptance criteria per issue.

| Version | Theme | Highlights |
|---------|-------|------------|
| v0.0.5 | Foundation Hardening | Drop tokio = "full", LazyLock<Regex> statics, fix JSON nested-brace bug, cargo-deny/cargo-audit gating, SBOM emission, rustdoc Actions deploy, README/FAQ overhaul. |
| v0.0.6 | Typed API & Ergonomics | extract_typed::<T: Deserialize>, (Metadata, body: &str) return, builder pattern, per-format Cargo features, schema validation. |
| v0.0.7 | Zero-copy & Performance | Cow<'a, str> value API, single-pass HTML escape, memchr::memmem scan, throughput benches at 1 KB → 10 MB, Codspeed CI gate. |
| v0.0.8 | Correctness & Verification | proptest harness, cargo-fuzz target, Miri in nightly CI, cargo-mutants ≥ 85 % kill, Kani proof, ≥ 98 % coverage gate. |
| v0.0.9 | Portability | no_std + alloc core, async-runtime-agnostic IO, optional Tokio/smol/Embassy adapters, embedded CI matrix. |
| v0.0.10 | WASI / Blue Ocean / 1.0 RC | wasm32-wasip2 Component with WIT interface, Cloudflare Workers / Spin / wasmCloud guides, PQC-signed metadata, MCP server example, ADR series. |

§FAQ

§1. Why three frontmatter formats instead of just YAML?

Real-world content pipelines aren’t homogeneous. Jekyll/Hugo use YAML and TOML; static-site generators built on serde_json prefer JSON; documentation toolchains routinely encounter all three. metadata-gen accepts all three so your downstream code only depends on one crate.

§2. How does this compare to gray_matter?

gray_matter is the dominant frontmatter parser in the Rust ecosystem and has been since 2020. It does typed extraction (via its Pod) today, which metadata-gen will reach in v0.0.6. metadata-gen differentiates on the bundled HTML meta-tag emitter, the documented supply-chain posture, the WASI/no_std roadmap, and the strict pinning of first-party transitive dependencies. If you need typed extraction today, use gray_matter. If you want the v0.0.10 WASI Component, follow this crate.

§3. Do I need an async runtime to use this library?

No. The synchronous entry points (extract_metadata, process_metadata, extract_and_prepare_metadata, generate_metatags, escape_html) do not require Tokio. The async helper async_extract_metadata_from_file is a convenience for callers who already use Tokio; we trim Tokio to its fs + io-util features so it doesn’t bloat your build. A runtime-agnostic AsyncRead boundary lands in v0.0.9.

§4. Can I use metadata-gen in no_std / WASM?

Not in v0.0.5: regex, quick-xml, and tokio are all pulled in unconditionally. no_std + alloc support is the v0.0.9 milestone, and a full wasm32-wasip2 Component lands in v0.0.10. Track milestones v0.0.9 and v0.0.10 for status.

§5. What is the MSRV policy?

MSRV is part of the public API. Increases happen on 0.x.0 boundaries and are documented in CHANGELOG.md. The current floor (1.88.0) is set transitively by the time dependency, whose pinned release fixes a security advisory; lowering the floor would re-introduce the vulnerability.

§6. How are dates parsed?

process_metadata tries, in order:

  1. ISO-8601 / RFC 3339 (2026-06-28, 2026-06-28T15:30:00Z).
  2. YYYY-MM-DD explicit format.
  3. MM/DD/YYYY US format.
  4. DD/MM/YYYY European format (recognised by length + slash pattern).

The output is always normalized to YYYY-MM-DD. Out-of-range or ambiguous inputs return MetadataError::DateParseError.
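The fallback order above can be sketched without any date library. This is an illustration under stated assumptions, not the crate's `process_metadata`: `DateParseError` here is a hypothetical stand-in for `MetadataError::DateParseError`, RFC 3339 timestamps are handled by keeping only the part before `T`, and a slash date that is valid both ways (such as 03/04/2026) is resolved by the listed order, US first, which may differ from how the crate treats ambiguous input.

```rust
/// Hypothetical error type standing in for `MetadataError::DateParseError`.
#[derive(Debug, PartialEq, Eq)]
pub struct DateParseError(pub String);

/// Number of days in a month, or 0 for an invalid month number.
fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 => 29,
        2 => 28,
        _ => 0,
    }
}

/// Formats the date as YYYY-MM-DD if month and day are in range.
fn checked(year: u32, month: u32, day: u32) -> Option<String> {
    if day >= 1 && day <= days_in_month(year, month) {
        Some(format!("{:04}-{:02}-{:02}", year, month, day))
    } else {
        None
    }
}

/// Tries YYYY-MM-DD, then MM/DD/YYYY, then DD/MM/YYYY.
pub fn normalize_date(input: &str) -> Result<String, DateParseError> {
    // For RFC 3339 timestamps, keep only the date part before 'T'.
    let date_part = input.trim().split('T').next().unwrap_or("");
    let separator = if date_part.contains('-') { '-' } else { '/' };
    let parts: Option<Vec<u32>> = date_part
        .split(separator)
        .map(|piece| piece.parse::<u32>().ok())
        .collect();
    let normalized = match (separator, parts.as_deref()) {
        ('-', Some([year, month, day])) => checked(*year, *month, *day),
        ('/', Some([first, second, year])) => {
            // US order first, then European order as a fallback.
            checked(*year, *first, *second).or_else(|| checked(*year, *second, *first))
        }
        _ => None,
    };
    normalized.ok_or_else(|| DateParseError(input.to_string()))
}
```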

§7. How do I add custom required fields?

In v0.0.5 the required fields are hard-coded to title and date. A configurable MetadataProcessor builder lands in v0.0.6 (issue #47). Until then, validate your own required fields with metadata.contains_key("…") after extract_metadata.
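The interim `contains_key` check suggested above might look like the following sketch. `missing_fields` is a hypothetical helper, and it assumes the extracted metadata is available as a `HashMap<String, String>`.

```rust
use std::collections::HashMap;

/// Returns the names of required fields that are absent from `metadata`.
pub fn missing_fields<'a>(
    metadata: &HashMap<String, String>,
    required: &[&'a str],
) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|field| !metadata.contains_key(*field))
        .collect()
}
```

Returning the full list of missing names, rather than failing on the first one, lets a caller report every problem in a content file at once.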

§8. How is HTML escaping handled?

escape_html maps & < > " ' to their entity equivalents. unescape_html maps them back (plus &#x2F; / &#x2f; to /). The pair is round-trip safe on every ASCII input — a property-test corpus + Kani proof of that invariant land in v0.0.8. The implementation is currently a five-pass str::replace chain; a single-pass rewrite (with optional SIMD via v_htmlescape) ships in v0.0.7 (#52).
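The difference between a multi-pass `str::replace` chain and a single-pass escape can be sketched as below. These functions are illustrations, not the crate's `escape_html` and `unescape_html`; in particular, using `&#x27;` for the apostrophe is an assumption about the entity chosen.

```rust
/// Multi-pass escape: `&` must be replaced first so that the entities
/// produced by later passes are not escaped again.
pub fn escape_chain(input: &str) -> String {
    input
        .replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
        .replace('\'', "&#x27;")
}

/// Single-pass escape producing the same output with one allocation.
pub fn escape_single_pass(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            other => out.push(other),
        }
    }
    out
}

/// Reverses the mapping: `&amp;` is handled last so that an escaped
/// ampersand is not mistaken for the start of another entity.
pub fn unescape(input: &str) -> String {
    input
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&#x27;", "'")
        .replace("&amp;", "&")
}
```

The ordering comments are the important part: escaping `&` last, or unescaping `&amp;` first, would break the round trip for inputs that already contain entity-like text such as `&lt;`.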

§9. How do I extract typed structs (instead of a HashMap<String, String>)?

Not in v0.0.5. The v0.0.6 milestone adds metadata_gen::extract_typed::<T: serde::Deserialize>(content) that preserves typed information (dates as time::Date, integers as integers, nested objects as nested structs). Track issue #45.

§10. Where do I report a vulnerability?

Please do not open a public GitHub issue. Email the maintainer per the SECURITY.md policy. We acknowledge within 48 hours and publish a fix on the most recent stable line. New vulnerability classes trigger a cargo-fuzz target so the same shape can’t reappear.

§11. Will my dependency tree grow when I add this crate?

Less than before. v0.0.5 retired the scraper → html5ever → selectors → fxhash / phf_generator chain in favour of a quick-xml-backed <meta> extractor, dropping ~30 transitive crates and silencing RUSTSEC-2025-0057 (fxhash) and RUSTSEC-2026-0097 (rand 0.8 via phf_generator). Remaining runtime crates: tokio (trimmed to fs+io-util), regex, serde, serde_json, noyalib, toml, yaml-rust2, thiserror, quick-xml, time, dtt. Per-format Cargo feature gates land in v0.0.6 (#41) so you can opt out of formats you don’t use.

§12. Is there a CLI?

No. metadata-gen is a library crate. We removed the command-line-utilities category from Cargo.toml in v0.0.5 because no [[bin]] ships. If you want a CLI wrapper, please open a discussion — there is a credible case for a metadata-gen-cli companion crate.

§Contributing

Pull requests welcome. See CONTRIBUTING.md for setup, signed-commit policy, and the issue-template format used across the roadmap.

Quick local loop:

cargo fmt --all
cargo clippy --all-features --all-targets -- -D warnings
cargo test  --all-features
cargo bench --bench metadata_benchmark

§Security

  • Report vulnerabilities per .github/SECURITY.md.
  • The crate enforces #![forbid(unsafe_code)].
  • Supply-chain controls (cargo-deny, cargo-audit, SBOM, cargo-vet audits) are documented in the Supply chain section.

§License

Dual-licensed under Apache 2.0 or MIT, at your option.

Re-exports§

pub use error::MetadataError;
pub use metadata::extract_metadata;
pub use metadata::process_metadata;
pub use metadata::Metadata;
pub use metatags::generate_metatags;
pub use metatags::MetaTagGroups;
pub use utils::async_extract_metadata_from_file;
pub use utils::escape_html;

Modules§

error
Error types for metadata processing in the metadata-gen library.
metadata
Functions for extracting and processing metadata.
metatags
Functions for generating and extracting meta tags.
utils
Utility functions for metadata processing and HTML manipulation.

Functions§

extract_and_prepare_metadata
Extracts metadata from the content, generates keywords based on the metadata, and prepares meta tag groups.
extract_keywords
Extracts keywords from the metadata.

Type Aliases§

Keywords
Type alias for a list of keywords.
MetadataMap
Type alias for a map of metadata key-value pairs.
MetadataResult
Type alias for the result of metadata extraction and processing.