Production Proof

The body of measurable engineering evidence backing SuiteCentral 2.0’s “production-ready” claim.

What it is

The set of objective, verifiable signals that SuiteCentral 2.0 is not vaporware: passing test suites, broad coverage, real connectivity proofs, multi-provider AI redundancy.

Why it matters (to the adoption case)

Squire’s CTO and CFO will not approve a pilot for software that hasn’t been engineered carefully. Production proof is what converts “interesting product story” into “responsible technology bet.” 01-executive-summary places it on slide 4 — between the differentiation pitch and the Squire-specific value framing — which signals it’s load-bearing for the executive case.

The numbers

Note: the slide-script numbers below are from a presentation created circa late 2025/early 2026. The Preston-Test repo README.md (current as of April 2026) shows higher numbers. Both are accurate snapshots in time — the slide-script numbers are what executives saw in the presentation; the README numbers are what’s true today. Don’t conflate.

Per 01-executive-summary slide 4 and 11-role-brief-cto (slide-script vintage)

100% suite pass rate: 379/379 suites
100% executed-test pass rate: 9,038/9,038
9,038/9,061 total: 23 intentionally skipped (confirmed across two sources)
95%+ AI accuracy (per 11-role-brief-cto only — methodology not yet ingested)
Multi-provider AI stack (asserted as production-validated)
NetSuite sandbox connectivity (asserted as proven)
Failure-path visibility and fallback handling (per 11-role-brief-cto — see cto)

Per read-talking-points (TALKING-POINTS vintage, formally ingested 2026-04-07)

100% suite pass rate: 404/404 suites
9,207/9,237 tests passing: 30 tests intentionally skipped (the Talking-Points vintage skip baseline)
Six production connectors (not a test number, but part of the “what’s proven” talking point)

Per 15-start-here-async-standalone (CURRENT vintage, formally re-baselined 2026-06-12)

100% suite pass rate: 607/607 suites
12,254/12,270 tests passing: 16 intentionally skipped
68.73% statement coverage
2,282 tracked files / ~854K text LOC repository scale snapshot

These match the Preston-Test repo README.md and are the numbers a reviewer sees on the live demo site Start Here page.

Canonical test breakdown (per 26-canonical-metrics-and-wording + 04-technical-proof)

The canonical way to cite the test counts per the style guide:

100% suite pass rate (607 suites)
100% of executed tests passed (12,254 tests across unit, integration, and E2E)
Full breakdown: 11,731 unit (0 skipped), 503 integration (16 skipped), 20 E2E portal

Math check: 11,731 + 503 + 20 = 12,254 passing. 0 + 16 + 0 = 16 skipped (all skipped tests are in integration, none in unit or E2E). 12,254 + 16 = 12,270 total. Perfect reconciliation.
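
The same reconciliation can be re-run whenever the counts are re-baselined. A minimal Python sketch, using only the per-tier figures quoted in this section; the `TIERS` name and the `reconcile` helper are illustrative and not part of the repository:

```python
# Canonical per-tier counts as cited above: (passing, skipped).
TIERS = {
    "unit": (11_731, 0),
    "integration": (503, 16),
    "e2e_portal": (20, 0),
}


def reconcile(tiers):
    """Return (passing, skipped, total) summed across all tiers."""
    passing = sum(p for p, _ in tiers.values())
    skipped = sum(s for _, s in tiers.values())
    return passing, skipped, passing + skipped


passing, skipped, total = reconcile(TIERS)
print(passing, skipped, total)  # prints: 12254 16 12270
```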

Canonical coverage (per 26-canonical-metrics-and-wording)

Metric	Value
Statements	68.73%
Branches	57.69%
Functions	71.29%
Lines	68.98%

Short form per the style guide: “69% line coverage across 53,903 lines of production TypeScript.” The 53,903 number is the production TypeScript subset — the 15-start-here-async-standalone page’s “~854K text LOC” figure is the total repo (code + tests + config + docs).

Four production AI providers (per 04-technical-proof + 26-canonical-metrics-and-wording)

The multi-provider AI stack is now fully enumerated:

Provider	Model (primary)	Role	Cost per mapping
OpenAI	GPT-5.4 mini (default; earlier sources cited GPT-4o)	Primary inference	~$0.0007 (benchmark-measured)
Anthropic Claude	Claude Haiku 4.5 (Sonnet 4.6 upgrade tier; earlier env config showed `claude-sonnet-4-5-20250929`)	Secondary / validation	~$0.0011 (benchmark-measured)
OpenRouter	Multi-model	Routing / fallback	Free tier available
LMStudio	Llama 3.1 8B	On-premise / fallback	Free (local)

All four operational per the technical proof document. Per-mapping costs are measured by the Phase-B accuracy benchmark matrix (2026-06-10 live run — Claude Haiku 4.5 measured 96.7% top-1 on the NetSuite pair, edging out GPT-5.4 mini’s 95.1%). The $0.003/mapping shown in oracle-comparison’s live demo is a legacy Claude 3.5 Sonnet-era price.
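
To put the per-mapping figures in context, the sketch below projects spend for a batch of mappings. The 10,000-mapping volume is an arbitrary illustration and the dictionary keys are made-up labels; only the three unit prices come from this page.

```python
# Per-mapping prices quoted on this page (USD). Keys are illustrative labels.
COST_PER_MAPPING = {
    "openai_gpt_5_4_mini": 0.0007,   # benchmark-measured
    "claude_haiku_4_5": 0.0011,      # benchmark-measured
    "legacy_demo_price": 0.003,      # Claude 3.5 Sonnet-era figure in the live demo
}


def projected_cost(mappings: int, cost_per_mapping: float) -> float:
    """Projected spend in USD, rounded to cents."""
    return round(mappings * cost_per_mapping, 2)


# Assumed batch size, chosen only to make the comparison readable.
BATCH = 10_000
for label, unit_cost in COST_PER_MAPPING.items():
    print(f"{label}: ${projected_cost(BATCH, unit_cost):.2f}")
```

At this assumed volume the legacy demo price overstates the benchmark-measured OpenAI cost by roughly a factor of four.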

Canonical OpenAI model list (per `src/services/ai/ModelCatalogService.ts`)

The actual source-of-truth capability matrix in the code lists these OpenAI models:

Model	Context window	Vision	JSON mode	Tool use	Reasoning
gpt-4o	128K	✓	✓	✓	✓
gpt-4o-mini	128K	✓	✓	✓	—
gpt-4.1	128K	✓	✓	✓	✓

Note: ai-provider-system (the AI Provider System doc) lists older OpenAI models (GPT-4, GPT-4 Turbo, GPT-3.5 Turbo) — that document is stale and predates the gpt-4o upgrade. The canonical list is ModelCatalogService.ts; 04-technical-proof (March 3, 2026) already reflects the current gpt-4o primary.

Canonical Claude model list

Model	Context window
claude-3-5-sonnet-20241022	200K
claude-3-opus-20240229	200K
claude-3-haiku-20240307	200K

Also from ModelCatalogService.ts. Consistent with ai-provider-system.

The 7-provider total (per ai-provider-system)

In addition to the 4 production providers above, the code supports 3 more:

Grok (xAI) — experimental. Models: grok-beta, grok-vision-beta.
Gemini (Google) — experimental. Models: gemini-1.5-flash (1M context), gemini-1.5-pro (2M context).
Rule-Based Engine — deterministic fallback. 78% accuracy baseline; no external API calls.

Canonical wording per 26-canonical-metrics-and-wording is “4 production-ready AI providers” — the 3 extras are categorized as experimental or fallback and should not be counted in pitch materials.
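
The role split between production providers and the deterministic fallback can be pictured as an ordered chain. This is a sketch under assumptions: the provider order, the callable interface, and the `map_with_fallback` name are hypothetical and do not describe how the repository wires its providers.

```python
def map_with_fallback(field, providers, rule_based):
    """Try each (name, callable) provider in order; fall back to the rule engine.

    Returns (source, mapping, failures) so failed attempts stay visible.
    """
    failures = []
    for name, call in providers:
        try:
            return name, call(field), failures
        except Exception as exc:  # a real system would catch narrower errors
            failures.append((name, str(exc)))
    return "rule-based", rule_based(field), failures


def unavailable(field):
    """Stand-in for a provider that is down."""
    raise TimeoutError("provider unreachable")


# Stand-in callables; real providers would call an API or a local model.
providers = [("openai", unavailable), ("claude", lambda f: f"ns_{f}")]
result = map_with_fallback("email", providers, rule_based=lambda f: f.lower())
print(result)
```

Returning the failure list alongside the result is one way to keep the failure path visible rather than silently swallowing provider errors.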

AI accuracy — measured signals (per 04-technical-proof)

Metric	Last-measured Signal	When
Field Mapping Accuracy	95.1% top-1 (GPT-5.4 mini) / 96.7% (Claude Haiku 4.5) on SFDC→NetSuite, 100% both providers on SFDC→Business Central, 0 hallucinations — Phase-B fixture benchmark matrix (`npm run benchmark:ai`), disclosed as a fixture figure, not a production number	Jun 2026
Confidence Calibration	90%+	Oct 2025
Multi-provider Consensus Boost	+5-15%	Oct 2025

The field-mapping accuracy row is intentionally qualified — the canonical 04-technical-proof removed an earlier unqualified-percentage-range framing because the absolute upper bound was a single-source point estimate, not a reproducible measurement. Once a benchmark harness ships (tracked as a Tier-3 follow-up in the source-of-truth repository’s A-grade remediation plan), the absolute number returns to the table sourced from a CI-emitted artifact. Confidence calibration of 90%+ means the AI’s self-reported confidence scores match the actual correctness rate. The multi-provider consensus boost is the architectural payoff for running four providers — they vote on ambiguous mappings.
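
Both ideas in the paragraph above reduce to small calculations. The sketch below is illustrative only: `calibration_gap` is a coarse single-bucket check (a fuller evaluation would bin predictions by confidence), `consensus` is a plain majority vote, and the sample data is invented. Neither function is taken from the repository.

```python
from collections import Counter


def calibration_gap(predictions):
    """predictions: list of (confidence in [0, 1], was_correct).

    Returns |mean confidence - observed accuracy|; 0 means well calibrated.
    """
    mean_conf = sum(c for c, _ in predictions) / len(predictions)
    accuracy = sum(1 for _, ok in predictions if ok) / len(predictions)
    return abs(mean_conf - accuracy)


def consensus(votes):
    """votes: provider name -> proposed target field.

    Returns (winning field, share of providers that agreed).
    """
    winner, count = Counter(votes.values()).most_common(1)[0]
    return winner, count / len(votes)


# Invented sample data for illustration.
sample = [(0.9, True), (0.9, True), (0.9, True), (0.9, False)]
votes = {
    "openai": "companyName",
    "claude": "companyName",
    "openrouter": "companyName",
    "lmstudio": "entityId",
}
print(calibration_gap(sample))
print(consensus(votes))
```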

NetSuite integration proof (per 04-technical-proof)

Squire’s actual NetSuite sandbox: TSTDRV2698307 — first concrete Squire-specific infrastructure identifier in the corpus
Auth: OAuth 1.0 HMAC-SHA256
Connector source: src/connectors/NetSuiteConnector.ts (500+ LOC)
Verified CRUD: Customer records, Vendor records, Transaction records, Custom record types, Saved searches — full Create / Read / Update / Delete / Search on all five
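
For readers unfamiliar with the auth scheme, the following is a generic sketch of OAuth 1.0 request signing with HMAC-SHA256, using the standard OAuth 1.0 signature base string. It is not the code in NetSuiteConnector.ts; the URL and credentials are placeholders, and NetSuite-specific details such as the Authorization header layout are deliberately left out.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote


def pct(value: str) -> str:
    """Percent-encode everything except RFC 3986 unreserved characters."""
    return quote(value, safe="-._~")


def sign_request(method, base_url, params, consumer_secret, token_secret):
    """Return the base64 HMAC-SHA256 signature for an OAuth 1.0 request."""
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join([method.upper(), pct(base_url), pct(normalized)])
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode()
    digest = hmac.new(key, base_string.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


# Placeholder values only; none of these are real credentials or endpoints.
params = {
    "oauth_consumer_key": "CONSUMER_KEY_PLACEHOLDER",
    "oauth_token": "TOKEN_ID_PLACEHOLDER",
    "oauth_nonce": secrets.token_hex(16),
    "oauth_timestamp": str(int(time.time())),
    "oauth_signature_method": "HMAC-SHA256",
    "oauth_version": "1.0",
}
signature = sign_request(
    "GET", "https://example.invalid/record", params,
    "CONSUMER_SECRET_PLACEHOLDER", "TOKEN_SECRET_PLACEHOLDER",
)
print(signature)
```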

SOC 2 Trust Services Criteria mapped to production code (per compliance-dashboard)

All 5 TSC categories are implemented and each is backed by specific source files where applicable. See compliance-dashboard for the full mapping. Summary:

CC6 Security: JWT auth, RBAC, timing-safe key validation, rate limiting, production guards
A1 Availability: Health checks, circuit breakers, DR with RTO/RPO, Kubernetes auto-scaling (2-10 replicas)
PI1 Processing Integrity: AI confidence scoring, hallucination detection, schema drift blocking (SCHEMA_DRIFT_BLOCKED result code), DB-persisted reasoning traces
C1 Confidentiality: DLP/PII detection (10 patterns per the actual code — see DLP reconciliation below), masking utility, encrypted credential storage
P1 Privacy: GDPR/CCPA compliance, audit trail logging, 90-day default data retention
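
As one concrete illustration of the CC6 items, timing-safe key validation usually means comparing secrets with a constant-time primitive instead of `==`. A minimal sketch using the Python standard library; the function name and sample keys are hypothetical, and the production implementation is TypeScript:

```python
import hmac


def is_valid_api_key(presented: str, expected: str) -> bool:
    """Compare two keys without exiting early at the first differing byte."""
    return hmac.compare_digest(presented.encode(), expected.encode())


# Made-up keys for demonstration.
print(is_valid_api_key("sk_demo_123", "sk_demo_123"))
print(is_valid_api_key("sk_demo_124", "sk_demo_123"))
```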

DLP pattern count — reconciled from source code and dashboard HTML

The PII detection surface spans two subsystems and the compliance dashboard dynamically reports a combined count:

DLPService.ts (src/services/security/DLPService.ts, lines 53-65) — 10 regex patterns:

#	Pattern name	What it matches
1	`ssn`	Social Security Numbers (3-2-4 format or 9 digits)
2	`creditCard`	16-digit credit cards (4-4-4-4)
3	`email`	Email addresses
4	`phoneUS`	US phone numbers
5	`phoneIntl`	International phone numbers
6	`medicalRecordNumber`	MRN / Medical Record # patterns
7	`accountNumber`	Account # patterns (8-17 digits)
8	`ipAddress`	IPv4 addresses
9	`apiKey`	Generic API keys (32+ alphanumeric)
10	`jwt`	JWT tokens
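
To show how a pattern table like this is typically applied, here is a small masking sketch covering four of the ten names. The regular expressions are illustrative guesses written for this example; the actual expressions in DLPService.ts may differ, and the `mask` helper is hypothetical.

```python
import re

# Illustrative regexes only; not copied from DLPService.ts.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "creditCard": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipAddress": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def mask(text: str) -> str:
    """Replace every match with a bracketed, upper-cased pattern name."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text


print(mask("SSN 123-45-6789, mail a.b@example.com"))
```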

GovernanceService.ts (src/services/ai/orchestrator/GovernanceService.ts, lines 381-398) — 6 content-filter patterns (partial overlap with DLPService):

ssn, email, phone, credit_card, ip_address, name (title-prefix name detection)

Compliance dashboard (public/compliance-dashboard.html, lines 375-378) — the JavaScript snapshot renders 14 patterns when the page loads in unauthenticated/demo mode:

SSN, credit card, email, phone, intl phone, medical record, IP address, API key, JWT, bank account, DOB, passport, driver’s license, name

When authenticated, the dashboard fetches live from /api/compliance/dlp-patterns and replaces the snapshot with the API’s real-time count. A [snapshot] badge appears to distinguish snapshot mode from live API data.

Reconciliation:

10 of the 14 snapshot items have confirmed regex implementations in DLPService.ts
name is implemented in GovernanceService.ts (11th confirmed)
DOB, passport, driver’s license — not found as regex patterns in either service file. These three may represent planned additions, patterns behind the /api/compliance/dlp-patterns endpoint at runtime, or the design-intent target that the snapshot was written to reflect. A CTO who needs to verify can check the API endpoint directly.
The “8 patterns” figure that appeared in earlier wiki source summaries came from a stale NotebookLM web scrape of the compliance dashboard. NotebookLM’s extraction captured a pre-Alpine.js render state with different content than the actual page. The repo HTML source has the 14-pattern snapshot, not 8. Earlier wiki claims of “the dashboard is lying” have been corrected.
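
The reconciliation above is a set difference, which the sketch below makes explicit. The label-to-pattern correspondence (for example, "bank account" to `accountNumber` and "phone" to `phoneUS`) is inferred from the pattern names on this page and should be treated as an assumption.

```python
# The 14 labels rendered by the dashboard snapshot, lower-cased.
SNAPSHOT = {
    "ssn", "credit card", "email", "phone", "intl phone", "medical record",
    "ip address", "api key", "jwt", "bank account", "dob", "passport",
    "driver's license", "name",
}

# Snapshot label -> DLPService.ts pattern name (assumed correspondence).
DLP_SERVICE = {
    "ssn": "ssn",
    "credit card": "creditCard",
    "email": "email",
    "phone": "phoneUS",
    "intl phone": "phoneIntl",
    "medical record": "medicalRecordNumber",
    "bank account": "accountNumber",
    "ip address": "ipAddress",
    "api key": "apiKey",
    "jwt": "jwt",
}

# Labels confirmed only in GovernanceService.ts.
GOVERNANCE_ONLY = {"name"}

confirmed = set(DLP_SERVICE) | GOVERNANCE_ONLY
unconfirmed = sorted(SNAPSHOT - confirmed)
print(len(confirmed), unconfirmed)
```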

Reconciliation with source summaries:

04-technical-proof says “9 patterns” — counts phones as one, omits GovernanceService patterns. Reasonable approximation of the 10 DLPService patterns.
compliance-dashboard — the NotebookLM scrape originally showed “8 patterns” but this was a scrape artifact. The actual repo HTML says 14. Updated.
oracle-comparison — also showed “8 patterns” from the same scrape vintage. Corrected to note the actual snapshot says 14.

Vintage comparison table

Vintage	Suites	Tests passing	Skipped	Total	Sources
Slide	379/379	9,038	23	9,061	01-executive-summary, 11-role-brief-cto
Talking-Points	404/404	9,207	30	9,237	read-talking-points, read-elevator-pitch
Current	607/607	12,254	16	12,270	15-start-here-async-standalone

Trajectory between vintages

Slide → Talking-Points: +25 suites, +169 passing tests, +7 skipped tests (the skipped count rose from 23 to 30 across this vintage)
Talking-Points → Current: +203 suites, +3,047 passing tests, -14 skipped (the skip count dropped from 30 to 16 between vintages; the PR #694/#695 skip-discipline cleanup pruned long-stale it.skip placeholders)
Full arc: +228 suites and +3,216 passing tests between slide and current
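
The deltas can be recomputed directly from the vintage comparison table. A small sketch that takes suites, passing, and total from each table row and derives the skip count as total minus passing; the `VINTAGES` and `deltas` names are illustrative:

```python
# (vintage, suites, passing, total) from the vintage comparison table.
VINTAGES = [
    ("Slide", 379, 9_038, 9_061),
    ("Talking-Points", 404, 9_207, 9_237),
    ("Current", 607, 12_254, 12_270),
]


def deltas(earlier, later):
    """Return (suites, passing, skipped) deltas between two vintages."""
    _, s0, p0, t0 = earlier
    _, s1, p1, t1 = later
    return s1 - s0, p1 - p0, (t1 - p1) - (t0 - p0)


slide, talking_points, current = VINTAGES
for earlier, later in [(slide, talking_points), (talking_points, current), (slide, current)]:
    suites, passing, skipped = deltas(earlier, later)
    print(f"{earlier[0]} -> {later[0]}: "
          f"{suites:+d} suites, {passing:+,d} passing, {skipped:+d} skipped")
```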

Three observations: (1) the test base has grown consistently across three measured points in time, so this is a real codebase with active engineering, not a static pitch deck; (2) the skip count went down from 30 (Talking-Points) to 16 (Current), reflecting an active prune of stale it.skip placeholders rather than the earlier “frozen list” pattern; (3) coverage is reported only in the Current vintage; earlier snapshots optimize for “100% pass rate” (an easier executive number) over coverage percent.

Mixed-vintage caveat: a Path B reviewer will see three different test counts depending on which page they land on. Start Here has Current; Leadership Talking Points has Talking-Points vintage; the CTO role brief has Slide vintage. Same package, three vintages. Worth flagging to the asset owner — see demo-site.

What this proves

The engineering organization can ship. Many teams claim “production-ready” with sub-1k test counts; 9k+ is materially different.
The system has been exercised in real conditions. NetSuite sandbox connectivity is not an in-memory simulation.
The AI provider abstraction works. Multi-provider stack means no single-vendor dependency.

What this does NOT prove

The 68.73% coverage figure means roughly 31% of statements are uncovered. Squire’s CTO will likely ask which subsystems are under-covered. Open question for next technical ingest.
“100% pass rate” excludes the intentionally skipped tests (23 in the slide vintage, 16 in Current). What are they, and why are they skipped? Not in the slide script. Open question.

Open questions

Where is the coverage gap? (Which modules / subsystems are under-covered?)
What are the skipped tests (23 in the slide vintage, 16 in Current), and is there a plan to enable them?
What does the multi-provider AI stack actually consist of? Now answered by 04-technical-proof and 26-canonical-metrics-and-wording: OpenAI, Anthropic Claude, OpenRouter, and LMStudio.
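What does “NetSuite sandbox connectivity proof” mean concretely: read-only metadata, two-way write tests, or auth round-trips? Now answered by 04-technical-proof: OAuth 1.0 HMAC-SHA256 auth against sandbox TSTDRV2698307, with Create / Read / Update / Delete / Search verified on five record categories.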
What is the 95%+ AI accuracy measuring? Now PARTIALLY answered by ai-governance-layer-video (01:32): “We reduced manual field mapping from 15 hours to 30 seconds with 95% accuracy.” So the 95% is specifically about field-mapping accuracy, not AI accuracy generally. Still single-task / single-source for methodology; needs 04-TECHNICAL-PROOF.md or AI Provider System Documentation for evaluation-harness detail.
Field mapping efficiency claim: 15 hours → 30 seconds is a dramatic efficiency claim. At face value that’s a ~1,800× speed-up. The 15-hour baseline matches read-elevator-pitch’s “three years ago our problem was manual mapping, with about 15 hours of labor per integration.” The 30-second target is from ai-governance-layer-video. The ratio is what makes the “per-integration” economics work for the HintonBurdick-driven client-base doubling.
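
The ratio is easy to verify, and the same inputs give a rough labor figure per batch of integrations. A sketch using only the two quoted numbers; the batch of 10 integrations is an arbitrary illustration:

```python
BASELINE_HOURS = 15      # manual mapping effort per integration (read-elevator-pitch)
AUTOMATED_SECONDS = 30   # claimed AI-assisted time (ai-governance-layer-video)

# 15 hours is 54,000 seconds, so the ratio is 54,000 / 30.
speedup = BASELINE_HOURS * 3600 / AUTOMATED_SECONDS


def hours_saved(integrations: int) -> float:
    """Labor hours saved across a number of integrations, at face value."""
    return integrations * (BASELINE_HOURS - AUTOMATED_SECONDS / 3600)


print(speedup)          # prints: 1800.0
print(hours_saved(10))  # just under 150 hours for an assumed 10 integrations
```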

Sources

01-executive-summary — claims 2, 6, 7 (test counts slide-vintage, multi-provider stack, NetSuite connectivity)
11-role-brief-cto — second-source confirmation of 9038/9061 slide-vintage; also claims (slide-vintage, single-source) “95%+ AI accuracy” — the canonical 04-technical-proof now describes field-mapping accuracy as qualified (“measurably improved across Phases 1-5; absolute numbers depend on schema complexity and are tracked as a Tier-3 follow-up — benchmark harness”), so treat this 95%+ figure as a single-source slide-vintage data point until the Tier-3 harness ships
15-start-here-async-standalone — claims 12-15 (CURRENT-vintage 12,254/12,270, 100% 607/607 suites, 68.73% coverage, 2,282 files / ~854K LOC)
read-talking-points: claim 4 (TALKING-POINTS-vintage 9,207/9,237, 404/404 suites, 30 skipped) and claim 5 (six production connectors)
read-elevator-pitch — claim 7 (second-source confirmation of Talking-Points vintage test counts)
ai-governance-layer-video — claims 3, 12 (video claims 95% mapping accuracy — single-source point estimate that the canonical 04-technical-proof now qualifies as “measurably improved… tracked as Tier-3 follow-up”; also 9,000+ tests; 15-hours-to-30-seconds efficiency quantification)
04-technical-proof: all claims re: canonical test breakdown, 4 AI providers with model names, AI accuracy (qualified field-mapping accuracy per Tier-3 benchmark roadmap, 90%+ confidence calibration, +5-15% consensus boost), NetSuite sandbox TSTDRV2698307, full CRUD verified, line coverage 68.98%
26-canonical-metrics-and-wording — canonical test sequence, 4 coverage metrics (statements / branches / functions / lines), 53,903 lines of production TypeScript, AI provider per-mapping costs
compliance-dashboard — 5 SOC 2 Trust Services Criteria mapped to production code with source file paths

Brain1 — SuiteCentral 2.0 Wiki
