# 4 Technical Step-by-Step Implementation Guide

These steps represent the "Black and White" engineering directives for architects. They are not suggestions but structural requirements for building a system that can be safely operated by machines.

### 4.1 Step 1: Define Service Boundaries (The "One Capability" Rule)

Legacy government systems frequently collapse under the weight of "God Services" - sprawling monoliths that mix unrelated business domains into a single, fragile codebase. To enable automation, we must rigorously decouple these functions.

* **The Rule:** A Building Block must handle exactly one coherent capability. In an agent-based architecture, ambiguity is a failure state. Therefore, a service must own a single domain of logic and data, ensuring that an automated agent never has to guess which part of a system controls a specific outcome.
* **The Test:** To determine whether a service is properly scoped, apply a simple linguistic test: can you describe the service's function without using the word "and"? If you cannot, the boundary is likely too broad. For example, a service that "Calculates tax liability" is a valid architectural unit. A service that "Handles user login and processes tax filings" is not. The latter introduces dependencies that make automated orchestration risky and difficult to audit.
* **Action:** The immediate task is to refactor these mixed interfaces. Completely rewriting a monolithic database is often impossible within a single funding cycle, but you can still impose order by creating separate API interfaces that sit on top of the legacy data. This reduces the "context window" an AI needs to understand the system: presenting the agent with a small, specific interface lowers the chance of hallucination. Where even this interface separation is technically infeasible, you must rigorously update the API specification and documentation with clear, explicit details that logically separate domain functionality, so the machine agent can distinguish between unrelated tasks despite the underlying entanglement.

The Domain-Driven Design (DDD) methodology often helps with this process.
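The "One Capability" rule can be sketched as two narrow service facades sitting on top of a single entangled legacy store. All names here (`LegacyStore`, `AuthService`, `TaxService`) are hypothetical illustrations, not a prescribed implementation:

```python
class LegacyStore:
    """Stands in for a monolithic database that mixes unrelated domains."""
    def __init__(self):
        # One table mixing authentication and tax data (the "God Service" smell).
        self.rows = {
            "user:42": {"password_hash": "abc123", "taxable_income": 50_000},
        }

class AuthService:
    """Owns exactly one capability: verifying credentials."""
    def __init__(self, store: LegacyStore):
        self._store = store

    def verify(self, user_id: str, password_hash: str) -> bool:
        row = self._store.rows.get(f"user:{user_id}")
        return row is not None and row["password_hash"] == password_hash

class TaxService:
    """Owns exactly one capability: calculating tax liability."""
    def __init__(self, store: LegacyStore):
        self._store = store

    def liability(self, user_id: str, rate: float = 0.2) -> float:
        row = self._store.rows.get(f"user:{user_id}")
        return row["taxable_income"] * rate

store = LegacyStore()
print(AuthService(store).verify("42", "abc123"))  # True
print(TaxService(store).liability("42"))          # 10000.0
```

Each facade passes the "and" test on its own, even though the shared legacy data underneath remains entangled for now.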

### 4.2 Step 2: Establish "Trustworthy" Contracts

In an agentic workflow, the API Contract ceases to be mere documentation for developers and becomes the primary product interface for machine agents. It must be strictly machine-consumable to function as a reliable foundation for automation.

* **Standard: OpenAPI 3.1.** We strongly recommend adopting OpenAPI 3.1 as the foundational standard because it is well understood by AI models and aligns its schema language with JSON Schema. This version allows for precise schema definitions that minimize ambiguity. By providing a formal, machine-readable description of the interface, we effectively reduce the "search space" for the agent, making it significantly harder for the software to misinterpret the service boundaries or data requirements.
* **Enums over Strings:** Architects must strictly avoid using free-text string fields for any data that has a finite set of valid values. Instead, you should enforce the use of enumerations (enums). This practice is a critical defense against hallucination; it prevents an AI from inventing plausible but non-existent status codes - such as "Partially\_OK" - when the system logic only supports "Success" or "Failure".
* **Discriminators for Polymorphism:** When a digital service accepts multiple different types of data inputs - such as a single endpoint that handles forms for both "businesses" and "private citizens" - you must include a specific discriminator label in the schema. This label explicitly tells the AI exactly which entity type it is observing, ensuring the agent applies the correct validation rules to the information rather than guessing the structure or becoming confused by ambiguous fields.
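A minimal sketch of such a contract, expressed here as a Python dict rather than YAML so it can be inspected programmatically. The schema names (`FilingResult`, `Filer`, `Business`, `Citizen`) and the `filerType` property are hypothetical:

```python
# Fragment of an OpenAPI 3.1 components section with an enum and a discriminator.
filing_schema = {
    "components": {
        "schemas": {
            "FilingResult": {
                "type": "object",
                "properties": {
                    # Enum, not free text: the agent cannot invent "Partially_OK".
                    "status": {"type": "string", "enum": ["Success", "Failure"]},
                },
                "required": ["status"],
            },
            "Filer": {
                "oneOf": [
                    {"$ref": "#/components/schemas/Business"},
                    {"$ref": "#/components/schemas/Citizen"},
                ],
                # The discriminator tells the consumer which branch it is seeing.
                "discriminator": {"propertyName": "filerType"},
            },
            "Business": {
                "type": "object",
                "properties": {
                    "filerType": {"const": "business"},
                    "orgNumber": {"type": "string"},
                },
            },
            "Citizen": {
                "type": "object",
                "properties": {
                    "filerType": {"const": "citizen"},
                    "nationalId": {"type": "string"},
                },
            },
        }
    }
}

allowed = filing_schema["components"]["schemas"]["FilingResult"]["properties"]["status"]["enum"]
print("Partially_OK" in allowed)  # False: the hallucinated status is not a valid value
```

Because `status` is a closed enumeration, any invented value fails validation before it reaches business logic, and the `filerType` discriminator removes the need for the agent to guess which validation rules apply.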

### 4.3 Step 3: Implement Structured Error Handling

In a traditional portal, a vague error message might prompt a human user to call support. In an agent-based system, a vague error causes the automation to fail entirely or, worse, to hallucinate a workaround. Therefore, when an AI agent encounters a roadblock, the system must provide explicit, machine-readable instructions on how to resolve it.

* **Standard: RFC 9457.** We recommend adopting RFC 9457 (Problem Details for HTTP APIs) as the mandatory standard for reporting failures. This standard moves beyond bare HTTP status codes to a structured document format that an AI can parse and "understand", carrying details that a status code alone cannot convey.
* **Implementation:** To make this effective, the API must return a stable "type" URI - such as /errors/invalid-format - rather than just a text description. Furthermore, the error response must include specific parameters that identify exactly which field caused the issue. This eliminates ambiguity and pinpoints the fault within the data structure.
* **Benefit:** The primary value of this approach is resiliency. By providing structured feedback, we allow the Agent to parse the error logic programmatically. This enables the agent to apply a specific fix - such as reformatting a phone number to match the required pattern - and retry the transaction successfully without ever triggering human intervention.
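The retry loop described above can be sketched as follows. The `type` URI and the `invalid-params` extension member follow the shape of the RFC's own examples, but the phone-number field and the fix-up rule are hypothetical:

```python
import json
import re

# An RFC 9457 "application/problem+json" body as it might arrive from the API.
problem = json.loads("""{
  "type": "/errors/invalid-format",
  "title": "Request field has an invalid format",
  "status": 400,
  "invalid-params": [
    {"name": "phoneNumber", "reason": "must be digits prefixed with +"}
  ]
}""")

def fix_payload(payload: dict, problem: dict) -> dict:
    """Apply a deterministic repair based on the stable problem 'type' URI."""
    fixed = dict(payload)
    if problem["type"] == "/errors/invalid-format":
        for param in problem.get("invalid-params", []):
            if param["name"] == "phoneNumber":
                # Strip all non-digits and re-prefix: a mechanical, auditable fix.
                digits = re.sub(r"\D", "", fixed["phoneNumber"])
                fixed["phoneNumber"] = "+" + digits
    return fixed

payload = {"phoneNumber": "047 123 456"}
print(fix_payload(payload, problem))  # {'phoneNumber': '+047123456'}
```

Because the agent branches on the stable `type` URI and the named field, not on free-text wording, the same repair works regardless of how the human-readable `title` is phrased.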

### 4.4 Step 4: Event-Driven Integration

Life-event orchestration inherently spans time and organizational boundaries, making it incompatible with simple, synchronous commands. Therefore, we must move from a request-response model to an asynchronous event-driven architecture.

* **Standard: CloudEvents 1.0.** We recommend CloudEvents 1.0 as the mandatory specification for this architectural layer. This standard provides a common, vendor-neutral structure for describing event data, ensuring that a notification generated by a legacy on-premise system can be universally understood by a modern cloud-native agent without custom translation logic.
* **Envelope Design:** To maintain system integrity, architects must enforce a strict separation between "Routing Data" and "Business Data". The event envelope contains the metadata required for the infrastructure to deliver the message, while the payload contains the domain-specific content. This distinction ensures that middleware can route traffic efficiently without needing to inspect - or inadvertently expose - the business logic contained within the packet.
* **Privacy and the "Claim Check" Pattern.** Automated message streams are often visible to multiple intermediaries, making them unsuitable for transporting sensitive personal information. To protect user privacy, you must implement the "Claim Check" pattern. Instead of sending the actual data - such as a medical record or tax document - in the notification payload, the system sends a lightweight alert containing only a reference token that indicates the information is ready. The receiving system must then present that token over a secure, authenticated connection to retrieve the actual details. This ensures that private data is never exposed in the open message stream and that every access event is properly authenticated and logged.
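Both directives above can be combined in one sketch: a CloudEvents 1.0 envelope (JSON format) whose payload carries only a claim-check token. The event `type`, `source` URI, retrieval path, and token format are hypothetical examples:

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(claim_token: str) -> dict:
    """Build a CloudEvents 1.0 notification using the Claim Check pattern."""
    return {
        # --- Envelope: routing metadata, safe for intermediaries to inspect ---
        "specversion": "1.0",
        "type": "gov.tax.assessment.ready",
        "source": "/agencies/tax/assessments",
        "id": str(uuid.uuid4()),
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # --- Payload: a claim check, never the sensitive document itself ---
        "data": {
            "claimToken": claim_token,
            "retrieveFrom": "/assessments/claims",  # authenticated endpoint
        },
    }

event = make_event("tok-7f3a")
print(json.dumps(event["data"]))  # no personal data crosses the message broker
```

Middleware routes on the envelope attributes (`type`, `source`) alone, and the actual record is only released when the recipient authenticates against the retrieval endpoint with the token.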

### 4.5 Step 5: Legacy Modernization (The Strangler Fig Pattern)

The most persistent fallacy in government IT is the idea that we can simply replace old systems with a "Big Bang" release. In reality, you cannot pause the government to rewrite legacy systems. Critical services must remain operational 24/7. Therefore, modernization must occur incrementally using the "Strangler Fig Pattern" (also known as the Strangler Pattern). This strategy allows us to wrap the old system in a new interface and slowly replace the internals over time.

1. **Phase A (The Facade):** The priority is to insulate the new AI agents from the complexity of the old backend. To do this, you build a lightweight Adapter Microservice that acts as a translator. It accepts the messy, outdated protocols of the legacy system - such as SOAP - and converts them into the clean AI-Ready Contract defined in the previous steps. The strategic value here is speed. The AI ecosystem sees a clean API immediately, allowing you to deploy modern agents today even while the underlying database remains decades old.
2. **Phase B (Strangulation):** Once the interface is stabilized, you can begin to replace the actual logic. Build a new microservice dedicated to a specific capability - for example "Update Profile" - and deploy it alongside the legacy environment. You then configure your API Gateway to route traffic for that specific function to the new service rather than the old one. This shifts the processing load to modern infrastructure one piece at a time without disrupting the broader system.
3. **Phase C (Elimination):** The final phase is the safe removal of dead code. You monitor the traffic flows until usage of the legacy module hits zero. Once the telemetry confirms that no user or agent relies on the old path, you decommission the legacy module entirely. This ensures that technical debt is retired systematically rather than accumulating indefinitely.
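Phases A and B can be sketched together: a facade adapter translates a legacy SOAP-style response into the clean contract, while a gateway routing table strangles one capability at a time. The capability names, the SOAP fragment, and the routing table are all hypothetical:

```python
import re

LEGACY_SOAP_RESPONSE = """<soap:Envelope><soap:Body>
  <UpdateProfileResponse><Status>OK</Status></UpdateProfileResponse>
</soap:Body></soap:Envelope>"""

def legacy_adapter(capability: str) -> dict:
    """Phase A: wrap the old protocol behind the AI-ready contract."""
    match = re.search(r"<Status>(\w+)</Status>", LEGACY_SOAP_RESPONSE)
    status = "Success" if match and match.group(1) == "OK" else "Failure"
    return {"status": status}  # clean enum from Step 2, not raw SOAP

def new_profile_service(capability: str) -> dict:
    """Phase B: a modern microservice replacing exactly one capability."""
    return {"status": "Success"}

# Gateway routing table: flip one route at a time, never the whole system.
ROUTES = {
    "update-profile": new_profile_service,  # already strangled: new code path
    "calculate-tax": legacy_adapter,        # still served via the legacy facade
}

def gateway(capability: str) -> dict:
    return ROUTES[capability](capability)

print(gateway("update-profile"))  # {'status': 'Success'}
print(gateway("calculate-tax"))   # {'status': 'Success'}
```

Callers see the same clean contract on both routes, which is exactly what lets Phase C retire the legacy path later without any consumer noticing.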
