Lately, I’ve been watching projects struggle. A team gets excited about an AI feature, builds it with one provider’s API, and then—boom. A price change, an outage, or a new model from a competitor makes their code feel brittle. I’ve felt that pain, too. It’s why I sat down to build something better: a single, robust gateway that can talk to any AI. If you’re tired of vendor lock-in and want to offer reliable, real-time AI features, follow along. I’ll show you how I built it.
Why pour time into an abstraction layer? Think about it. What happens when your primary AI service has a bad day? Your users see errors. By using multiple providers, we add resilience. But calling different APIs with different formats is messy. That’s where LangChain.js shines. It gives us a common way to work with large language models.
Let’s start with the core idea. We’re building a NestJS service. It will take a user’s request, decide which AI to use, and stream the response back token by token. Have you ever waited for a long AI response to complete before seeing anything? Server-Sent Events (SSE) fix that by sending data as it’s generated. It feels magical.
First, we need our tools. I created a new NestJS project. You can scaffold one with the Nest CLI (nest new). Then, I added the key packages.
npm install @langchain/core @langchain/openai @langchain/anthropic ioredis zod
The configuration is critical. I used Zod to define a schema that validates our environment variables at startup, so a missing API key fails fast at boot instead of surfacing mid-request.
// ai.config.ts
import { z } from 'zod';

export const aiConfigSchema = z.object({
  openaiApiKey: z.string().min(1),
  anthropicApiKey: z.string().min(1),
});

// Parse once at bootstrap so a missing key fails fast:
export const aiConfig = aiConfigSchema.parse({
  openaiApiKey: process.env.OPENAI_API_KEY,
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
});
With config ready, how do we manage different AI models? I created a provider interface. This ensures every AI service we add—OpenAI, Anthropic, or even a local one—has the same methods. It’s a contract in code.
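As a sketch, the contract can be as small as a single streaming method. The AIProvider name and shape below are my own, not from LangChain; adapt them to your needs:

```typescript
// ai-provider.interface.ts -- an illustrative provider contract.
// Every concrete provider (OpenAI, Anthropic, local) must expose the
// same stream() shape, so callers never care which one they got.
export interface AIProvider {
  readonly name: string;
  stream(content: string): Promise<AsyncIterable<string>>;
}

// A trivial in-memory implementation, handy in tests: it "streams"
// the input back one word at a time.
export class EchoProvider implements AIProvider {
  readonly name = 'echo';

  async stream(content: string): Promise<AsyncIterable<string>> {
    async function* gen() {
      for (const word of content.split(' ')) yield word;
    }
    return gen();
  }
}
```

Anything that satisfies this interface can be dropped into the fallback chain later without touching the callers.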
Here’s a simplified version of my OpenAI provider.
// openai.provider.ts
import { ChatOpenAI } from '@langchain/openai';

export class OpenAIProvider {
  private llm: ChatOpenAI;

  constructor(apiKey: string) {
    this.llm = new ChatOpenAI({ openAIApiKey: apiKey, modelName: 'gpt-4o' });
  }

  // Returns an async iterable of chunks as the model produces them.
  async stream(content: string) {
    return this.llm.stream(content);
  }
}
Notice I’m using the stream method. Why stream? It reduces latency. The user sees words appear as they’re produced. But what if OpenAI is slow or fails? We need a backup. I implemented a fallback chain. It tries the first provider; if that errors, it moves to the next. How many fallbacks would you set up?
The heart of the system is the chain factory. It creates a sequence of these providers; LangChain.js supports the pattern natively through withFallbacks, which tries each runnable in turn. My service uses the first one that succeeds. This pattern is powerful for maintaining uptime.
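To make the control flow explicit, here is a hand-rolled sketch of the same idea with no library dependency. All names here are illustrative; in the real service this logic is delegated to LangChain:

```typescript
// fallback.ts -- the fallback pattern, spelled out by hand.
// Each provider is tried in order; the first one that starts a
// stream wins, and only if all fail do we surface an error.
type StreamFn = (prompt: string) => Promise<AsyncIterable<string>>;

export async function streamWithFallbacks(
  providers: { name: string; stream: StreamFn }[],
  prompt: string,
): Promise<AsyncIterable<string>> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return await p.stream(prompt); // first provider that answers wins
    } catch (err) {
      // Record the failure and move on to the next provider.
      errors.push(`${p.name}: ${(err as Error).message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

One caveat this sketch shares with the real thing: a fallback only fires if the call errors before the stream starts; a stream that dies halfway needs separate handling.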
Now, let’s handle the request. In NestJS, I made a controller with a POST endpoint. It uses SSE for the response. The key is to return an observable stream.
// ai.controller.ts
import { Body, Controller, Post, Sse } from '@nestjs/common';
import { Observable } from 'rxjs';
// AiService and ChatRequestDto are defined elsewhere in the project.

@Controller()
export class AiController {
  constructor(private readonly aiService: AiService) {}

  @Post('chat')
  @Sse()
  async chat(@Body() dto: ChatRequestDto) {
    const stream = await this.aiService.getStream(dto.message);
    return new Observable((subscriber) => {
      (async () => {
        try {
          for await (const chunk of stream) {
            subscriber.next({ data: chunk });
          }
          subscriber.complete();
        } catch (err) {
          subscriber.error(err); // propagate failures instead of hanging the stream
        }
      })();
    });
  }
}
The aiService is where the logic lives. It gets the prompt, checks a cache, and runs the fallback chain. Ah, the cache! I used Redis. But caching AI responses is tricky. The same question worded differently should hit the cache. I used a semantic cache, which groups similar meanings. Does that sound complex? It’s simpler than you think.
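To make the idea concrete, here is a stripped-down sketch of semantic lookup. In the real service, embed() would call an embedding model and the vectors would live in Redis; both are abstracted behind an injected function here, and the 0.9 threshold is just a starting point you would tune:

```typescript
// semantic-cache.ts -- illustrative semantic cache. Lookup is by
// cosine similarity of embeddings, so differently worded questions
// with the same meaning can still hit.
type Vec = number[];

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export class SemanticCache {
  private entries: { vec: Vec; answer: string }[] = [];

  constructor(
    private embed: (text: string) => Promise<Vec>, // injected embedder
    private threshold = 0.9, // similarity required for a hit
  ) {}

  async get(question: string): Promise<string | null> {
    const vec = await this.embed(question);
    let best: { score: number; answer: string } | null = null;
    for (const e of this.entries) {
      const score = cosine(vec, e.vec);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, answer: e.answer };
      }
    }
    return best?.answer ?? null;
  }

  async set(question: string, answer: string): Promise<void> {
    this.entries.push({ vec: await this.embed(question), answer });
  }
}
```

The linear scan here is fine for a sketch; at scale you would use a vector index (RediSearch, for example) instead of iterating every entry.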
I created a prompt template system. Why? Because hardcoding prompts in code is fragile. I store them as files. For example, a summarizer template. This makes them reusable and easy to edit.
// summarizer.template.ts
export const summarizerTemplate = `Please summarize the following text: {text}`;
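Here is a minimal fill helper to show how a template gets its variables. LangChain.js ships PromptTemplate for this job; the function below is an illustrative stand-in whose main virtue is failing loudly when a variable is missing:

```typescript
// template.ts -- fill {placeholder} slots in a template string.
// Throws on a missing variable so a typo is caught immediately,
// not after a confusing model response.
export function fillTemplate(
  template: string,
  vars: Record<string, string>,
): string {
  return template.replace(/\{(\w+)\}/g, (_: string, key: string) => {
    if (!(key in vars)) {
      throw new Error(`Missing template variable: ${key}`);
    }
    return vars[key];
  });
}
```

Usage: fillTemplate(summarizerTemplate, { text: articleBody }) produces the final prompt.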
Personal touch: I once spent hours debugging because a prompt had a typo. Now, with templates validated by Zod, I catch errors early. Have you had a similar experience?
Streaming the tokens is the fun part. The LangChain model returns an async generator. We loop over it and send each chunk to the client. The frontend connects to our SSE endpoint and updates the UI in real-time. It makes the application feel responsive.
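On the client side, note that EventSource cannot send a POST body, so a fetch-based reader is one option. The SSE wire format itself is simple: events are blocks separated by blank lines, with each payload on a "data: " line. A small parser for that framing might look like this (the names are mine, and the browser wiring is shown only as a comment since it needs a live server):

```typescript
// sse-client.ts -- parse SSE framing out of an incoming text buffer.
// Returns completed event payloads plus the unfinished remainder,
// which should be prepended to the next network chunk.
export function parseSseEvents(
  buffer: string,
): { events: string[]; rest: string } {
  const events: string[] = [];
  const parts = buffer.split('\n\n'); // blank line terminates an event
  const rest = parts.pop() ?? '';     // last part may be incomplete
  for (const part of parts) {
    for (const line of part.split('\n')) {
      if (line.startsWith('data: ')) events.push(line.slice(6));
    }
  }
  return { events, rest };
}

// Browser usage (illustrative, not runnable without the server):
// const res = await fetch('/chat', { method: 'POST', body: JSON.stringify({ message }) });
// const reader = res.body!.getReader();
// ...decode each chunk, feed it through parseSseEvents, append events to the UI.
```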
What about costs? Every API call costs money. I added an interceptor to track tokens used per request and log the cost. This helps in monitoring and budgeting.
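The cost arithmetic itself is just tokens times a per-million-token rate. The rates below are placeholders, not current prices; in a real interceptor you would load them from config and take the token counts from the provider's usage metadata:

```typescript
// cost.ts -- per-request cost estimate. Rates are ILLUSTRATIVE ONLY;
// real prices change, so keep them in configuration, not in code.
interface Rate {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

const RATES: Record<string, Rate> = {
  'gpt-4o': { inputPerMTok: 2.5, outputPerMTok: 10 }, // placeholder values
};

export function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const r = RATES[model];
  if (!r) throw new Error(`No rate configured for ${model}`);
  return (
    (inputTokens * r.inputPerMTok + outputTokens * r.outputPerMTok) / 1_000_000
  );
}
```

Logging this value per request is enough to spot a runaway prompt before the invoice does.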
Let’s talk about errors. They will happen. Network issues, rate limits, model overload. My fallback chain handles many cases, but we also need good error messages. I wrap calls in try-catch blocks and return user-friendly errors.
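One small piece of that: a mapper from raw provider errors to messages safe to show users. The patterns matched below are examples, not an exhaustive list:

```typescript
// errors.ts -- translate provider failures into user-friendly text.
// Raw error messages may leak internals (keys, hosts), so never
// forward them to the client verbatim.
export function toUserMessage(err: unknown): string {
  const msg = err instanceof Error ? err.message : String(err);
  if (/rate limit|429/i.test(msg)) {
    return 'The AI service is busy. Please retry shortly.';
  }
  if (/timeout|ETIMEDOUT/i.test(msg)) {
    return 'The AI service took too long to respond.';
  }
  return 'Something went wrong while generating a response.';
}
```

The original error still gets logged server-side; only the sanitized message crosses the wire.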
Testing is essential. I wrote unit tests for each provider and integration tests for the full flow. Mocking the AI calls is key to avoid hitting real APIs during tests.
Here’s a small test snippet for the OpenAI provider.
// openai.provider.spec.ts
it('should stream a response', async () => {
  const provider = new OpenAIProvider('test-key');
  const mockStream = async function* () {
    yield 'Hello';
  };
  // Mock the underlying model so the test never hits the real API.
  jest.spyOn(provider['llm'], 'stream').mockResolvedValue(mockStream() as any);

  // stream() is async, so await it before iterating.
  const stream = await provider.stream('Hi');
  const chunks: string[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  expect(chunks).toContain('Hello');
});
As I built this, I realized the power of abstraction. Our application code doesn’t care if it’s GPT-4 or Claude answering. It just sends a request and gets a stream. This separation makes future changes easy. Want to add a new AI provider? Just implement the interface and add it to the chain.
I also added rate limiting. Using NestJS’s Throttler, I limit requests per user. This prevents abuse and controls costs.
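For reference, a configuration fragment along those lines, assuming @nestjs/throttler v5+ (where ttl is in milliseconds; older versions use seconds). The limits are illustrative:

```typescript
// app.module.ts (excerpt) -- global rate limiting via @nestjs/throttler.
import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerGuard, ThrottlerModule } from '@nestjs/throttler';

@Module({
  imports: [
    // Allow 20 requests per 60-second window per client (illustrative numbers).
    ThrottlerModule.forRoot([{ ttl: 60_000, limit: 20 }]),
  ],
  providers: [
    // Registering the guard under APP_GUARD applies it to every route.
    { provide: APP_GUARD, useClass: ThrottlerGuard },
  ],
})
export class AppModule {}
```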
Deployment considerations? We need to manage API keys securely, perhaps using a vault. The Redis cache should be persistent. I use Docker to containerize the service for consistency.
Now, think about your own project. Are you calling an AI API directly? What happens when you need to switch? Building this gateway might seem like extra work, but it saves time in the long run. It turns a point of failure into a resilient system.
The code examples I’ve shared are simplified. In a real project, you’d add more error handling, logging, and monitoring. But this foundation works. I’ve run it in production, and it handles thousands of requests smoothly.
So, what’s next? You could extend this with feedback loops, where user ratings improve prompt selection, or add a dashboard to monitor provider performance. The possibilities are vast.
I hope this guide helps you build something robust. If you found it useful, please like, share, and comment with your thoughts or questions. Your feedback helps me create better content. Let’s build resilient AI systems together.