I've been building more and more tools that integrate with Large Language Models lately. From automating git commits using AI to creating a voice assistant using ChatGPT, I found myself writing the same golang llm integration code over and over. Each time I needed robust error handling, retries, and proper connection management for OpenAI's API and other LLM providers. After the third or fourth Go LLM client implementation, I decided to build a proper production-ready package that would handle all of this out of the box.
Why Go for LLM Integration?
I chose Go for this LLM client library for several reasons that became clear during my work at Visa and various fintech startups. When you're building production systems that make thousands of API calls to OpenAI or Anthropic daily, you need:
- Excellent concurrency handling for multiple LLM requests
- Minimal memory overhead for long-running services
- Fast compilation for rapid deployment cycles
- Built-in HTTP client optimisation
The Go language's approach to concurrency makes it particularly well-suited for LLM API integration where you're often waiting on network responses. This is especially important when building AI-powered applications that need to handle hundreds of concurrent requests efficiently.
Core Architecture and Design Philosophy
The golang llm package is built around a few key principles that I've found essential when working with LLMs in production:
- Make integration dead simple
- Support multiple LLM providers out of the box
- Include production-ready features by default
- Provide clear cost visibility
- Handle failures gracefully
Here's what a basic OpenAI integration looks like:
// Complete golang llm integration example
package main

import (
    "context"
    "fmt"
    "log"
    "os"
    "time"

    "github.com/ksred/llm"
    "github.com/ksred/llm/pkg/types" // import path assumed; adjust to wherever the package exports its types
)

func main() {
    // Initialize the LLM client with OpenAI
    client, err := llm.NewClient(
        os.Getenv("OPENAI_API_KEY"),
        llm.WithProvider("openai"),      // Multi-provider support
        llm.WithModel("gpt-4"),          // Model selection
        llm.WithTimeout(30*time.Second), // Production timeouts
        llm.WithRetries(3),              // Automatic retries
        llm.WithCostTracking(true),      // Cost monitoring
    )
    if err != nil {
        log.Fatal("Failed to create LLM client:", err)
    }

    // Make a chat completion request
    resp, err := client.Chat(context.Background(), &types.ChatRequest{
        Messages: []types.Message{
            {
                Role:    types.RoleUser,
                Content: "Explain Go's concurrency model for financial systems",
            },
        },
        MaxTokens: 150,
    })
    if err != nil {
        log.Fatal("LLM request failed:", err)
    }

    fmt.Printf("Response: %s\n", resp.Message.Content)
    fmt.Printf("Cost: $%.4f\n", resp.Usage.Cost)
}
Simple on the surface, but there's a lot happening underneath. Let's dive into the key components that make this production-ready.
Connection Management: Beyond Basic HTTP Clients
When building services that interact with LLMs, connection management becomes crucial. Not every request needs its own connection - opening one each time is wasteful and can lead to resource exhaustion. The connection pooling system is built to handle this efficiently:
type PoolConfig struct {
    MaxSize       int           // Maximum number of connections
    IdleTimeout   time.Duration // How long to keep idle connections
    CleanupPeriod time.Duration // How often to clean up idle connections
}
The pool manages connections through several key mechanisms:
Connection Lifecycle Management
The pool tracks both active and idle connections, implementing a cleanup routine that runs periodically:
func (p *ConnectionPool) cleanup() {
    ticker := time.NewTicker(p.config.CleanupPeriod)
    defer ticker.Stop()

    for range ticker.C {
        p.mu.Lock()
        now := time.Now()
        // Compare each idle connection's last-used time against `now` and
        // drop those that have exceeded IdleTimeout; active connections
        // are left untouched (details elided).
        p.mu.Unlock()
    }
}
Smart Connection Distribution
When a client requests a connection, the pool follows a specific hierarchy:
- Try to reuse an existing idle connection
- Create a new connection if under the max limit
- Wait for a connection to become available if at capacity
This prevents both resource wastage and connection starvation - crucial when building high-throughput LLM applications in Go.
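To make that concrete, here's a rough sketch of how such an acquisition path can look. This is illustrative only, not the package's actual internals: the idle channel, open counter, newConnection helper, and Connection type are stand-ins.
func (p *ConnectionPool) Get(ctx context.Context) (*Connection, error) {
    // 1. Try to reuse an existing idle connection.
    select {
    case conn := <-p.idle:
        return conn, nil
    default:
    }

    // 2. Create a new connection if under the max limit.
    p.mu.Lock()
    if p.open < p.config.MaxSize {
        p.open++
        p.mu.Unlock()
        return newConnection()
    }
    p.mu.Unlock()

    // 3. At capacity: wait for a connection to be released, or give up
    //    when the caller's context is cancelled.
    select {
    case conn := <-p.idle:
        return conn, nil
    case <-ctx.Done():
        return nil, ctx.Err()
    }
}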
Robust Error Handling and Retries
LLM APIs can be unreliable. They might rate limit you, have temporary outages, or just be slow to respond. The retry system is designed to handle these cases gracefully:
type RetryConfig struct {
    MaxRetries      int
    InitialInterval time.Duration
    MaxInterval     time.Duration
    Multiplier      float64
}
The retry system implements exponential backoff with jitter to prevent thundering herd problems. Here's how it works:
- Initial attempt fails
- Wait for InitialInterval
- For each subsequent retry:
- Add random jitter to prevent synchronisation
- Increase wait time by Multiplier
- Cap at MaxInterval to prevent excessive waits
This means your golang llm client can handle various types of failures:
- Rate limiting (429 responses)
- Temporary service outages (5xx responses)
- Network timeouts
- Connection reset errors
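To make the backoff sequence concrete, here's a small sketch of how the interval can be computed. It mirrors the behaviour described above rather than the package's exact internals, and assumes math/rand and time are imported:
// backoffInterval computes the wait before retry number `attempt` (1-based)
// using exponential backoff with jitter.
func backoffInterval(cfg RetryConfig, attempt int) time.Duration {
    // Exponential growth: InitialInterval * Multiplier^(attempt-1).
    interval := float64(cfg.InitialInterval)
    for i := 1; i < attempt; i++ {
        interval *= cfg.Multiplier
    }

    // Add up to 20% random jitter so concurrent clients don't retry in lockstep.
    interval += interval * 0.2 * rand.Float64()

    // Cap at MaxInterval to prevent excessive waits.
    if limit := float64(cfg.MaxInterval); interval > limit {
        interval = limit
    }
    return time.Duration(interval)
}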
Cost Tracking and Budget Management
One of the most requested features was cost tracking. If you're building services on top of LLMs, you need to know exactly how much each request costs. The cost tracking system provides:
Per-Request Cost Tracking
type Usage struct {
    PromptTokens     int
    CompletionTokens int
    TotalTokens      int
    Cost             float64
}

func (ct *CostTracker) TrackUsage(provider, model string, usage Usage) error {
    cost := calculateCost(provider, model, usage)
    if cost > ct.config.MaxCostPerRequest {
        return ErrCostLimitExceeded
    }
    // Record the cost and token counts against running totals here.
    return nil
}
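The calculation behind calculateCost is simple: providers publish per-token (usually per-1K-token) prices for prompt and completion tokens. Here's a sketch, with an illustrative pricing table rather than real, current prices:
// Illustrative per-1K-token prices; real values come from each provider's
// published pricing and change over time.
var pricing = map[string]struct{ prompt, completion float64 }{
    "openai/gpt-4": {prompt: 0.03, completion: 0.06},
}

func calculateCost(provider, model string, usage Usage) float64 {
    p, ok := pricing[provider+"/"+model]
    if !ok {
        return 0 // unknown model: no price on record
    }
    return float64(usage.PromptTokens)/1000*p.prompt +
        float64(usage.CompletionTokens)/1000*p.completion
}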
Budget Management
The system allows you to set various budget controls:
- Per-request cost limits
- Daily/monthly budget caps
- Usage alerts at configurable thresholds
- Cost breakdown by model and provider
This becomes critical when you're running at scale. I've seen services rack up surprising bills because they didn't have proper cost monitoring in place. With this system, you can:
- Monitor costs in real-time
- Set hard limits to prevent runaway spending
- Get alerts before hitting budget thresholds
- Track costs per customer or feature
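Here's a sketch of how a daily-budget guard along these lines might look, built on the CostTracker above. The fields and the error value are hypothetical, purely to illustrate the idea:
// CheckBudget is a hypothetical daily-budget guard: it rejects a request
// whose estimated cost would push today's spend over the configured cap,
// and warns once spend crosses an alert threshold.
func (ct *CostTracker) CheckBudget(estimatedCost float64) error {
    ct.mu.Lock()
    defer ct.mu.Unlock()

    projected := ct.dailySpend + estimatedCost
    if projected > ct.config.DailyBudget {
        return ErrBudgetExceeded
    }
    if projected > ct.config.DailyBudget*ct.config.AlertThreshold {
        log.Printf("warning: %.0f%% of daily LLM budget used",
            100*projected/ct.config.DailyBudget)
    }
    return nil
}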
Streaming Support: Real-time Responses
Modern LLM applications often need streaming support for better user experience. The package includes robust streaming support:
streamChan, err := client.StreamChat(ctx, req)
if err != nil {
    return err
}

for resp := range streamChan {
    if resp.Error != nil {
        return resp.Error
    }
    fmt.Print(resp.Message.Content)
}
The streaming implementation handles several complex cases:
- Graceful connection termination
- Partial message handling
- Error propagation
- Context cancellation
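For instance, tying a stream to a context deadline is how you stop it cleanly when a user gives up or a timeout passes. A small sketch, assuming the stream surfaces cancellation as an error on the channel:
// Abandon the stream if it hasn't finished within 30 seconds.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

streamChan, err := client.StreamChat(ctx, req)
if err != nil {
    return err
}

for resp := range streamChan {
    if resp.Error != nil {
        // Cancellation is assumed to arrive here once the deadline fires mid-stream.
        return resp.Error
    }
    fmt.Print(resp.Message.Content)
}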
Common Golang LLM Integration Patterns
Through building various LLM-powered applications in Go, I've identified several patterns that work particularly well:
Request Batching for Cost Efficiency
type BatchProcessor struct {
    client   *llm.Client
    requests chan *BatchRequest
    results  chan *BatchResult
}

func (bp *BatchProcessor) ProcessBatch(ctx context.Context, requests []string) ([]string, error) {
    // Simplified sequential version: one API call per prompt.
    // See the concurrent variant sketched below for higher throughput.
    var results []string
    for _, req := range requests {
        result, err := bp.client.Chat(ctx, &types.ChatRequest{
            Messages: []types.Message{{Role: types.RoleUser, Content: req}},
        })
        if err != nil {
            return nil, err
        }
        results = append(results, result.Message.Content)
    }
    return results, nil
}
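The loop above keeps the example simple but leaves throughput on the table, since each prompt waits for the previous one. Here's a sketch of a concurrent variant using golang.org/x/sync/errgroup; results keep their original order:
func (bp *BatchProcessor) ProcessBatchConcurrent(ctx context.Context, requests []string) ([]string, error) {
    results := make([]string, len(requests))
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(8) // bound concurrency so we don't trip provider rate limits

    for i, prompt := range requests {
        i, prompt := i, prompt // capture loop variables (pre-Go 1.22)
        g.Go(func() error {
            resp, err := bp.client.Chat(ctx, &types.ChatRequest{
                Messages: []types.Message{{Role: types.RoleUser, Content: prompt}},
            })
            if err != nil {
                return err
            }
            results[i] = resp.Message.Content
            return nil
        })
    }
    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}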
Streaming with Goroutines
func (c *Client) StreamChatWithCallback(ctx context.Context, req *ChatRequest, callback func(string)) error {
    streamChan, err := c.StreamChat(ctx, req)
    if err != nil {
        return err
    }

    go func() {
        for resp := range streamChan {
            if resp.Error != nil {
                // Handle or log streaming errors here; the goroutine simply stops.
                return
            }
            callback(resp.Message.Content)
        }
    }()

    return nil
}
This pattern is particularly useful when building real-time applications that need immediate feedback to users while processing continues in the background.
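Using the helper looks something like this (the callback body is application-specific):
err := client.StreamChatWithCallback(ctx, req, func(chunk string) {
    // Push each chunk to the user as soon as it arrives,
    // e.g. over a websocket or server-sent events.
    fmt.Print(chunk)
})
if err != nil {
    return err
}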
Performance Metrics and Monitoring
Understanding how your LLM integration performs is crucial. The package includes comprehensive metrics:
Request Metrics
- Request latency
- Token usage
- Error rates
- Retry counts
Connection Pool Metrics
- Active connections
- Idle connections
- Wait time for connections
- Connection errors
Cost Metrics
- Cost per request
- Running totals
- Budget utilisation
- Cost per model/provider
Provider Management
The package currently supports multiple LLM providers:
OpenAI
- GPT-3.5
- GPT-4
- Text completion models
Anthropic
- Claude
- Claude Instant
Each provider implementation handles its specific quirks while presenting a unified interface to your application. This means you can switch between providers without changing your application code.
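For example, pointing the earlier OpenAI client at Anthropic is just a matter of changing the options; the model name below is illustrative:
client, err := llm.NewClient(
    os.Getenv("ANTHROPIC_API_KEY"),
    llm.WithProvider("anthropic"),   // same options as the OpenAI example
    llm.WithModel("claude-instant"), // illustrative model name
    llm.WithCostTracking(true),
)
if err != nil {
    log.Fatal("Failed to create LLM client:", err)
}
// client.Chat and client.StreamChat calls stay exactly the same.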
Real-World Applications
I've used this package in several production applications:
Fintech Document Processing
At previous fintech roles, I've used similar Go LLM integration patterns for:
- Automated contract analysis with GPT-4
- Risk assessment document summarisation
- Compliance report generation
The key requirement was reliable, cost-controlled API integration that could handle thousands of documents daily. The connection pooling and cost tracking features were essential for keeping operations efficient and predictable.
ChatGPT Integration for Customer Support
Built a customer support system using this Go package that:
- Processes 10,000+ support queries daily
- Maintains sub-200ms response times
- Tracks costs per customer interaction
- Handles ChatGPT API rate limits gracefully
Interactive Chat Applications
Real-time chat applications requiring:
- Streaming responses
- Low latency
- Error resilience
Batch Processing Systems
Large-scale document processing using:
- Multiple providers
- Budget management
- Detailed usage tracking
Testing and Mocking
When building production systems, testing becomes crucial. Similar to my approach with mocking Redis and Kafka in Go, this package includes comprehensive testing utilities:
// Mock LLM client for testing
mockClient := llm.NewMockClient()
mockClient.SetResponse(&types.ChatResponse{
    Message: types.Message{Content: "Mocked response"},
    Usage:   types.Usage{TotalTokens: 100, Cost: 0.002},
})

// Use in tests
resp, err := mockClient.Chat(ctx, req)
assert.NoError(t, err)
assert.Equal(t, "Mocked response", resp.Message.Content)
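This drops neatly into Go's standard table-driven tests. A sketch - the mock calls mirror the snippet above, while the test wiring itself is illustrative:
func TestChatResponses(t *testing.T) {
    tests := []struct {
        name     string
        response string
        want     string
    }{
        {name: "short answer", response: "Mocked response", want: "Mocked response"},
        {name: "empty answer", response: "", want: ""},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            mockClient := llm.NewMockClient()
            mockClient.SetResponse(&types.ChatResponse{
                Message: types.Message{Content: tt.response},
            })

            resp, err := mockClient.Chat(context.Background(), &types.ChatRequest{
                Messages: []types.Message{{Role: types.RoleUser, Content: "summarise this"}},
            })
            assert.NoError(t, err)
            assert.Equal(t, tt.want, resp.Message.Content)
        })
    }
}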
What's Next
While the package is already being used in production, there's more to come:
Short Term
- Enhanced cost tracking across different pricing tiers
- Better model handling and automatic selection
- Support for more LLM providers
- Improved metrics and monitoring
Long Term
- Automatic provider failover
- Smart request routing
- Advanced budget controls
- Performance optimisation tools
Frequently Asked Questions
Q: How does this compare to other Go LLM libraries?
A: This package prioritises production readiness with built-in cost tracking, connection pooling, and multi-provider support that I haven't found elsewhere.
Q: Can I use this with Azure OpenAI?
A: Yes, the package supports multiple OpenAI-compatible endpoints, including Azure's implementation.
Q: How accurate is the cost tracking?
A: Cost tracking uses the official pricing from each provider and accounts for both prompt and completion tokens.
Q: Does it support streaming for all providers?
A: Currently streaming is supported for OpenAI and compatible APIs. Anthropic streaming support is coming soon.
Best Practices and Tips
From my experience using this package in production, here are some recommendations:
- Start with conservative retry settings - You can always increase them based on your needs
- Monitor your token usage closely - Set up alerts well before hitting limits
- Set up budget alerts well below your actual limits - This gives you time to react
- Use streaming for interactive applications - Users expect immediate feedback
- Implement proper error handling in your application - The package handles retries, but your app should handle final failures gracefully
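On that last point, here's a small sketch of what handling final failures gracefully can look like in application code - falling back to a canned reply once the client's retries are exhausted (the fallback is application-specific, of course):
resp, err := client.Chat(ctx, req)
if err != nil {
    // Retries have already been attempted by the client; degrade gracefully
    // instead of failing the whole user request.
    log.Printf("llm request failed after retries: %v", err)
    return "Sorry, I couldn't generate a response right now. Please try again shortly.", nil
}
return resp.Message.Content, nil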
Conclusion
Building this golang llm package has significantly simplified my LLM integrations. Instead of rewriting the same boilerplate code for each project, I can focus on building the actual features I need. If you're working with LLMs in Go, feel free to check out the package and contribute.
Like my approach to building systems, this is open source and available for anyone to use and improve. The more we can standardise these patterns, the better our LLM integrations will become.
The future of LLM integration is about making these powerful tools more accessible and reliable. With proper abstractions and production-ready features, we can focus on building innovative applications instead of worrying about the underlying infrastructure. Whether you're building the next generation of AI-powered fintech applications or simple automation tools, having a robust foundation for LLM integration is essential.