From Static to Dynamic: Making Go Services Reconfigurable at Runtime

I've been staring at this problem for weeks. We had a file watching service that worked perfectly - until someone wanted to watch a new directory. The conversation always went the same way.

"Can we watch this new directory without downtime?" "Can we add new file patterns whilst the service is running?" "Can we reconfigure the watcher paths dynamically?"

The answer was always the same: restart the service. That's fine for development, but when you're running production systems where users expect zero-downtime configuration updates, "just restart it" becomes a non-starter.

I'd built this service with a typical Go startup pattern - initialise fsnotify watchers, configure directory paths, start goroutines, and run. The architecture was solid, performant, and followed all the Go best practices. But it had one critical flaw: it was statically configured.

The breakthrough came when I realised this wasn't really about fsnotify or file watching. It was about a fundamental architectural choice: transforming static, startup-configured instances into dynamically reconfigurable ones using event-driven patterns.

Why We Default to Static Configuration

Most Go services follow a predictable lifecycle that feels natural: parse configuration, initialise dependencies, start workers, and run until shutdown. This pattern works brilliantly because it's simple, predictable, and performant. You know exactly what resources you're using from the start.

func main() {
    config := loadConfig()

    watcher := setupFileWatcher(config.Directories)
    defer watcher.Close()

    server := setupHTTPServer(config.Port)
    go server.ListenAndServe()

    // Everything's configured, now run forever
    select {}
}

There's beauty in this simplicity. Your service starts up, does its job, and shuts down cleanly. Memory usage is predictable, performance is consistent, and debugging is straightforward because everything is deterministic.

The problem surfaces when business requirements evolve. Suddenly, you need to add new directories to watch, modify file patterns, or adjust monitoring targets - all without interrupting the service. Traditional approaches hit a wall here because most Go libraries and patterns assume configuration happens once at startup.

Think about it: fsnotify watchers are typically initialised with a fixed set of paths. HTTP clients are configured with specific endpoints. Database connections are established with predetermined connection strings. The entire Go ecosystem optimises for this "configure once, run forever" model.

But what happens when "forever" needs to change shape?

The Technical Challenge

The core issue isn't just about adding new configuration - it's about lifecycle management. When you want to watch a new directory dynamically, you're not just calling watcher.Add(). You need to:

  1. Validate the new path exists and is accessible (a sketch of this check follows the list)
  2. Integrate it with existing error handling and logging
  3. Coordinate with other components that might care about this change
  4. Handle failures gracefully without affecting existing functionality
  5. Maintain consistency if multiple changes happen simultaneously
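
For the first item alone, the check is simple enough to sketch. The helper below is illustrative and uses only the standard library; it isn't lifted from the service:

func validateRepositoryPath(path string) error {
    // Hypothetical helper: confirm the path exists, is a directory, and is readable.
    info, err := os.Stat(path)
    if err != nil {
        return fmt.Errorf("path %s is not accessible: %w", path, err)
    }
    if !info.IsDir() {
        return fmt.Errorf("path %s is not a directory", path)
    }

    // Opening the directory verifies read permission before handing it to a watcher.
    f, err := os.Open(path)
    if err != nil {
        return fmt.Errorf("path %s is not readable: %w", path, err)
    }
    return f.Close()
}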

Static configuration sidesteps all of this by making everything deterministic at startup. Dynamic configuration forces you to solve these problems at runtime, often whilst other parts of your system are actively using the components you're trying to modify.

I'd tried the obvious approaches first. Adding mutex locks around the file watcher to make it "thread-safe" for configuration changes. Building a configuration reload endpoint that would reinitialise components. Even experimenting with goroutine pools that could be dynamically resized.

None of these felt right. They were all band-aids on a fundamentally static architecture. It reminded me of the challenges I faced when building a production-ready Go package for LLM integration - sometimes you need to step back and rethink the entire approach rather than patching the existing one.

The Event-Driven Breakthrough

The solution came from stepping back and asking a different question: What if configuration wasn't something you set, but something you described?

Instead of telling the FileWatcher "watch these specific directories," what if I told a Repository Manager "these are the repositories I care about" and let the FileWatcher figure out what that meant for its own operations?

This shifts the entire paradigm from imperative configuration (do this specific thing) to declarative state management (achieve this desired state).

Here's the architecture I landed on:

type RepositoryManager struct {
    repositories  map[string]*Repository
    repoAddedCh   chan *Repository
    repoRemovedCh chan string // carries repository IDs, matching OnRepositoryRemoved
    listeners     []RepositoryEventListener
    logger        Logger
    mu            sync.RWMutex
}

type RepositoryEventListener interface {
    OnRepositoryAdded(repo *Repository) error
    OnRepositoryRemoved(repoID string) error
}

The beauty of this approach is separation of concerns. The Repository Manager doesn't need to know about file watching, HTTP servers, database connections, or any specific implementation. It just maintains state and emits events when that state changes.

Components like the FileWatcher become listeners that react to these state changes by updating their own internal configuration. This inversion of control makes the entire system more flexible, testable, and maintainable.
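
Registration itself is deliberately boring: a listener is just appended to the slice under the lock. The method below is a sketch, and the name RegisterListener is mine rather than taken from the service:

func (rm *RepositoryManager) RegisterListener(l RepositoryEventListener) {
    rm.mu.Lock()
    defer rm.mu.Unlock()
    rm.listeners = append(rm.listeners, l)
}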

Building the Event System

The event system needed to be robust enough for production use, which meant handling all the edge cases that make distributed systems interesting:

Non-blocking Event Emission: The worst thing that could happen would be for adding a new repository to block because some listener was slow or deadlocked.

func (rm *RepositoryManager) AddRepository(repo *Repository) error {
    rm.mu.Lock()
    rm.repositories[repo.ID] = repo
    rm.mu.Unlock()
    
    // Non-blocking event emission - critical for avoiding deadlocks
    select {
    case rm.repoAddedCh <- repo:
        // Event sent successfully
    default:
        // Channel full, log warning but don't block
        rm.logger.Warn("Repository event channel full, skipping event")
    }
    
    return nil
}

Buffered Channels: I used a buffer size of 100 for repository events. This provides enough headroom for burst activity without consuming excessive memory. The approach here is similar to what I learned when mocking Redis and Kafka in Go - channels become your coordination mechanism, and sizing them properly is critical for avoiding deadlocks.
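
For reference, a constructor sketch consistent with that sizing; the function name and exact wiring are my own rather than copied from the service:

func NewRepositoryManager(logger Logger) *RepositoryManager {
    return &RepositoryManager{
        repositories:  make(map[string]*Repository),
        repoAddedCh:   make(chan *Repository, 100), // headroom for burst activity
        repoRemovedCh: make(chan string, 100),
        logger:        logger,
    }
}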

Graceful Error Handling: If one listener fails, it shouldn't affect others:

func (rm *RepositoryManager) notifyListeners(eventType string, fn func(RepositoryEventListener) error) {
    for _, listener := range rm.listeners {
        if err := fn(listener); err != nil {
            rm.logger.Errorf("Listener failed for %s event: %v", eventType, err)
            // Continue with other listeners
        }
    }
}

The event consumer runs in its own goroutine, processing events and notifying all registered listeners:

func (rm *RepositoryManager) consumeEvents() {
    for {
        select {
        case repo := <-rm.repoAddedCh:
            rm.notifyListeners("added", func(listener RepositoryEventListener) error {
                return listener.OnRepositoryAdded(repo)
            })
        case repoID := <-rm.repoRemovedCh:
            rm.notifyListeners("removed", func(listener RepositoryEventListener) error {
                return listener.OnRepositoryRemoved(repoID)
            })
        }
    }
}

This design ensures that adding a repository is always fast (it just updates local state and sends a channel message), whilst the complex work of updating watchers, installing hooks, or triggering scans happens asynchronously.

Making FileWatcher Dynamic

The FileWatcher transformation was the most interesting part. Instead of being a static component that gets configured once, it becomes an event-driven component that reacts to state changes:

type EnhancedFileWatcher struct {
    fsWatcher    *fsnotify.Watcher
    repoManager  *RepositoryManager
    watchedPaths map[string]bool
    pathToRepoID map[string]string
    recentEvents map[string]time.Time // last-seen events, used for deduplication later
    config       FileWatcherConfig    // limits such as MaxDirectories (type name assumed)
    mu           sync.RWMutex
    logger       Logger
}

func (fw *EnhancedFileWatcher) OnRepositoryAdded(repo *Repository) error {
    fw.mu.Lock()
    defer fw.mu.Unlock()
    
    // Check if we're already watching this path
    if fw.watchedPaths[repo.Path] {
        fw.logger.Debugf("Already watching path: %s", repo.Path)
        return nil
    }
    
    // Add to fsnotify watcher
    if err := fw.addDirectoryToWatcher(repo.Path); err != nil {
        return fmt.Errorf("failed to add directory %s: %w", repo.Path, err)
    }
    
    fw.watchedPaths[repo.Path] = true
    fw.pathToRepoID[repo.Path] = repo.ID
    fw.logger.Infof("Now watching directory: %s", repo.Path)
    
    return nil
}

func (fw *EnhancedFileWatcher) OnRepositoryRemoved(repoID string) error {
    fw.mu.Lock()
    defer fw.mu.Unlock()
    
    // Find the path for this repository
    var pathToRemove string
    for path, id := range fw.pathToRepoID {
        if id == repoID {
            pathToRemove = path
            break
        }
    }
    
    if pathToRemove == "" {
        return nil // Not watching this repository
    }
    
    // Remove from fsnotify watcher
    if err := fw.fsWatcher.Remove(pathToRemove); err != nil {
        fw.logger.Errorf("Failed to remove path from watcher: %v", err)
    }
    
    delete(fw.watchedPaths, pathToRemove)
    delete(fw.pathToRepoID, pathToRemove)
    fw.logger.Infof("Stopped watching directory: %s", pathToRemove)
    
    return nil
}

The critical insight here is that the FileWatcher no longer manages its own configuration - it simply reacts to events from the Repository Manager, which keeps the component small, focused, and easy to test in isolation.
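
Wiring the two components together is just registration plus starting the consumer. The sketch below assumes the NewRepositoryManager constructor shown earlier and a hypothetical NewEnhancedFileWatcher constructor that sets up fsnotify and the internal maps:

func run(logger Logger) error {
    repoManager := NewRepositoryManager(logger)

    // Hypothetical constructor: creates the fsnotify watcher and the internal maps.
    fileWatcher, err := NewEnhancedFileWatcher(repoManager, logger)
    if err != nil {
        return err
    }

    // The FileWatcher subscribes to repository events.
    repoManager.RegisterListener(fileWatcher)

    // The consumer goroutine does the slow work asynchronously.
    go repoManager.consumeEvents()

    // Illustrative values: adding a repository returns quickly, and watching
    // becomes active moments later.
    return repoManager.AddRepository(&Repository{ID: "docs", Path: "/srv/docs"})
}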

When a new repository is added via API, it flows through the system like this:

  1. API call adds repository to Repository Manager
  2. Repository Manager emits event on channel
  3. Event consumer notifies all listeners
  4. FileWatcher receives notification and adds directory to fsnotify
  5. File changes in that directory are now detected and processed

The latency from API call to active file watching is typically under 100 milliseconds, and most of that is just walking the directory tree to set up the watches.
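
The API side is a thin layer over AddRepository. This handler is a sketch: the Server type, route shape, and JSON payload are assumptions rather than the service's actual API:

func (s *Server) handleAddRepository(w http.ResponseWriter, r *http.Request) {
    var req struct {
        ID   string `json:"id"`
        Path string `json:"path"`
    }
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }

    // Fast path: AddRepository only updates in-memory state and emits an event.
    if err := s.repoManager.AddRepository(&Repository{ID: req.ID, Path: req.Path}); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    // Watching becomes active asynchronously shortly afterwards.
    w.WriteHeader(http.StatusAccepted)
}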

Performance and Resource Management

Event-driven architectures can introduce latency and complexity, so I focused on keeping the overhead minimal whilst ensuring the system could handle production loads.

Directory Filtering: One of the biggest performance killers is watching directories that change frequently but don't contain useful files. The system automatically excludes these:

func (fw *EnhancedFileWatcher) shouldSkipDirectory(dirPath string) bool {
    base := filepath.Base(dirPath)
    skipDirs := []string{
        "node_modules", ".git", "vendor", "target", "build",
        ".next", "dist", "coverage", ".nyc_output", "tmp",
    }
    
    for _, skip := range skipDirs {
        if base == skip {
            return true
        }
    }
    
    // Skip hidden directories except .git which we already handle
    if strings.HasPrefix(base, ".") && base != ".git" {
        return true
    }
    
    return false
}

Resource Limits: The system has configurable limits to prevent runaway resource consumption:

func (fw *EnhancedFileWatcher) addDirectoryToWatcher(dirPath string) error {
    dirCount := 0
    maxDirs := fw.config.MaxDirectories // Default: 10,000
    
    err := filepath.Walk(dirPath, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        
        if info.IsDir() {
            if fw.shouldSkipDirectory(path) {
                return filepath.SkipDir
            }
            
            if dirCount >= maxDirs {
                return fmt.Errorf("directory limit exceeded (%d)", maxDirs)
            }
            
            if err := fw.fsWatcher.Add(path); err != nil {
                return fmt.Errorf("failed to add directory %s: %w", path, err)
            }
            
            dirCount++
            fw.logger.Debugf("Watching directory: %s", path)
        }
        
        return nil
    })
    
    if err != nil {
        return err
    }
    
    fw.logger.Infof("Added %d directories to watcher for path: %s", dirCount, dirPath)
    return nil
}

Event Deduplication: File systems can generate duplicate events, especially on network filesystems. The watcher includes a simple deduplication mechanism:

type FileEvent struct {
    Path      string
    Operation string
    Timestamp time.Time
}

func (fw *EnhancedFileWatcher) processFileEventImmediately(event FileEvent) {
    // Simple deduplication with 1-second window
    key := fmt.Sprintf("%s:%s", event.Path, event.Operation)
    
    fw.mu.Lock()
    lastSeen, exists := fw.recentEvents[key]
    if exists && time.Since(lastSeen) < time.Second {
        fw.mu.Unlock()
        return // Duplicate event, skip
    }
    fw.recentEvents[key] = time.Now()
    fw.mu.Unlock()
    
    // Process the actual file change
    fw.handleFileChange(event)
}

Beyond File Watching

Once I'd built this pattern for file watching, I realised it could solve similar problems across our entire stack. The same event-driven approach works for any Go service that needs runtime reconfiguration.

HTTP Client Pools: Instead of creating HTTP clients with fixed endpoints, create them dynamically based on service discovery events:

type EndpointManager struct {
    endpoints   map[string]*Endpoint
    addedCh     chan *Endpoint
    listeners   []EndpointEventListener
}

type HTTPClientPool struct {
    clients map[string]*http.Client
}

func (hcp *HTTPClientPool) OnEndpointAdded(endpoint *Endpoint) error {
    client := &http.Client{
        Timeout: endpoint.Timeout,
        Transport: &http.Transport{
            MaxIdleConns: endpoint.MaxConnections,
        },
    }
    hcp.clients[endpoint.ID] = client
    return nil
}

Database Connection Pools: Add new databases to your connection pool without restarting:

type DatabaseManager struct {
    databases map[string]*DatabaseConfig
    addedCh   chan *DatabaseConfig
    listeners []DatabaseEventListener
}

func (cp *ConnectionPool) OnDatabaseAdded(config *DatabaseConfig) error {
    db, err := sql.Open(config.Driver, config.ConnectionString)
    if err != nil {
        return err
    }
    cp.connections[config.ID] = db
    return nil
}

Monitoring and Alerting: Dynamically add new metrics or alert rules:

type MetricManager struct {
    metrics   map[string]*MetricConfig
    addedCh   chan *MetricConfig
    listeners []MetricEventListener
}

func (ma *MetricAggregator) OnMetricAdded(config *MetricConfig) error {
    ma.collectors[config.ID] = prometheus.NewCounterVec(
        prometheus.CounterOpts{Name: config.Name},
        config.Labels,
    )
    return nil
}

The pattern is consistent: identify what should be state versus what should be behaviour. State gets managed centrally and changes trigger events. Behaviour gets implemented by listeners that react to those events.
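
If you squint, every one of these managers has the same shape, and with Go 1.18+ generics you could capture it once. This is a sketch of the idea rather than code from the production system:

type Listener[T any] interface {
    OnAdded(item T) error
    OnRemoved(id string) error
}

type StateManager[T any] struct {
    mu        sync.RWMutex
    items     map[string]T
    addedCh   chan T
    removedCh chan string
    listeners []Listener[T]
}

func (m *StateManager[T]) Add(id string, item T) {
    m.mu.Lock()
    m.items[id] = item
    m.mu.Unlock()

    select {
    case m.addedCh <- item:
    default: // never block the caller on a slow consumer
    }
}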

Lessons Learned

This transformation taught me several things about building adaptable Go services:

Static Configuration is Often a Choice: Most libraries can be adapted to work with dynamic configuration if you're willing to add a thin event-driven layer. The key is wrapping the static components in dynamic ones rather than trying to modify them directly.

Event-Driven Doesn't Mean Complex: The event system I built is quite simple - just channels, interfaces, and goroutines. The complexity comes from the domain logic, not the infrastructure.

Performance Overhead is Minimal: The event system adds perhaps 1-2 milliseconds of latency to configuration changes. For most use cases, this is completely acceptable compared to the alternative of service restarts.

Testing Becomes Easier: With clear separation between state management and behaviour, testing becomes much more straightforward. You can test the Repository Manager independently of the FileWatcher, and you can test listeners with mock events.
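
A listener test, for instance, needs little more than a mock and the manager itself. The sketch below reuses the NewRepositoryManager and RegisterListener helpers assumed earlier:

type mockListener struct {
    added chan string
}

func (m *mockListener) OnRepositoryAdded(repo *Repository) error {
    m.added <- repo.ID
    return nil
}

func (m *mockListener) OnRepositoryRemoved(repoID string) error { return nil }

func TestAddRepositoryNotifiesListeners(t *testing.T) {
    rm := NewRepositoryManager(nil) // logger isn't exercised on this happy path
    mock := &mockListener{added: make(chan string, 1)}
    rm.RegisterListener(mock)
    go rm.consumeEvents()

    if err := rm.AddRepository(&Repository{ID: "repo-1", Path: t.TempDir()}); err != nil {
        t.Fatal(err)
    }

    select {
    case id := <-mock.added:
        if id != "repo-1" {
            t.Fatalf("unexpected repository ID: %s", id)
        }
    case <-time.After(time.Second):
        t.Fatal("listener was not notified within one second")
    }
}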

Debugging is More Transparent: Instead of trying to figure out why a service isn't picking up new configuration, you can trace events through the system. Did the repository get added? Was the event emitted? Did the listener receive it? Each step is explicit and logged.

The Results

The transformation was significant. What used to require service restarts now happens seamlessly:

  • Zero downtime for configuration changes
  • Sub-second latency from API request to active file watching
  • Better resource utilisation - no need to over-provision for peak configuration
  • Easier operations - configuration changes through APIs rather than deployment pipelines
  • Better monitoring - explicit events make it easy to track what changed when

But perhaps most importantly, it changed how we think about building services. Instead of asking "how do we configure this at startup," we now ask "how do we make this reconfigurable at runtime." It's a subtle shift that leads to much more flexible architectures.

The next time someone asks if they can update configuration without a restart, you'll have a much better answer than "just restart the service." You'll have a pattern that makes dynamic reconfiguration feel as natural as static configuration, with all the operational benefits that come with it.


