While I was at the Emergent Ventures Unconference, speaking to the amazing people there, a thought occurred to me: what if software were no longer static, but something living? What if it improved itself, came up with features based on usage, and continually evolved with input?

I spoke to several experts in AI, and they agreed this was theoretically possible. It would be difficult with current tooling, but given how quickly AI capabilities are growing, it should become feasible within three to six months. A few projects in the wild have attempted something similar, but none quite reached the depth I was aiming for.

In this series of posts, I will go through the process of using ChatGPT to help me create a program that creates new programs from natural language. This program will in turn use the GPT-4 API to generate, test and compile code. The first use case will be a brand-new program; subsequent use cases will take existing code bases and improve them in various ways.

Evolve Project Overview

This project is called Evolve (a name suggested by ChatGPT), as it focuses on turning software into a living entity. The eventual aim is to replace software engineers, QA testers and product managers. Human intervention will of course still be needed, but as the tools improve, the number of manual steps in the process will shrink.

This was my opening prompt:

You are an expert software engineer with deep knowledge of all aspects of engineering. You will help me write a program as described below, giving enough thought to ensure the program works as anticipated.

I want to write a program in Go. The program I am looking to write works as follows. Using the OpenAI GPT4 API (which is currently available) I need a program that given an English description of a project, inits a new git repository, writes code for that project, writes the code to a file, writes tests and benchmarks, ensures the code compiles and tests pass, and then commits the code with a relevant message. The code should be recompiled and tested until it works. The commit message should be descriptive. Comments should be added to functions. It should also collect metrics on the timing of functions, as well as the usage of them. These should be stored for future use where these metrics can be read and suggestions can be made.

The program should have a relevant system prompt, be able to write the code to disk, have a loop to ensure testing and compilation works (and have a set number of retries).

I decided to start with the simplest piece of software I could think of, and a to-do list application was suggested. It is complex enough to be more than a purely intellectual exercise, yet small enough to serve as the initial test case.

Using Natural Language Processing (NLP) and the OpenAI GPT-4 API

In order to make this as easy as possible, the program should take a project name and description as input. For now, this is the only input:

Project: SimpleTodoAPI
Description: Create a RESTful API using Go for a simple todo list. The API should support basic CRUD (Create, Read, Update, Delete) operations for todo items. Each todo item should have an ID, a description, a status (e.g., 'pending', 'completed'), and timestamps for when it was created and last updated.

As suggested by ChatGPT, the flow is as follows:

  1. Interact with OpenAI's GPT-4 API: Write a program that can receive a description of a project and pass it to the GPT-4 API.
  2. Parse the GPT-4 response: Analyze the returned code from the GPT-4 API. This might involve extracting the necessary pieces of information (function names, variable names, etc.) from the code.
  3. Initialize a Git repository: This involves creating a new folder, initializing a Git repository in that folder, and setting up the necessary Git configuration (user name, user email, etc.).
  4. Write code to disk: Once the code is generated, it needs to be written to a .go file in the Git repository.
  5. Compile and test the code: Compile the code using the Go compiler, and run tests to ensure the code works as expected. This might involve generating test cases, which would be a complex task in itself.
  6. Commit the code to the repository: If the code compiles and passes all tests, commit it to the Git repository. The commit message could be based on the project description.
  7. Add comments to the code: This could involve adding comments to the functions in the code, based on the function names or other information extracted from the GPT-4 response.
  8. Collect and store metrics: This would involve running benchmarks on the code, collecting metrics on timing and usage of functions, and storing these metrics in a way that they can be read in the future.
  9. Error handling and retries: Implement logic to retry certain operations (like compiling or testing the code) if they fail.
  10. Create suggestions based on metrics: This is a broad task that might involve analyzing the stored metrics and generating suggestions for improving the code.

The Evolve Application

Working from this structure, I started to build out the application itself. Having already built several AI-linked applications, with conversation history, various system prompts and an OpenAI library, I had a good head start.

Broadly, the program consists of the following functionality:

  • Environment variables loaded from a .env file.
  • The Project Name and Description (above) stored in an instructions.txt file.
  • Database functionality (store and retrieve messages, store and retrieve tasks, etc).
  • Declarations (structs) and utility functions.
  • OpenAI library.
  • The evolve functionality itself.

The flow of the final program will be as follows:

  • All environment variables are loaded and internal variables initialised.
  • instructions.txt is parsed and sent to the evolve start function.
  • A new git repository is initialised.
  • The instructions are sent to ChatGPT and a task list is returned.
  • The task list is worked through in order, with status messages for each internal task.
  • For each task, code is requested from ChatGPT, written to disk, and compilation tested. Code includes detailed comments.
  • If the code cannot compile, the entire file gets sent back to ChatGPT to check for errors. This repeats for a given number of tries before failing.
  • Once code is compiled, tests are written for the code. Any errors are sent back for fixing until all tests pass.
  • Code is committed to the repository with a relevant message as to what was done.
  • Metrics are added to the code to be persisted, in a database or other storage. These metrics include function timings, function usage and number of errors.
  • Metrics are read and improvements suggested.

To achieve this, the work needs to be broken into several separate pieces. I decided to start by parsing the instructions.txt file and getting a list of tasks back from ChatGPT.

Creating A Task List And Parsing English Input

I made heavy use of ChatGPT during the coding process, so I will include the prompts as well as the response from the API.

Prompt:

For now, the instructions will be given in a text file named "instructions" in the following format:
Project: [project name]
Description: [description]
I need to read from this file and return any associated errors.

Response (code only):

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

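// Project holds the name and description read from the instructions file.
type Project struct {
	Name        string
	Description string
}

func main() {
	// Open the instructions file in the current working directory.
	file, err := os.Open("instructions")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	project := &Project{}

	// Each non-blank line is expected to look like "Field: value".
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // tolerate blank lines rather than treating them as errors
		}
		parts := strings.SplitN(line, ": ", 2)
		if len(parts) != 2 {
			log.Fatal("Invalid format in the instructions file")
		}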

		switch parts[0] {
		case "Project":
			project.Name = parts[1]
		case "Description":
			project.Description = parts[1]
		default:
			log.Fatal("Invalid field in the instructions file")
		}
	}

	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Project Name: %s\nDescription: %s\n", project.Name, project.Description)
}

I amended the function above to store the details in a struct which is then passed around to several other functions.

The first step is to act as a Product Manager, so I used the following system prompt:

You are an expert project manager. You will respond with a series of tasks that need to be completed to finish the project. Focus only on technical tasks, these should be complete enough to be Jira tasks and done in isolation.

Once the project details have been read and stored, I send them to the ChatGPT API along with the system prompt. The JSON request body is as follows:

{"messages":[{"role":"system","content":"You are an expert project manager. You will respond with a series of tasks that need to be completed to finish the project. Focus only on technical tasks, these should be complete enough to be Jira tasks and done in isolation."},{"role":"user","content":"Create a RESTful API using Go for a simple todo list. The API should support basic CRUD (Create, Read, Update, Delete) operations for todo items. Each todo item should have an ID, a description, a status (e.g., 'pending', 'completed'), and timestamps for when it was created and last updated."}],"model":"gpt-4"}

You might notice that messages is an array. As the project moves through its steps, earlier messages will be included in each request to maintain context for the problem at hand. The generated code will probably be excluded from this history, as it adds little useful context, and at some point the earliest messages will have to be truncated. This should still leave enough context for the program to keep functioning; if not, we can switch to the larger 32k-token context window of GPT-4 at a higher cost.
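One way to truncate the history while always keeping the system prompt is to drop the oldest messages until the rest fits a budget. This sketch rests on a loud assumption: token counts are estimated as roughly four characters per token, which is only a rule of thumb for English text and not the model's actual tokenizer. The names and the budget are hypothetical:

```go
package main

import "unicode/utf8"

// ChatMessage mirrors one entry of the "messages" array.
type ChatMessage struct {
	Role    string
	Content string
}

// estimateTokens is a rough heuristic (about four characters per token),
// NOT the real tokenizer; it only needs to be in the right ballpark.
func estimateTokens(s string) int {
	return (utf8.RuneCountInString(s) + 3) / 4
}

// truncateHistory keeps a leading system message, then as many of the most
// recent messages as fit within budget, preserving their original order.
func truncateHistory(messages []ChatMessage, budget int) []ChatMessage {
	var system []ChatMessage
	rest := messages
	used := 0
	if len(messages) > 0 && messages[0].Role == "system" {
		system = messages[:1]
		rest = messages[1:]
		used = estimateTokens(messages[0].Content)
	}

	// Walk backwards from the newest message until the budget is exhausted.
	start := len(rest)
	for i := len(rest) - 1; i >= 0; i-- {
		cost := estimateTokens(rest[i].Content)
		if used+cost > budget {
			break
		}
		used += cost
		start = i
	}

	out := make([]ChatMessage, 0, len(system)+len(rest)-start)
	out = append(out, system...)
	out = append(out, rest[start:]...)
	return out
}
```

Dropping the oldest messages first keeps the most recent task context, which is usually what the next request depends on.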

The response is as follows:

1. Set up Go development environment
  - Install Go and configure GOPATH
  - Install an IDE like Visual Studio Code with Go extension for code navigation and formatting

2. Initialize Go project and dependency management
  - Create project folder structure
  - Use `go mod init` to initialize project
  - Use `go.mod` file for package versioning and dependency management

3. Model todo item
  - Create `models` package
  - Write `todo.go` file containing a Todo struct with ID, Description, Status, CreatedAt, and UpdatedAt fields

4. Create a data access layer
  - Create `data` package with todo access interface definition
  - Implement in-memory database for Todo items
  - Implement CRUD methods: Create(), Read(), Update(), and Delete()

5. Implement API handlers
  - Create `handlers` package
  - Write API handler functions to perform CRUD operations, using the data access layer
  - Add error handling and response with appropriate status codes

6. Set up router and middleware
  - Create `routers` package
  - Import an external library like Gorilla Mux for routing and middleware support
  - Define routes and associate handler functions with HTTP methods (GET, POST, PUT, DELETE)
  - Set up middlewares for JSON content type and CORS support

7. Compile and run API server
  - Create `main.go` file in the root of the project
  - Import and initialize router and middleware
  - Run the API server using `http.ListenAndServe` function with a chosen port

8. Write unit tests
  - Create `_test.go` files for each package (e.g., `handlers_test.go`, `data_test.go`, etc.)
  - Use Go testing framework and functions, such as `testing.T`, to write unit tests for CRUD operations and possible error scenarios
  - Run tests using `go test` command

9. Document API
  - Write documentation for API usage (e.g., OpenAPI specification or simple README file)
  - Include request methods, paths, required data, and expected responses

10. Optional: Implement persistent storage
  - Choose a suitable database, such as SQLite, MySQL, or PostgreSQL
  - Create a new implementation of the data access interface with the chosen database
  - Update dependency injection in the handlers package to use the new implementation

As you can see, some of these tasks are unnecessary (such as the first one, setting up the development environment), so we need a way to exclude them from the response. Other tasks will need to be broken down further. Overall, though, this is a great starting point.

Next, we parse the tasks from the list above into a struct so we can work with them in code.

Prompt:

Write a function to parse the following string into a go struct

1. Set up Go development environment
  - Install Go and configure GOPATH
  - Install an IDE like Visual Studio Code with Go extension for code navigation and formatting

2. Initialize Go project and dependency management
  - Create project folder structure
  - Use `go mod init` to initialize project
  - Use `go.mod` file for package versioning and dependency management

3. Model todo item
  - Create `models` package
  - Write `todo.go` file containing a Todo struct with ID, Description, Status, CreatedAt, and UpdatedAt fields

4. Create a data access layer
  - Create `data` package with todo access interface definition
  - Implement in-memory database for Todo items
  - Implement CRUD methods: Create(), Read(), Update(), and Delete()

Response (code only):

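package main

import (
	"fmt"
	"strings"
)

// Instruction is a single sub-task together with the numbered step it belongs to.
type Instruction struct {
	Step        string // the numbered heading, e.g. "1. Set up Go development environment"
	Description string // the bullet text, e.g. "Install Go and configure GOPATH"
}

// ParseInstructions turns a numbered task list with indented "- " bullets
// into a flat slice of Instructions, preserving their order.
func ParseInstructions(instructions string) []Instruction {
	lines := strings.Split(instructions, "\n")

	instructionList := []Instruction{}
	currentStep := ""

	for _, line := range lines {
		// The bullets are indented, so trim whitespace before matching prefixes.
		trimmed := strings.TrimSpace(line)
		switch {
		case trimmed == "":
			// Blank lines separate steps; nothing to record.
		case strings.HasPrefix(trimmed, "- "):
			instructionList = append(instructionList, Instruction{
				Step:        currentStep,
				Description: strings.TrimPrefix(trimmed, "- "),
			})
		default:
			// Any other non-empty line is treated as a numbered step heading.
			currentStep = trimmed
		}
	}

	return instructionList
}

func main() {
	instructions := `1. Set up Go development environment
  - Install Go and configure GOPATH
  - Install an IDE like Visual Studio Code with Go extension for code navigation and formatting

2. Initialize Go project and dependency management
  - Create project folder structure
  - Use go mod init to initialize project
  - Use go.mod file for package versioning and dependency management

3. Model todo item
  - Create models package
  - Write todo.go file containing a Todo struct with ID, Description, Status, CreatedAt, and UpdatedAt fields

4. Create a data access layer
  - Create data package with todo access interface definition
  - Implement in-memory database for Todo items
  - Implement CRUD methods: Create(), Read(), Update(), and Delete()`

	parsedInstructions := ParseInstructions(instructions)

	// Print each sub-task alongside the step it belongs to.
	for _, instruction := range parsedInstructions {
		fmt.Printf("%s -> %s\n", instruction.Step, instruction.Description)
	}
}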

The function above can now be used to parse the API's response into a slice of tasks, which are then worked through one at a time. These tasks will be saved in a database with a status field indicating success, failure, or that human intervention is required.

Evolve will need to take several types of action, such as writing code to disk and running a command (e.g. go get, git init). We need some way to tell these apart, perhaps by asking ChatGPT to prefix each response with CODE or COMMAND.
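If ChatGPT is asked to put CODE or COMMAND alone on the first line of each reply, dispatching on it is straightforward. A sketch under that assumed reply format; the type and function names are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// ActionKind says what Evolve should do with a reply.
type ActionKind string

const (
	ActionCode    ActionKind = "CODE"
	ActionCommand ActionKind = "COMMAND"
)

// Action is one instruction from the model: code to write, or a command to run.
type Action struct {
	Kind ActionKind
	Body string
}

// parseAction expects the marker alone on the first line, followed by the payload.
// Surrounding whitespace is trimmed from the payload.
func parseAction(response string) (Action, error) {
	parts := strings.SplitN(strings.TrimSpace(response), "\n", 2)
	kind := ActionKind(strings.TrimSpace(parts[0]))
	body := ""
	if len(parts) == 2 {
		body = strings.TrimSpace(parts[1])
	}
	switch kind {
	case ActionCode, ActionCommand:
		if body == "" {
			return Action{}, fmt.Errorf("%s response has no body", kind)
		}
		return Action{Kind: kind, Body: body}, nil
	default:
		return Action{}, fmt.Errorf("unrecognised action prefix %q", parts[0])
	}
}
```

A reply that starts any other way is rejected, which doubles as a check that the model followed the prompt.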

Challenges And Future Plans

Working with new technology is always challenging, as you are exploring new territory. This initial exercise made it clear how carefully you need to craft your prompts to get exactly the output you need. This cannot be overstated: prompts are the new programming language.

I anticipate several issues, mainly around the many response cases not yet handled. What if the API returns text and code, instead of just code? What if the output does not match what was intended? Will the context window limit become a real problem much sooner than I think?
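For replies that mix prose and code, one common mitigation is to keep only what sits inside Markdown code fences, which chat models often emit around code. A sketch, assuming fences start at the beginning of a line (after optional whitespace); it does not handle nested fences, and the function name is hypothetical:

```go
package main

import "strings"

// fence is the three-backtick Markdown code-fence marker.
var fence = strings.Repeat("`", 3)

// extractCodeBlocks returns the contents of every fenced block in a reply
// that mixes prose and code. An unterminated final block is discarded.
func extractCodeBlocks(reply string) []string {
	var blocks []string
	var current []string
	inside := false
	for _, line := range strings.Split(reply, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			if inside {
				// Closing fence: emit the collected lines as one block.
				blocks = append(blocks, strings.Join(current, "\n"))
				current = nil
			}
			inside = !inside
			continue
		}
		if inside {
			current = append(current, line)
		}
	}
	return blocks
}
```

Anything outside the fences, such as explanations, is ignored; an empty result is a useful signal that the reply needs to be re-requested.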

Then there is the upside. What if this works, and the program is created as intended, completely automatically? What if, when applied to existing code, it not only refactors the code but continually improves it? The challenges I am facing now will likely be gone in a few months' time; what new functionality could arrive that would make this far easier?

Conclusion

This will be a journey of discovery, finding the limits of the current generation of generative AI. Only by deeply understanding the tools can you understand their applications, and I learn best by doing.

If you're interested in following along, and in gaining insight into innovation more generally, subscribe to get posts in your inbox.