Creating a Voice Assistant using ChatGPT

As soon as ChatGPT3 came out I knew we were on to something incredible. Initially the use case was playful: write me a story, turn it into a poem, list fun things to do with families, give me recipes, etc. When ChatGPT4 came out, it changed the game.

I decided to see how good the coding aspect had become. I have a software engineering background, have been using GitHub Copilot for a long while, and have found massive increases in productivity. When I started digging into what GPT4 was capable of, I was incredibly surprised.

With this new capability I decided to see if I could build an app in SwiftUI, which I do not know, and have it be a functioning app that worked well and looked good (or decent enough).

App Idea

While the form factor of ChatGPT was great and novel, I thought it would be cool to be able to have a conversation verbally, as if you were talking to someone in the same room or on a phone. The initial idea was for language learning, but it expanded quickly to learning about history, asking questions about fitness, and finding new business ideas on topics I am interested in.

I felt this would be useful not only for me, but also for people who would enjoy this form factor, such as the older generation. I could picture my granddad speaking into the app for hours and asking all sorts of questions.

Process

While I am familiar with Go, I don't know much about Swift or SwiftUI at all. This used to be a blocker for development, but I found ChatGPT4 to be a fantastic coding companion. ChatGPT wrote every single line of code for the mobile app and about 80% of the server.

Coding with ChatGPT4 is completely iterative. This back and forth allows the code to evolve and ChatGPT manages to maintain a really good understanding of previous functions and the overall goal of the program.

Prompts really matter. I found I needed to be very clear about what the intention of the application was, as well as the intention of a given function. This is analogous to writing or talking through a given subject: clarity is forced as you work through precisely what you are trying to say. For example:

I want to create a function that takes text as an input and streams the response.

Should be written as:

I want to create a function that takes a string of text as an input, runs another function called "getStream". getStream returns audio data which should be streamed in real-time to the caller of the API.

This helps ChatGPT4 to understand exactly what you want to achieve and how, and even some of the why.

As of the time of writing, GPT4 has been out for two weeks. This is very new technology and it's remarkable that it is already as capable as it is. With that said, bugs do happen. I hit far more bugs in the Swift code than in the Go code, maybe because Go changes less and has backwards compatibility baked in. However, this is an easy limitation to overcome.

You use ChatGPT as a pair programmer. As soon as bugs came in, I fed them into ChatGPT and it responded with a fix. Sometimes this fixed the issue, sometimes it didn't. If this went on for too long, I would ask it to try something different. Almost every single time, this cycle resulted in working code. It still blows my mind that I wrote an entire app in a language I don't know how to code in.

As this is an iterative process, the code is by its nature not well structured. I split some of the classes into their own files, and when a function became unruly I asked ChatGPT to refactor it into smaller functions.

While I understood ChatGPT would be good at code, I wasn't so sure about the design of the app. I started by asking what a good colour scheme would be, then chose one and asked it to create both light and dark modes for the given theme. I then gave the view code to ChatGPT and asked it to improve the design, which it did by adding drop shadows, rounded corners and other niceties.

I then asked "what would you do to improve the look and feel of the app?" and got a list of improvements. The list included various animations (loading, talking, playback), repositioning of elements, different status displays and error handling. I then asked for the code for each of these and implemented them. Asking ChatGPT for improvements to the app felt like a gamechanger. It was really like having a second pair of eyes on the project and getting something close to real, legitimate feedback.

I did run into some limitations of ChatGPT, which mainly stem from not being able to send it large amounts of the codebase. When you have several functions interacting, it becomes difficult for ChatGPT to understand how they connect unless you paste in lots of code. One example of this was real-time streaming: I eventually got it to work using WebSockets, but ChatGPT did not recommend this because the initial focus was on a REST API.

I will write a much more detailed post on coding with ChatGPT4 and the various processes and tricks to get a good outcome.

Outcome

I'm super happy with the app as it stands. You can join the TestFlight here, and as soon as I have tested real-time streaming the app will be published on the store.