Investigating the Potential of AI in Software Development Part 3: Pathfinding 


This is the third part of a blog post series detailing HTEC’s first-hand investigation into the feasibility of delegating the development of full-fledged software solutions to AI tools.  

You can read the first two installments at the following links:  

Introduction 

Requirements and System Design 

The series follows the experiences of one of HTEC’s most seasoned Solution Architects, Zoran Vukoszavlyev, as he relies exclusively on AI tools through all stages of a simulated software development project. This blog post focuses on the challenges of AI-assisted service development and the evolution of Zoran’s prompting strategy.


From the very start, the purpose of this investigation was not necessarily to successfully complete the simulation project, but to put AI models to the test and see whether we could identify approaches and methods that lead to the best outcomes.

Having moved past the requirements and system design phases, Zoran found the initial attempts at developing the services package to be the least fruitful in terms of concrete output, yet the most informative for shaping an effective prompting strategy, one that could generate backend services, API contracts, and the software itself to acceptable standards.

While the next installment of the series will detail the solution itself, we dedicate this chapter to the winding path that led us to it and the valuable insights gathered through the journey.  

Understanding the model 

As we detailed in the previous blog post, AI suggested a hybrid architecture of microservices with event-driven components. To optimize time and resources, Zoran decided to first focus on a single resource (one of the suggested services) and use it to cover the entirety of development – including API contracts, service development, UI implementation, and high-level testing. 

In this phase, Zoran was still undecided about the technology stack for the project, but chose to focus on Java out of familiarity. Given his extensive experience with Micronaut and other specialized microservice frameworks, he picked it over a more commonly used Java framework such as Spring Boot. This choice would prove to be a major challenge for the AI models.

The initial attempts at code generation required six restarts. Each iteration ran into illogical, effectively unresolvable issues that made no sense to pursue or fix. According to Zoran, it became clear early in each iteration that the AI was heading in the wrong direction, so rather than keep making manual fixes, he was quick to start anew.

One of the most common issues was that the AI mixed up dependencies, best practices, and implementation details between Micronaut and Spring Boot. The problems were so extensive that it took significant effort just to get the build to compile, let alone to produce code of even passable quality.

Working with Micronaut reinforced the growing suspicion that AI models struggle more with less commonly used technologies. Although Java offers many microservice frameworks, most people go with Spring Boot. Consequently, most publicly available implementations are based on Spring Boot as well, and that is the material the AI models were trained on. A second misleading factor could be that Spring Boot and Micronaut look very similar, and it is somewhat difficult to tell them apart at first glance, which may also increase the chance of hallucinations.
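
To illustrate how close the two frameworks look on the surface, here is a minimal, hypothetical Micronaut controller (the class and endpoint names are invented for this post, not taken from the project), with the rough Spring Boot equivalents noted in the comments:

```java
// Hypothetical Micronaut controller (names invented for illustration).
// The Spring Boot near-equivalents are noted in comments; at a glance,
// the two frameworks are easy to confuse.
import io.micronaut.http.annotation.Body;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.annotation.Post;

@Controller("/items")        // Spring Boot: @RestController + @RequestMapping("/items")
public class ItemController {

    @Get("/{id}")            // Spring Boot: @GetMapping("/{id}") with a @PathVariable parameter
    public String find(String id) {   // Micronaut binds the path variable by parameter name
        return "item-" + id;
    }

    @Post                    // Spring Boot: @PostMapping with @RequestBody
    public String create(@Body String name) {
        return "created-" + name;
    }
}
```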

Changing course

After multiple unsuccessful iterations, Zoran concluded that he needed to modify his approach to the problem. Instead of continuing to give different instructions to the AI model, he decided to create a skeleton project focusing on a single example resource and to implement it manually from scratch. The skeleton project would serve as an example prompt for the AI model – kind of a programming version of the “show, don’t tell” principle. 

Zoran built the dependency and package files, adding all the components and defining the application layers manually: what the controller and service layers should look like, how to reach the event broker, how to manage data in the database, and so on. In short, he created an imaginary example resource with multiple field types, ID generation, and anything else he thought might be useful, then instructed the AI model to implement the service using this imaginary resource as an example.
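
Zoran’s actual skeleton is not reproduced in this post, but a rough sketch of the shape he describes might look something like the following in Micronaut (all names, the event publisher, and the in-memory storage are invented stand-ins for illustration):

```java
// Hypothetical sketch of a skeleton "example resource" spanning the layers
// described above: controller (API contract), service (business logic and
// ID generation), data access, and event publication.
import io.micronaut.http.annotation.Body;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.annotation.Post;
import jakarta.inject.Singleton;

import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Controller layer: defines the API contract for the example resource.
@Controller("/examples")
class ExampleController {
    private final ExampleService service;

    ExampleController(ExampleService service) {
        this.service = service;
    }

    @Post
    ExampleResource create(@Body ExampleResource resource) {
        return service.create(resource);
    }

    @Get("/{id}")
    Optional<ExampleResource> find(String id) {
        return service.find(id);
    }
}

// Service layer: ID generation, persistence, and event publication.
@Singleton
class ExampleService {
    private final ExampleRepository repository;
    private final EventPublisher events;

    ExampleService(ExampleRepository repository, EventPublisher events) {
        this.repository = repository;
        this.events = events;
    }

    ExampleResource create(ExampleResource resource) {
        ExampleResource saved = repository.save(
                new ExampleResource(UUID.randomUUID().toString(), resource.name(), resource.count()));
        events.publish("example.created", saved.id());
        return saved;
    }

    Optional<ExampleResource> find(String id) {
        return repository.find(id);
    }
}

// Data layer: in-memory stand-in for the real database access.
@Singleton
class ExampleRepository {
    private final Map<String, ExampleResource> store = new ConcurrentHashMap<>();

    ExampleResource save(ExampleResource resource) {
        store.put(resource.id(), resource);
        return resource;
    }

    Optional<ExampleResource> find(String id) {
        return Optional.ofNullable(store.get(id));
    }
}

// Event layer: placeholder for the event-broker integration.
@Singleton
class EventPublisher {
    void publish(String topic, String payload) {
        System.out.printf("publishing %s -> %s%n", topic, payload);
    }
}

// The "imaginary" resource itself, with a few field types and a generated ID.
record ExampleResource(String id, String name, int count) {}
```

The value of such a skeleton is that every architectural decision is already answered by example, so the prompt only needs to say “implement the remaining resources like this one.”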

This yielded much better results than the previous iterations. For the first time, the implementation was successful, and the AI was able to reuse the majority (80–90%) of the sample code. By Zoran’s estimation, with the example resource as part of the prompt, the AI model generated about 95% of the source code. The end result: successfully defined API contracts for a single service, laying the groundwork for modular development.

Manually creating the complete example resource took about a week. Zoran did delegate the more pedestrian parts of the skeleton project to the AI model (he estimates a 58:42 split between manual and AI-generated code), but he remained firmly in the driver’s seat for both decision-making and implementation.

Zoran considers the time spent building the skeleton project a worthy investment in the long run: 

“It is much harder to explain everything we want and give clear instructions via a prompt than to provide an example and say, ‘I want it done exactly like this’. Some may question the value and the purpose of coding manually for a week just to provide an example for the AI model, but I believe it is smart context management. From that point on, I was able to achieve far superior results with fewer iterations and tokens.” 

Key takeaways 

Sidenote: Switch from Claude AI to Amazon Q Developer 

During this stage of the experiment, Zoran switched from Claude AI to Amazon Q Developer because of the differences in context management between the two AI assistants. The context size is practically the same, but the way they manage it is quite different. With Claude AI, once we hit the daily/session limit, it simply stops generating code, even in the middle of a task. This can be very problematic: not only do we need to wait for several hours to continue, but by then the context is gone and we need to rebuild it. One solution, of course, is to pay for a more expensive plan with a bigger context, but we wanted to see whether Amazon Q Developer handles this any differently. It turns out it does.

Amazon Q silently compresses the context as it approaches the token limit. This is a bit deceptive because it gives the impression of an infinite context window, but at some point the quality of the generated code drops significantly, and we need to partially rebuild the context to get better results. The latest version of Amazon Q now sends a notification before it begins compressing the context.

While not ideal, Zoran found this type of context management more useful than having the assistant stop midway and leave a Java class unfinished. The distinction is worth noting because, in every other respect, Claude and Amazon Q generated code of similar quality; the difference in how they manage context, however, may matter to others.

Stay tuned for the next installment of the series, detailing the AI-generated solution.   
