The hardware boom is slowing down. What comes next is a software, power, and inference problem—and most of the industry isn’t ready for any of it.
AI chips now account for just 0.2% of all chips manufactured, yet they generate roughly 50% of total industry revenue. That lopsided ratio captures where the industry’s value has concentrated. AI is already reshaping chip design cycles, manufacturing yield optimization, and deployment architectures. Yet according to HTEC’s global survey of 250 C-level semiconductor leaders, only 44% of organizations have fully embedded AI across multiple functions; the remaining 56% are still operating in pilots, limited deployments, or early exploration. Enterprise-wide AI integration is not yet in place. Drawing on the survey and on expert commentary from Craig Melrose and Ian Baird, senior semiconductor leaders at HTEC, here are the trends set to define the semiconductor landscape in 2026.

The AI hardware pyramid will collapse—here’s what replaces it
The AI semiconductor industry is structured like a pyramid balanced on its tip. On the wide end: hundreds of chip companies, billions in capital deployment, relentless hardware innovation. On the narrow end: a surprisingly thin layer of scaled, production-grade AI applications running on all that silicon. The 0.2% / 50% split signals clearly where the industry’s focus will shift this year. As Craig puts it:
“Much of the current focus is on hardware rather than on solving real-world problems.”
What replaces the pyramid is a more balanced structure: hardware consolidated around a smaller number of dominant software ecosystems, connected to a validated layer of end-use applications across manufacturing, physical AI, and healthcare. The organizations crossing the chasm in 2026 will be those that enter the market with a software stack that works on real enterprise workloads, not just in a lab. Take NVIDIA as an example.
Why the next NVIDIA won’t be decided by silicon
NVIDIA did not win the AI chip race because it had the best silicon. It won because it had CUDA—good enough, early enough, and sticky enough to become the default substrate for AI development worldwide. Ian Baird is direct: the main barrier facing every custom accelerator today is not hardware performance, it is software compatibility.
NVIDIA’s advantage stemmed from GPUs built for parallel processing—an architecture that aligned exceptionally well with the demands of AI. Every competitor since has been fighting uphill against a software ecosystem that reinforces itself with every new project. Craig Melrose frames it through Geoffrey Moore’s adoption curve:
“Hardware is the innovator. The fast followers and the mainstream will be decided by software.”
For semiconductor companies building or adopting new accelerators, the decisive investment is in compiler toolchains, kernel libraries, and the engineering teams that can port workloads and close the software gap.
The survivors will share a few traits:
- deep software investment,
- focus on specific high-value workloads,
- and a validated deployment model.
An accelerator that is 30% faster at a workload nobody runs in production is not a business. An accelerator that is 20% more efficient at video inference at the edge, with a full software stack and proven deployment path, is.
By 2027, most AI inference will run at the edge—the software isn’t ready
The default assumption that AI inference runs in the cloud is under pressure from two directions: the growth of physical AI applications, and the accelerating energy cost of data center compute. Craig is unambiguous: “Physical AI will be majority edge.” Robotics, autonomous vehicles, and factory floor systems all require inference at the point of action. A humanoid robot cannot wait for a cloud round-trip before deciding how to grip an object.
But the bottleneck is not hardware. The NPU landscape is deeply fragmented: AMD, Intel, Qualcomm, Apple, and dozens of others have deployed neural processing units with incompatible architectures and tooling. Ian flags the problem directly: developing software that runs efficiently across this heterogeneous ecosystem is the hard problem, and existing frameworks only partially address it. The hardware for edge inference is arriving. The software ecosystem to match it is not. As Ian puts it:
“The hard problem of edge AI isn’t getting capable silicon to the edge. It’s getting the software to run correctly across all of it.”
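To make that fragmentation concrete, the sketch below shows what a “partial” answer looks like in practice: a cross-vendor runtime such as ONNX Runtime lets an application request vendor-specific accelerators and fall back to CPU, but which execution providers are actually present on a given device is exactly what varies across the NPU landscape. The model file name and the provider preference list are illustrative assumptions, not recommendations.

```python
import onnxruntime as ort

# Placeholder path; any ONNX-exported model would do here.
MODEL_PATH = "detector.onnx"

# Preference order: try vendor NPU/GPU providers first, then fall back to CPU.
# Which of these exist depends on the ONNX Runtime build and the hardware on
# the device; that variability is the heterogeneity described above.
requested = ["QNNExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
available = set(ort.get_available_providers())
providers = [p for p in requested if p in available] or ["CPUExecutionProvider"]

session = ort.InferenceSession(MODEL_PATH, providers=providers)
print("Running on:", session.get_providers())
```

Even with such a runtime, operator coverage and performance differ by provider, which is why porting and validating a workload remains a per-target engineering effort rather than a one-time export.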

2026 is the year chiplet architecture goes from niche to mainstream
The monolithic GPU as the default AI compute platform is ending. The diversity of AI workloads has made specialization economically attractive at scale. Chiplet architectures allow companies to mix and match compute, memory, and I/O components from different sources and process nodes, enabling customization that was impractical with monolithic die designs.
Craig points to companies like Modular approaching hardware from unconventional angles. They are building chiplets and modular hardware that are changing what “a chip” even means. Ian adds D-Matrix, HTEC’s own client, as an example: integrating ultra-low latency memory with compute, optimized specifically for inference workloads like video generation and prompt processing. Google’s TPUs, Microsoft’s Maia, Amazon’s Trainium—all represent the same underlying bet. The hyperscalers figured this out years ago. In 2026, the rest of the market catches up.
Inference efficiency will matter more than raw FLOPS by 2027
When power is the constraint, efficiency is the moat. The frontier of inference optimization is increasingly on the software side.
The buying conversation is already shifting from peak compute performance to FLOPS-per-watt, latency-per-query, and cost-per-inference. Software-level inference optimization—model distillation, quantization, compiler tuning, right-sizing models to actual workload requirements—is where the meaningful performance gains will come from in 2026 and 2027. A well-optimized smaller model running on efficient silicon will outcompete an over-specified model on a power-hungry chip in almost every production scenario.
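As a minimal, hedged illustration of what software-level optimization means in practice, the sketch below applies post-training dynamic quantization to a toy PyTorch model and compares serialized sizes as a rough proxy for the memory side of cost-per-inference. The two-layer model is a stand-in, not a production workload, and real gains depend on the target silicon and runtime.

```python
import io
import torch
import torch.nn as nn

# Toy stand-in for a production model; dimensions are arbitrary.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
).eval()

# Post-training dynamic quantization: Linear weights stored as int8,
# activations quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m: nn.Module) -> float:
    """Serialized state_dict size in megabytes, a rough memory-footprint proxy."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

with torch.no_grad():
    _ = quantized(torch.randn(1, 768))  # sanity check: the quantized model still runs

print(f"fp32: {size_mb(model):.1f} MB, int8: {size_mb(quantized):.1f} MB")
```

Distillation, compiler tuning, and right-sizing follow the same logic: measure cost-per-inference on the actual workload, then trade accuracy headroom for efficiency deliberately.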
The companies that treat inference optimization as a first-class engineering discipline, not an afterthought, will have a structural cost advantage that’s very hard to close later.
Data centers will brown out before 2028
A critical risk in the AI infrastructure buildout is being underappreciated: energy supply is not scaling at the pace of AI compute demand. Gas turbines—the fastest path to new power—are already booked through 2028, and power availability is emerging as a hard constraint on data center expansion. Early local reports—such as a Nevada utility planning to prioritize data center demand over existing telecom infrastructure—highlight how these constraints may begin to surface in practice.
Craig raises the downstream questions directly:
“What happens to the systems that depend on data centers if those data centers face energy deficits? If a critical production system has a real-time dependency on cloud inference and that data center browns out, who is liable?”
This is not a theoretical concern but an operational one. Organizations that build in resilience now—through edge deployment and workload prioritization—will be better positioned as these constraints intensify.
Physical AI will grow faster than data AI in 2026—here’s the evidence
The next wave of AI chip demand will come from AI embedded in physical systems like robots, vehicles, factories, and consumer devices. Craig argues this is already underway: the use cases requiring local, real-time inference are multiplying faster than the data-center-centric use cases that dominated the last cycle.
The evidence is visible in adjacent markets. Consumer devices such as smartwatches, portable ECGs, and smart rings, along with industrial automation and autonomous vehicles, all require edge inference, power efficiency, domain-specific software stacks, and AI that works reliably in environments nothing like a controlled data center. They are also the applications that current software tooling is least prepared to serve.
The companies that figure out how to make AI work in the physical world will be the ones that understood hardware and software were never separate problems.
The bottom line
The hardware race in semiconductors is well underway. The outcome will be decided in software. The accelerators and platforms that survive the shakeout will be those connected to validated use cases, backed by robust software ecosystems, and designed for the energy and deployment realities of an edge-first, power-constrained world.
In this environment, aligning with the right technology partner is often key to success. HTEC works alongside semiconductor organizations to close the hardware–software gap, port workloads to new architectures, enable edge AI strategies, and build the talent needed to scale AI across design, manufacturing, and quality.
About the experts
Craig Melrose is a semiconductor and AI industry expert at HTEC, bridging hardware innovation and enterprise AI deployment. Ian Baird is a senior semiconductor expert at HTEC focused on workload porting, AI compilers, and heterogeneous compute environments. Insights are drawn from expert interviews conducted in March 2026 and HTEC’s commissioned global survey of 250 C-level semiconductor leaders, conducted by Censuswide.




