With no end in sight to the memory crunch, AMD thinks that AI, the main cause of the shortage, could be part of the solution. This week, the House of Zen acquired predictive memory startup Mext for an undisclosed sum, setting the stage for a world where bots decide which data to put into RAM and which to store in less-expensive flash. Founded in 2023, Mext built a proactive memory platform that uses machine learning algorithms and learned heuristics to offload "cold" memory to flash storage and, based on data access patterns, restore it before it's needed again. Modern flash arrays are already approaching main memory in terms of aggregate bandwidth, but swapping to disk still imposes a stiff latency penalty. Mext claims it can expand the effective memory of a system by 2 to 4x using flash, which gig for gig is still vastly less expensive than DRAM. This flash memory is exposed to the operating system like regular memory simply by running the Mextd daemon. Memory tiering is nothing new and has seen various incarnations over the years, with some being software based and others, like Intel Optane persistent memory, using special 3D XPoint memory tech co-developed with Micron. Mext stands out for its use of machine learning to migrate data from hot memory to cold storage almost like a branch predictor – something AMD has an awful lot of experience with. Mext isn't using one model to decide when to shuffle your data. Instead, it uses a mix of heuristics, long short-term memory (LSTM) networks, and modern transformer architectures, depending on which combination renders the best results. “This approach has the potential to reduce infrastructure costs, improve resource utilization, and help customers more effectively scale general-purpose and AI workloads,” Dan McNamara, SVP of AMD’s compute and enterprise AI biz, wrote in a blog post this week. Beyond enterprise applications, the technology could have implications for AI serving.
Modern mixture of experts (MoE) models are, as their name suggests, composed of multiple sub-models. For each token predicted, a different selection of experts may be used. In practice, an LLM may use some experts more frequently and others rarely. We wouldn't be surprised to see AMD use Mext's prediction algorithms to offload infrequently utilized experts from HBM to slower system memory, enabling enterprises to take advantage of larger, more capable models with fewer resources. That’s just speculation of course, but we've reached out to AMD for comment; we'll let you know if we hear anything back. ®
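To make the expert-offload idea concrete, here is a minimal Python sketch of frequency-based placement. It is not Mext's or AMD's method: the function names, the routing trace, and the two-tier "fast"/"slow" split are all hypothetical, and a simple usage count stands in for whatever predictor a real system would use.

```python
from collections import Counter

def place_experts(routing_trace, num_experts, fast_slots):
    """Split expert IDs into a fast tier (think HBM) and a slow tier
    (think system memory) by how often each one was routed to.
    Hypothetical heuristic: a plain usage count, not a learned model."""
    counts = Counter(routing_trace)
    # Most-used experts first; ties broken by expert ID for determinism.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    return sorted(ranked[:fast_slots]), sorted(ranked[fast_slots:])

def fast_hit_rate(routing_trace, fast):
    """Fraction of routing decisions served from the fast tier."""
    if not routing_trace:
        return 0.0
    fast_set = set(fast)
    return sum(1 for e in routing_trace if e in fast_set) / len(routing_trace)

# Made-up trace: expert 0 is hot, experts 2 and 3 are rarely chosen.
trace = [0, 1, 0, 2, 0, 1, 3, 0, 1, 0]
fast, slow = place_experts(trace, num_experts=4, fast_slots=2)
print(fast, slow, fast_hit_rate(trace, fast))
```

In this toy trace, keeping only two of the four experts in the fast tier still serves most routing decisions from it, which is the effect any offload scheme of this kind would be counting on.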
OPINION Do AI agents need a new kind of CPU? That's what Arm, Nvidia, and a growing number of chip designers would have you believe. Arm named its first datacenter silicon the "AGI CPU." Nvidia CEO Jensen Huang described Vera as a "CPU for agents," and AWS's Graviton 5 marketing is chock full of references to agentic AI. None of these Arm-based processors are going to bring about the singularity. They're not even AI accelerators. Don't let the spin doctors fool you – these chips are nothing more than general-purpose processors that have received an AI glow-up. Sure, AI agents and their harnesses need CPUs. No argument there. But agents aren't one workload. They're simply a bridge between the AI model and the same applications we've been running for decades. And the tools those agents end up running often look wildly different. Some will benefit from a higher ratio of memory bandwidth to compute, some will perform better on chips with large unified caches or dedicated compression engines, while others will prefer high frequency over core count, or vice versa. There's a reason AMD and Intel don't just build one Epyc or Xeon SKU, and why all of the "purpose-built" agentic CPUs look so different. If you look at what Nvidia has built with its 88-core Vera CPU, the chip promises high single-threaded performance with gobs of memory and interconnect bandwidth. As Huang explained it during his GTC Taiwan keynote, this combination of compute and bandwidth is key to keeping latency as low as possible. "There will be billions of agents and these agents are going to be using the CPUs with very little patience because the cost of the GPUs they sit next to is too high," he said. But of course Huang would say that – he's in the GPU-slinging biz. Vera, just like Grace, was designed to keep data flowing between the CPU and GPU as smoothly as possible. Data movement is literally Vera's thing. 
Arm's AGI CPU, meanwhile, looks to be a bog-standard Neoverse V3 processor with 136 cores that's been stripped of anything an agent is unlikely to need in order to keep power consumption as low as possible. No simultaneous multithreading or dedicated accelerators, minimal vector extensions, but loads of memory bandwidth. Amazon's 192-core Graviton 5 processors, announced at Re:Invent last winter, are essentially a scaled-up version of Arm's AGI CPU, right down to the Neoverse V3 cores, but arguably even more generic. To echo Corey Quinn, "please, for the love of all that's holy, stop calling them 'AI chips.'" Not to be left out of the fun, Intel and AMD have also been keen to recast their flagship Xeons and Epycs as the ideal platforms for running AI agents. At Computex earlier this month, Intel showed off a couple of reference rack designs packing as many as 36,864 x86 cores into a 100 kW rack. Meanwhile, AMD, following an initial round of Vera CPU benchmarks, went on the defensive last week, arguing that concurrency, not latency, is the metric that matters most when running agents at scale. The House of Zen projects that for a 100 kW power envelope, its 256-core Venice Epycs, due out later this year, would deliver 3.3x higher throughput per rack than Vera. If it feels like everyone has a different opinion on what the ideal agentic CPU should look like, that's because, as with any other datacenter workload, there's rarely one right answer. We see this in early benchmarks of Nvidia's Vera CPU. Late last month, FOSS-friendly publication Phoronix got early access to the chip and ran a subset of its test suite that Nvidia apparently felt was representative of its target market. The chip achieved a geo-mean score 10 percent higher than AMD's 128-core Epyc 9575F, and 55 percent higher than Intel's 128-core Xeon 6980P. That's a strong showing. But looking closer at the results, it becomes clear that Vera performs better in some apps than others. 
And this gets to the crux of it all. There has never been one CPU to rule them all, and as the AI hype cycle enters its agentic era, there certainly isn't one now. ®
Microsoft is facing AI-related issues on multiple fronts. Disgruntled investors have flung a sueball at the company over its Copilot claims, while it is reportedly turning to other cloud vendors to help with AI-induced scalability issues at its coding collaboration tentacle, GitHub. The sueball is a class action, filed by the City of St. Clair Shores Police and Fire Retirement System in the Seattle US District Court, that alleges that Microsoft bosses (including its CEO, Satya Nadella) made "materially false and/or misleading" statements about adoption of the company's Copilot technology. On the contrary, according to the complaint, "Microsoft’s flagship proprietary AI model ranked well below competitors on a number of benchmark tests," and "Microsoft had failed to convert a significant percentage of its commercial Microsoft 365 users to paid Copilot subscriptions and the Company's Copilot offerings had lost market share to rival products, a trend that was increasing." Some organizations are gung-ho for Copilot these days – NHS England, for example, announced plans last week to roll the technology out to more than half a million staff. However, the class action alleges Microsoft's SEC filings did not clearly explain problems "regarding the development and customer adoption of Copilot products and Microsoft's proprietary AI models." On January 28, Microsoft announced results for its fiscal second quarter, which included a slowdown in Azure growth and an admission that paid Microsoft 365 Copilot seats had reached only 15 million out of 450 million Microsoft 365 users. The company's shares subsequently declined by more than $48 per share, around ten percent of their value at the time, according to the complaint. "We are aware of the complaint and believe the claims are without merit. Microsoft stands by the integrity of its public statements and will vigorously defend itself in court," a Microsoft spokesperson told The Register. Git thee to AWS?
Microsoft's AI headaches are not limited to the sueball, which the company reportedly claims "is without merit." Its source-shack tentacle, GitHub, is also reportedly facing the possibility of being forced to leap into bed with a rival to address ongoing reliability and scalability woes. Microsoft acquired GitHub in 2018, but the source site has sometimes struggled with availability amid a surge in AI-assisted workflows. The site has attempted to shift workloads to Azure, but has, for many users, remained unreliable. Azure has, infamously, had its own capacity problems recently. According to reports, the source shack will be propped up with additional resources from AWS, although it is not clear whether this is a temporary measure to address immediate problems or something more permanent. After all, given the choice, few IT managers would entrust all their workloads to a single vendor, and a multicloud approach is sensible. "The context here is important: Our community is growing at a rate we've never seen before, and the incredible spike in agentic development that began late last year has tested our infrastructure's limits," a GitHub spokesperson told The Register. "To meet this demand, we are both accelerating our move to Azure and continuing to explore a multi-cloud strategy to ensure we have the future capacity, compute elasticity, and horizontal scale required to support continued growth." It is, however, a little embarrassing when your owner operates its own cloud service. ® Updated at 1631 with comment from GitHub.
Servers employing x86 chips from AMD and Intel now account for little more than half of server revenue, according to the latest figures from IDC. In its Worldwide Quarterly Server Tracker for Q1 2026, the analyst firm says that non-x86 server revenue hit $58.7 billion, representing a startling increase of 107 percent over the same period last year. The results mean that those non-x86 servers make up 47.9 percent of the market revenue, closing in rapidly on the amount of cash spent on x86 boxes. The growth in non-x86 turnover is likely thanks to systems powered by Nvidia’s AI chips featuring Arm cores. Although there is high demand for these, they also cost a pretty packet compared to an average datacenter box. In fact, IDC noted a stark divide shaping the worldwide server market, which reached $122.6 billion in vendor revenue during this period, a 30.4 percent increase year-on-year. On the one hand, AI infrastructure investment from hyperscalers and large cloud providers is “running at a scale that shows no sign of plateauing,” while everything else – the non-accelerated segment – faces a supply-constrained environment, thanks largely to that AI infrastructure spending. As Reg readers will know, memory chipmakers are prioritizing manufacturing capacity for higher-margin products for AI servers and GPUs, starving the rest of the market of supply. Component availability, particularly DRAM and NAND flash, is limiting near-term shipment volumes from vendors, IDC says, though order pipelines are strong. Supply of the right chips is therefore the chief limiting factor on server market growth. Revenue for x86 servers still reached $63.9 billion, but this was a decline of 2.9 percent due to those component supply constraints impacting shipment volumes. GPU-accelerated servers pulled in $68.9 billion for the vendors, up nearly 25 percent year-on-year, while other accelerated servers surged a massive 122 percent to $17.7 billion.
The latter category represents AI systems configured with FPGAs or ASICs rather than GPUs. IDC’s spin on the data is that AI infrastructure adoption is no longer limited to hyperscalers, thanks to developments such as government-led sovereign AI initiatives, while the non-accelerated segment tells a more nuanced story. Although revenue here declined, underlying demand remains strong, but many enterprise customers are holding out against elevated component prices. “Companies aren’t pulling back from infrastructure investment; they’re just not getting servers as fast as they need them. Longer term, emerging workloads, including agentic applications and physical AI ecosystems, will keep demand elevated well beyond the current cycle,” commented IDC research director Juan Seminara. The firm says it expects to see supply normalization beginning in 2027, with capacity relief coming as chipmakers bring new fabrication plants online. Across the last two decades, non-x86 servers accounted for less than ten percent of revenue, and most of that went to IBM which emerged as the last vendor of proprietary servers as Oracle lost interest in Sun and the likes of HPE decided they couldn't sustain businesses built on exotic architectures. ®
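As a sanity check on the IDC figures quoted above, the following Python sketch recomputes the revenue shares and backs out the implied year-ago numbers from the stated growth rates. Only the dollar amounts and percentages come from the article; the helper name `prior_year` is made up for illustration, and the year-ago values are derived estimates, not figures IDC published here.

```python
# Q1 2026 server revenue quoted from IDC, in billions of US dollars.
total = 122.6
non_x86 = 58.7
x86 = 63.9

def prior_year(value, growth_pct):
    """Back out the year-ago figure implied by a year-on-year growth rate."""
    return value / (1 + growth_pct / 100)

non_x86_share = non_x86 / total * 100   # share of total revenue, percent
x86_share = x86 / total * 100

# Implied Q1 2025 figures, using the stated +107%, -2.9%, and +30.4% changes.
py_non_x86 = prior_year(non_x86, 107)
py_x86 = prior_year(x86, -2.9)
py_total = prior_year(total, 30.4)

print(f"non-x86 share: {non_x86_share:.1f}%  x86 share: {x86_share:.1f}%")
print(f"implied year-ago: non-x86 {py_non_x86:.1f}, x86 {py_x86:.1f}, total {py_total:.1f}")
```

The two segments sum to the stated market total, and the implied year-ago segments add up to within rounding of the implied year-ago total, so the quoted percentages are internally consistent.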
Qualcomm is reportedly moving to buy AI chip firm Tenstorrent, an acquisition that could prove a major boost to the RISC-V ecosystem. This comes from The Information, which cites an anonymous source claiming that a deal valued at $8 billion to $10 billion is under discussion. According to the report, the talks are ongoing and there is no certainty a deal will be reached, but the move would fit with Qualcomm's datacenter ambitions and bullish statements about AI opportunities made by its chief, Cristiano Amon. The Register asked Qualcomm and Tenstorrent to comment. Tenstorrent is a Canadian AI chip startup that bases its products on the permissively licensed RISC-V processor architecture. The company is led by CPU guru Jim Keller, known for his design work at AMD, Apple, and on DEC's Alpha chips back in the day. The firm's Galaxy Blackhole AI compute platform went on sale earlier this year, packing 32 of its Blackhole accelerators, each with 768 RISC-V cores, into a 6U enclosure running its own software stack. Qualcomm is also keen on RISC-V, especially since its licensing court battle with chip designer Arm, which wanted to nix Qualy's license to create its own Arm-based processor silicon. The chip design firm's datacenter products use home-brew Hexagon neural processing units, but it continues to rely on Arm processors in its Snapdragon range. In December, Qualcomm picked up Ventana Micro Systems, another company designing RISC-V CPUs targeting datacenter and enterprise applications. Financial details of that deal were not disclosed, but it was estimated at between $200 million and $600 million. A Tenstorrent buy could therefore see a greater commitment to RISC-V from Qualcomm, giving the open standard a shot in the arm (pun intended) and allowing the chipmaker to further distance itself from Arm and its owner SoftBank as it pursues datacenter customers.
Arm appears unfazed by that prospect, having recently said it expects datacenter chips will soon be its main source of revenue. ®
Iran targeted Bahrain, Qatar, the United Arab Emirates and other American allies in the Middle East during the war, harming their economies and military sectors.
Amid the unrelenting demand for AI infrastructure, SK Hynix, the world’s largest supplier of the HBM used in high-end GPUs, now expects to triple its wafer capacity. You'll just have to wait through two more US presidential elections and then some. All that capacity won’t come online until 2034, SK Group Chairman Chey Tae-won told Nikkei Asia in a recent interview. SK Hynix’s valuation has soared in recent months. The company is one of three major producers of NAND flash and DRAM memory, large quantities of which are required to support the burgeoning AI inference market. Samsung and Micron are the other two major players in this space. This demand has led to skyrocketing prices for consumer DRAM and SSDs, some of which have more than tripled in price compared to this time last year. SK Hynix and the other major memory makers, meanwhile, have seen their revenues explode. Chey's comments come just a week after SK Hynix said that it planned to double its production capacity within the next five years. “Our calculations show that our wafer capacity will double within five years. But honestly once all these facilities are built, it won’t just double, it will triple by around 2034,” Chey told Nikkei. SK is in the process of bringing four additional wafer fabs online, with the first phase reportedly on track to start up as early as 2027. The South Korean memory slinger had previously planned to ramp production at these facilities over the next two decades, but has pulled in its timeline in hopes of satiating AI’s memory addiction. “There is currently no way to move faster than this,” Chey told the newswire. While much of this capacity will be built on SK’s home turf, the company is exploring its options for overseas manufacturing. Japan is one of the potential destinations, and Chey called it an “excellent” candidate due to its robust semiconductor supply chains.
Unfortunately, the buildout is unlikely to drive down memory prices for consumers any time soon. As we previously reported, memory prices are not expected to peak until later this year at the earliest. Analysts warn that memory prices are more likely to plateau going into 2027 than to plummet as they have in past DRAM and NAND boom-bust cycles. These boom-bust cycles have been a fact of life for commodity electronics manufacturers, like SK Hynix and Samsung, for years. Prices typically spike as inventories are drawn down and crater as new capacity is brought online. On the one hand, AI infrastructure demand has helped to stabilize this to some extent. On the other hand, the AI boom kicked off in 2022 at what was arguably the worst possible time. "This demand started in the Valley for the DRAM industry. That makes financially trying to build additional capacity really challenging," TechInsights analyst James Sanders told El Reg late last year. Business is once again booming for memory vendors, presenting ample opportunities for labor disputes over compensation as well as fab expansions. Unfortunately, there’s no changing the fact that the fastest anyone can bring a leading-edge memory fab online is about three years. ®
COMPUTEX 2026 It’s hard enough for startups to compete with AMD and Nvidia on chip design. The rise of rack-scale architectures has only made things harder. Companies not only have to invest in chip design but also the mechanical, thermal, and power engineering necessary to pack six dozen or more AI accelerators into a single rack that functions as one enormous GPU. At Computex last week, Delos Data, a startup funded by former Intel and Barefoot Networks execs, showed off a modular server platform aimed at giving chip startups a shortcut to rack scale. One of the challenges with the move to rack scale is the sheer amount of networking that needs to be enabled at the box. A typical eight-GPU HGX node only needs one or two ports per GPU. By comparison, a GB300 NVL72 needs eighteen 400 Gbps ports per GPU. Nvidia and AMD have developed custom racks with integrated backplanes, power delivery, and cooling. Delos, by comparison, is keeping things relatively simple by designing a chassis that, at least from the front, looks more like a switch than a GPU server. It features 36 OSFP ports, nine for each of the four OAM sockets at the heart of the system. OAM, if you’re not familiar, is an open socket commonly used by high-performance accelerators requiring more interconnect bandwidth and power delivery than standard PCIe cards can manage. Assuming 200 Gbps SerDes, that works out to 3.6 TB/s per chip of interconnect, the same as Nvidia's new Rubin GPUs. OSFP means that customers can use standard DACs or pluggable transceivers, and switches, depending on how large they want their scale-up domain to be. And while OSFP is usually associated with Ethernet, you can run just about anything you want through the ports, whether it be UALink, Ultra Ethernet, PCIe, or something else. From a deployment standpoint, these systems would be wired up like any other hyperscale system, just a whole lot denser.
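The 3.6 TB/s figure can be reproduced with a little arithmetic. The Python sketch below is one way to get there, under two assumptions the article does not spell out: that each OSFP port carries eight electrical lanes (the standard OSFP lane count) and that the quoted bandwidth counts both directions. The function name is made up for illustration.

```python
def per_chip_bandwidth_tb_s(ports, lanes_per_port, gbps_per_lane, bidirectional=True):
    """Aggregate interconnect bandwidth for one accelerator, in terabytes per second."""
    gbps = ports * lanes_per_port * gbps_per_lane
    if bidirectional:
        gbps *= 2           # assumption: the headline number counts both directions
    return gbps / 8 / 1000  # bits to bytes, then giga to tera

ports_per_socket = 36 // 4  # 36 OSFP ports shared by four OAM sockets

# Assumed: 8 lanes per OSFP port, 200 Gbps SerDes per lane.
print(per_chip_bandwidth_tb_s(ports_per_socket, 8, 200))
# Same chassis with 100 Gbps SerDes would deliver half as much.
print(per_chip_bandwidth_tb_s(ports_per_socket, 8, 100))
```

Counted in one direction only, the same nine ports would work out to half the headline number, so the comparison with other vendors' figures depends on both sides using the same convention.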
Delos isn’t the only option out there for chip startups looking for a scale-up reference design. AWS, for example, appears to be repurposing Nvidia’s MGX form factor for its Trainium 3 rack systems, while AMD’s Helios rack is now an OCP standard. Both designs would, in theory, be easier to service, but Delos argues that its modular design offers greater flexibility. “It makes it a little bit more flexible in terms of, maybe you want a scale up domain of 100 or maybe you want it a scale up domain of one,” CTO Dan Daly told El Reg. “It just depends on how many cables you want to plug in. This also allows you to go plug into different types of switches… it could be simpler switches, maybe even optical circuit switches (OCS).” Using existing packet switches from Broadcom or Marvell, such a design could support 512-1,024 accelerators in a single-layer fabric depending on whether you're using 200 Gbps or 100 Gbps SerDes. Using multi-layer fabrics, OCS, and/or 2D/3D toruses, the compute domain could scale even further, all while using off-the-shelf components. While OSFP keeps things simple and easy, it also means power consumption could become problematic for larger compute domains requiring pluggable optics. In fact, this is why Nvidia has taken so long to embrace optical scale-up. Copper may not have the reach, but it uses a fraction of the power. Delos CEO Ed Doe tells us the company is already exploring versions of the system that will use near-package or co-packaged optics out to MPO-style connectors rather than OSFP. The startup isn't just doing hardware. As anyone who's done large-scale networking knows, the physical and logical topologies – that is, the way devices communicate with one another on the network – can look very different depending on the workload.
Delos has developed a software orchestration platform designed to facilitate the configuration and monitoring of these switched fabrics or meshes in order to enable dynamic rerouting of traffic in the event of a link failure. At Computex, this software platform, which Delos has dubbed its Nonstop AI network, was on display, allowing attendees to pull links at random and see the network react and correct itself automatically. The company's ambitions don't stop at network orchestration and systems. We're told Delos has additional products in the works, and we don't know for sure what they are, but a high radix switch design built atop merchant silicon would certainly complement its Nonstop AI systems. ®
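The pull-a-link demo described above comes down to recomputing a route when a link disappears. The Python sketch below illustrates that general idea with a breadth-first search over a tiny, made-up topology of two accelerators and two switches. It is not Delos's software: the node names and the `shortest_path` helper are hypothetical, and a production fabric manager would weigh load, bandwidth, and failure domains rather than hop count alone.

```python
from collections import deque

def shortest_path(links, src, dst):
    """Fewest-hop path from src to dst over undirected links, or None if unreachable."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    prev = {src: None}               # doubles as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk predecessors back to src
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):  # sorted for deterministic results
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical fabric: accelerators A and B, each cabled to switches S1 and S2.
links = {("A", "S1"), ("A", "S2"), ("B", "S1"), ("B", "S2")}

before = shortest_path(links, "A", "B")
after = shortest_path(links - {("B", "S1")}, "A", "B")          # pull one cable
isolated = shortest_path(links - {("B", "S1"), ("B", "S2")}, "A", "B")
print(before, after, isolated)
```

With one cable pulled, traffic shifts to the surviving switch; with both of B's links gone, no route remains and the function returns None, the case an orchestrator would have to report rather than route around.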
Dutch semiconductor startup Qualinx is claiming a breakthrough of sorts in European sovereign manufacturing thanks to an end-to-end semiconductor fabrication flow it is using for its new satnav chips. The firm, a spin-off from Delft University of Technology, says it has demonstrated that security-critical chips for aerospace, defense, and critical infrastructure can be designed, manufactured, and delivered entirely within Europe. Tape-out of the Qualinx QLX3xx, a family of ultra-low-power Global Navigation Satellite System (GNSS) systems-on-chip (SoCs), represents the first step on the path toward a fully automated trusted European manufacturing flow, the company claims. But Qualinx is a fabless design shop and relies on a contract manufacturer to make the chips for it. In this case, it is GlobalFoundries (GF), an international business with its headquarters in the US – so much for sovereign manufacturing. The pair say that GF's Dresden fab is establishing a European manufacturing flow with funding from the European Chips Act. This will ensure that every step of the production process occurs within the EU, so that no sensitive design data leaves the region. "This first secure product demonstrates that a fully European manufacturing path – from mask services to wafer production – is already a reality today," said Qualinx CEO Tom Trill. Qualinx is perhaps placing an emphasis on security-critical chips because there are already European semiconductor firms that design and manufacture their own products, such as STMicroelectronics. And Reg readers with long memories will recall that the UK once had its own processor company in the shape of Bristol-based Inmos, which made the Transputer, manufactured at Newport Wafer Fab (NWF) in South Wales – now sold off to US chip biz Vishay Intertechnology. The Qualinx chip will be made using GF's FDX fully depleted silicon-on-insulator manufacturing process, which we understand is a 12nm node. 
While advanced, this is some way behind cutting-edge processes such as Taiwanese chip giant TSMC's 2nm N2 process, now in mass production. But there has been debate about whether Europe really needs cutting-edge fabs. The European Commission's new Digital Sovereignty package proposes a Chips Act 2.0 that would fund a sovereign "AI chip factory." But as the Center for European Policy Analysis (CEPA) points out, European chip demand comes mostly from the automotive sector and industrial applications, which rely on 28/22nm technology, not cutting-edge silicon. "We are demonstrating that Europe can rely on a secure, end-to-end semiconductor manufacturing flow that meets the highest requirements of aerospace and defense," stated GF SVP and general manager Dr Manfred Horstmann. "Our partnership with Qualinx marks the first operational milestone." ®