How AI helps World Cup referees make the call
Computer vision won’t replace referees at the World Cup. But it can help them make better calls when every inch matters.
More than 1.5 billion people worldwide are expected to watch the 2026 World Cup finals. With that many fans scrutinizing every pass, touch, and goal, FIFA is leaning on advanced computer vision technology to help referees make faster, more accurate calls on the way to crowning this year’s victors.
This year, the tournament’s officiating toolkit includes a system that supports video assistant referees (VAR), goal-line technology, advanced semi-automated offside technology, and a “last touch” feature for corner and goal kicks.
“It’s a very sophisticated system that glues together multiple computer vision techniques,” says Xu, an associate professor of computer science and an expert in computer vision. “You have calibrated cameras, real-time vision models to detect the ball, players, and their poses, as well as a decision layer to identify when some sort of intervention needs to happen.”
For players and fans alike, the result may be shorter waits for close calls.
FIFA first deployed Sony’s Hawk-Eye ball-tracking technology in 2012 at the Club World Cup. At the 2022 World Cup, FIFA introduced semi-automated offside technology, which combines limb- and ball-tracking data with artificial intelligence to give referees and video match officials information within seconds to inform offside decisions.
Before VAR, it was ‘the hand of god’
In the 1986 World Cup quarterfinals, Argentina’s Diego Maradona scored one of soccer’s most infamous goals, using his hand to punch the ball into the net. The referee never saw the infraction, and the goal stood. Maradona later described it as “a little with the head of Maradona and a little with the hand of God.”
Today, a combination of high-speed cameras, computer vision, and video review would almost certainly flag the violation within seconds. It’s a reminder of how far officiating technology has come, and why FIFA continues to invest in tools designed to help officials get the biggest calls right.
How does computer vision track players and the ball?
Player- and ball-tracking systems rely on dedicated computer vision neural networks trained on millions of annotated images and videos.
“Training a computer-vision algorithm to detect a human pose is like teaching a child how to recognize things: you feed it different examples,” says Xu. By taking in a massive collection of examples, the deep neural networks learn to locate players, their body parts, and the ball during a match. Beyond recognizing players and the ball in individual frames, these systems continuously track them over time and across multiple camera views, which is critical for determining offside positions and identifying who touched the ball last.
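To make the idea of tracking over time concrete, here is a minimal Python sketch of one simple way to link detections from one video frame to the next: greedy nearest-neighbor matching. It is an illustration only, not a description of the system FIFA uses. The function name `associate`, the `max_jump` threshold, and the sample coordinates (in meters on the pitch) are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position on the pitch, in meters (assumed units)


def associate(
    tracks: Dict[int, Point],
    detections: List[Point],
    max_jump: float = 2.0,  # hypothetical limit on movement between frames
) -> Dict[int, Point]:
    """Match each existing track to its nearest new detection.

    Greedy nearest-neighbor matching: a track keeps its ID only if the
    closest unused detection is within max_jump of its last position.
    """
    updated: Dict[int, Point] = {}
    unused = list(detections)
    for track_id, last_pos in tracks.items():
        if not unused:
            break
        nearest = min(unused, key=lambda p: math.dist(p, last_pos))
        if math.dist(nearest, last_pos) <= max_jump:
            updated[track_id] = nearest
            unused.remove(nearest)
    return updated


# Two players tracked in the previous frame, two detections in the new frame.
previous = {7: (10.0, 10.0), 9: (30.0, 20.0)}
current = associate(previous, [(30.5, 20.2), (10.4, 9.8)])
print(current)
```

A production tracker would likely also use appearance, pose, and motion prediction, and would resolve conflicts jointly rather than greedily. The sketch only shows the core step of carrying an identity from one frame to the next.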
During this year’s World Cup matches, multiple cameras around each stadium feed those tracking systems with live data.
Why so many cameras? A single camera view can be blocked or misleading. Multiple cameras enable the triangulation of the ball, players, and boundaries to create precise reconstructions in three dimensions. Those 3D reconstructions are generated in seconds and then provided to officials who make the final call.
“Just like with humans, if you block one of your eyes, it’s very hard to perceive depth,” says Xu. “But when you have both of your eyes open, you can actually fill out the depth and 3D location of the object you’re looking at.”
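The two-eyes analogy can be sketched in a few lines of Python. The example below assumes each calibrated camera can turn a detection into a sight line (a camera position plus a direction toward the ball), then estimates the ball’s 3D position as the midpoint of the shortest segment between two such lines. The function name `triangulate`, the camera positions, and the directions are made up for illustration. A real system would likely combine many cameras and account for lens and calibration error.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    """Point reached by travelling t units of `direction` from `origin`."""
    return (
        origin[0] + t * direction[0],
        origin[1] + t * direction[1],
        origin[2] + t * direction[2],
    )


def triangulate(cam1: Vec3, dir1: Vec3, cam2: Vec3, dir2: Vec3) -> Vec3:
    """Estimate a 3D point from two camera sight lines.

    Finds the closest points on the two lines and returns their midpoint.
    """
    w0 = _sub(cam1, cam2)
    a, b, c = _dot(dir1, dir1), _dot(dir1, dir2), _dot(dir2, dir2)
    d, e = _dot(dir1, w0), _dot(dir2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel sight lines carry no depth information, like a single eye.
        raise ValueError("sight lines are parallel; depth cannot be recovered")
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    p1 = _along(cam1, dir1, t)
    p2 = _along(cam2, dir2, s)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


# Two hypothetical cameras mounted 10 m up, 100 m apart, both sighting the ball.
cam_a = (0.0, 0.0, 10.0)
cam_b = (100.0, 0.0, 10.0)
ball = triangulate(cam_a, (50.0, 30.0, -9.0), cam_b, (-50.0, 30.0, -9.0))
print(ball)  # approximately (50.0, 30.0, 1.0)
```

With only one camera, every point along the sight line looks the same in the image. The second line is what pins down the depth.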
How can AI refereeing tools work so quickly?
FIFA estimates that the tracking cameras provide more than 150 million tracking data points per match. That’s a lot of data to manage. So, the speed comes from specialization.
“When FIFA deploys these deep neural networks, they only need them to work well in very particular scenarios,” says Xu. “You don’t necessarily need your algorithm to recognize a bird, fans, or anything else unrelated to the match; you just need them to recognize the players.”
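For a rough sense of scale, the short Python calculation below converts FIFA’s per-match figure into a per-second rate. It assumes a 90-minute regulation match with no stoppage or extra time, which is a simplification. The 150 million figure is the estimate quoted above, and the function name is just for this example.

```python
DATA_POINTS_PER_MATCH = 150_000_000  # FIFA's estimate, quoted above
REGULATION_MINUTES = 90              # assumption: ignores stoppage and extra time


def points_per_second(total_points: int, minutes: int) -> float:
    """Average number of tracking data points produced each second."""
    return total_points / (minutes * 60)


rate = points_per_second(DATA_POINTS_PER_MATCH, REGULATION_MINUTES)
print(f"{rate:,.0f} data points per second")  # roughly 28,000
```

Under that assumption, the cameras produce roughly 28,000 data points every second, on average, for the whole match.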
That narrower focus helps the system process a still massive stream of match data quickly. A model may begin as a large neural network trained on many kinds of images, according to Xu. Then, it gets refined and scaled back for the specific problems it needs to solve on the pitch.

Xu says these applications would have been hard to imagine just a decade or so ago. Two advances made the systems of today possible: deep neural networks and graphics processing units (GPUs).
The deep neural networks (machine learning systems inspired by the human brain) that have emerged in recent years dramatically improved performance on visual recognition and tracking tasks compared with many earlier approaches. These networks excel at taking vast amounts of unstructured data and identifying complex relationships with little human intervention.
“Neural networks have changed the whole paradigm since it’s no longer necessary to have manually designed features that we need to train the system to look for,” says Xu. “You input the image and the system automatically learns the visual representations needed for the task.”
Meanwhile, the capabilities of GPUs (the electronic circuits specifically designed to process and generate videos, images, and 3D graphics) jumped significantly in the 2010s, making today’s large-scale AI systems possible.
“The computing power has gotten so much better, so we can train those large neural networks with tons of data that we couldn’t imagine maybe 10 or 15 years ago,” says Xu.
Where else is this technology used?
While similar systems are used in other sports, Xu says the technology has applications outside of sports as well.
“This is very similar to the technology that you deploy in self-driving cars,” says Xu. “Those systems need to figure out the vehicle’s environment, detect different traffic participants and track them over time, and have a decision system built inside to choose whether to accelerate, apply the brakes, or change lanes.”
Xu thinks the underlying computer vision technology could be used for security, surveillance, and other settings where cameras need to follow activity across a complex physical space.
“If you want a smart system that tracks people’s activity on a property that contains multiple buildings, indoors and outdoors, and you have cameras deployed in different locations throughout the property, you can see the parallels,” says Xu. “Just like in a soccer match, you could use these systems for person detection and tracking and perhaps reviewing a 3D reconstruction of the property.”
Even as the technology behind the World Cup becomes faster and more sophisticated, Xu says the human element remains at the heart of the game. Computer vision can help officials determine whether a player’s toe drifted offside or who touched the ball last. But at least for now, it can’t predict the brilliance of a last-minute goal, the agony of a missed penalty kick, or the collective joy and heartbreak that keep billions of fans watching until the final whistle.