“The most important challenge in self-driving is safety,” says Abbeel. “With a system like LINGO-1, I think you get a much better idea of how well it understands driving in the world.” This makes it easier to identify the weak spots, he says.
The next step is to use language to teach the cars, says Kendall. To train LINGO-1, Wayve got its team of expert drivers—some of them former driving instructors—to talk out loud while driving, explaining what they were doing and why: why they sped up, why they slowed down, what hazards they were aware of. The company uses this data to fine-tune the model, giving it driving tips much as an instructor might coach a human learner. Telling a car how to do something rather than just showing it speeds up the training a lot, says Kendall.
Wayve is not the first to use large language models in robotics. Other companies, including Google and Abbeel’s firm Covariant, are using natural language to quiz or instruct domestic or industrial robots. The hybrid tech even has a name: visual-language-action models (VLAMs). But Wayve is the first to use VLAMs for self-driving.
“People often say an image is worth a thousand words, but in machine learning it’s the opposite,” says Kendall. “A few words can be worth a thousand images.” An image contains a lot of data that’s redundant. “When you’re driving, you don’t care about the sky, or the color of the car in front, or stuff like this,” he says. “Words can focus on the information that matters.”
“Wayve’s approach is definitely interesting and unique,” says Lerrel Pinto, a robotics researcher at New York University. In particular, he likes the way LINGO-1 explains its actions.
But he’s curious about what happens when the model makes stuff up. “I don’t trust large language models to be factual,” he says. “I’m not sure if I can trust them to run my car.”
Upol Ehsan, a researcher at the Georgia Institute of Technology who works on ways to get AI to explain its decision-making to humans, has similar reservations. “Large language models are, to use the technical phrase, great bullshitters,” says Ehsan. “We need to apply a bright yellow ‘caution’ tape and make sure the language generated isn’t hallucinated.”
Wayve is well aware of these limitations and is working to make LINGO-1 as accurate as possible. “We see the same challenges that you see in any large language model,” says Kendall. “It’s certainly not perfect.”