Synthetic Data Speeds AI Training
The world’s largest companies (including Google, Facebook, Amazon and Apple) generate mountains of data. Thanks to the resources to house and power server farms, and to the millions of consumers tied into their product and software ecosystems, this data exhaust grows exponentially for those at the top. But what about smaller, equally innovative companies that lack the access and capital to train and test AI?
What is fake data? For the most part, the articles above cite examples from image-recognition companies that need to generate variations, and sheer volume, of images in order to train real-world systems. As Wired author Tom Simonite points out, “Giving machines digital doubles can help robots learn to better handle objects in factories or homes.”
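To make the idea concrete, here is a minimal sketch of how a single labeled image can be turned into several synthetic training examples through simple transforms. The function name and the specific transforms (flips, brightness jitter) are illustrative assumptions for this article, not the pipeline of any company mentioned above.

```python
import numpy as np

def synthesize_variants(image, n=4, seed=0):
    """Generate simple synthetic variants of one image:
    random horizontal flips plus random brightness shifts.
    (Illustrative example; real pipelines use far richer transforms.)"""
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n):
        v = image.copy()
        if rng.random() < 0.5:
            v = v[:, ::-1]                    # horizontal flip
        shift = rng.uniform(-0.2, 0.2)        # brightness jitter
        v = np.clip(v + shift, 0.0, 1.0)      # keep pixel values in [0, 1]
        variants.append(v)
    return variants

# One 8x8 grayscale "image" becomes several training examples.
base = np.random.default_rng(1).random((8, 8))
augmented = synthesize_variants(base, n=4)
print(len(augmented))
```

Each variant inherits the original image’s label, so one hand-labeled example yields many, which is exactly the volume multiplier that makes synthetic data attractive to smaller teams.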
As Benedict Evans, Partner at Andreessen Horowitz, points out, there have already been traditionally non-technical companies to benefit: “I met a company recently that supplies seats to the car industry, which has put a neural network on a cheap DSP chip with a cheap smartphone image sensor, to detect whether there’s a wrinkle in the fabric. It’s not useful to describe this as ‘artificial intelligence’: it’s automation of a task that could not previously be automated. A person had to look.”
Beyond image-recognition algorithms, fake environments are routinely used for autonomous-driving research. Last year, Bloomberg even uncovered that the popular video game series Grand Theft Auto was being used by some researchers as a training environment, before the publisher pulled the plug on the project.
Time will tell whether synthetic data will truly level the playing field for applying machine learning, or whether the examples written about by Wired and TechCrunch remain outliers and early adopters.