Google’s DeepMind AI division has tackled everything from StarCraft to protein folding. So it’s probably no surprise that its creators eventually turned to what is undoubtedly a personal interest: computer programming. In Thursday’s edition of Science, the company describes a system it developed that produces code in response to problems typical of those used in human programming competitions.
On the average challenge, the AI system scored near the middle of the human participants. It had some difficulty scaling, however: it was less likely to produce a working program on problems that typically require more code. Still, the fact that it works at all, without any built-in structural information about algorithms or programming languages, is a bit of a surprise.
Rise to the challenge
Computer programming challenges are conceptually simple: people are given a task and must produce code that performs it. In an example reported in the new paper, programmers were given two strings and asked to determine whether the shorter of the two could be produced by pressing backspace in place of some of the keystrokes needed to type the larger string. Submitted programs are then checked against additional test cases to see whether they provide a general solution to the problem or fail on inputs the programmer didn’t anticipate.
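To give a sense of what a correct submission looks like, here is one way a human competitor might solve the backspace problem described above. This is a sketch of a standard greedy approach, assuming pressing backspace deletes the previously typed character (or does nothing when nothing has been typed); it is not code from the paper.

```python
def can_obtain(s: str, t: str) -> bool:
    """Return True if typing `s`, but optionally pressing backspace instead
    of some keystrokes, can produce `t`. Backspace deletes the previously
    typed character, or does nothing if the buffer is empty."""
    i, j = len(s) - 1, len(t) - 1
    # Match characters greedily from the ends of both strings.
    while i >= 0 and j >= 0:
        if s[i] == t[j]:
            i -= 1
            j -= 1
        else:
            # s[i] can't survive in the output: press backspace instead of
            # typing it, which also erases the character typed just before.
            i -= 2
    # Any leftover prefix of s can be erased by pressing backspace for
    # every remaining keystroke, so only t needs to be fully matched.
    return j < 0

print(can_obtain("ababa", "ba"))  # True: erase "aba", then type "ba"
print(can_obtain("ab", "a"))      # False: removing "b" also removes "a"
```

The greedy works from the back because each backspace cancels exactly one earlier keystroke, so mismatched characters must be discarded in pairs.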
Given enough examples of software that solves a single problem, it should probably be possible for an AI system to infer the algorithmic structure needed for success. But this wouldn’t be a general solution; an AI trained on one category of challenge will fail when asked to tackle an unrelated one.
To make something more generalizable, the DeepMind team treated it more like a language problem. To some extent, the description of a challenge is an expression of what the algorithm should do, while the code is an expression of the same thing, just in a different language. So the AI in question was designed with two parts: one takes the description and turns it into an internal representation, and the second uses that internal representation to generate functional code.
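The two-part structure can be illustrated with a toy stand-in, where a bag-of-words “encoder” and a retrieval “decoder” take the place of the real neural networks (all names and templates here are hypothetical, purely to show the interface):

```python
from collections import Counter

def encode(description: str) -> Counter:
    # Toy "encoder": turn the natural-language problem description
    # into an internal representation (here, just a bag of words).
    return Counter(description.lower().split())

def decode(representation: Counter, templates: dict) -> str:
    # Toy "decoder": map the internal representation to code. A real
    # system generates code token by token; here we simply retrieve
    # the template whose keywords best overlap the representation.
    def overlap(keywords):
        return sum(representation[w] for w in keywords)
    best = max(templates, key=lambda name: overlap(templates[name][0]))
    return templates[best][1]

# Hypothetical template library keyed by keyword sets.
TEMPLATES = {
    "sort": ({"sort", "order", "ascending"},
             "print(sorted(map(int, input().split())))"),
    "sum":  ({"sum", "total", "add"},
             "print(sum(map(int, input().split())))"),
}

rep = encode("Read the numbers and print their sum")
print(decode(rep, TEMPLATES))  # prints the "sum" template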
Training the system was also a two-stage process. In the first phase, the system simply processed a snapshot of the material on GitHub, totaling more than 700GB of code. (In an era when you can fit that on a thumb drive, it might not sound like much, but remember that code is just raw text, so you get a lot of lines per gigabyte.) Notably, this data also includes comments, which use natural language to explain what the nearby code does and should therefore help with both the input and output tasks.
Once the system was trained, it went through a fine-tuning period. DeepMind set up its own programming contests and fed the results into the system: problem descriptions, working code, failing code, and the test cases used to check it.
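At contest time, a system like this samples many candidate programs and keeps only those that pass the example tests attached to the problem. A minimal sketch of that filtering step, assuming candidates are standalone Python programs reading stdin and writing stdout (function names are hypothetical):

```python
import subprocess

def passes_tests(source: str, tests: list[tuple[str, str]]) -> bool:
    """Run a candidate Python program against (stdin, expected stdout) pairs."""
    for stdin_data, expected in tests:
        result = subprocess.run(
            ["python3", "-c", source],
            input=stdin_data, capture_output=True, text=True, timeout=5,
        )
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

def filter_candidates(candidates: list[str], tests: list[tuple[str, str]]) -> list[str]:
    # Keep only the sampled programs that pass every example test.
    return [src for src in candidates if passes_tests(src, tests)]

good = "print(sum(map(int, input().split())))"
bad = "print(0)"
print(filter_candidates([good, bad], [("1 2 3", "6")]))  # only `good` survives
```

Filtering against example tests is cheap compared to generation, which is why sampling huge numbers of candidates (discussed below) pays off.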
Similar approaches have been tried previously, but DeepMind reports that it was able to throw far more resources at the problem. The paper states that “a key driver of AlphaCode’s performance came from increasing the number of model samples to orders of magnitude greater than previous work.”