Google’s work on artificial neural networks, algorithms and software that mimic the workings of the brain, is well known. It has resulted in vast improvements in image and speech recognition. These networks are “trained” by being shown millions of examples of a certain object (for instance, a banana), allowing them to build up an “idea” of what constitutes a banana, with each part of the network looking for particular clues (edges, shape and colour, for example) and feeding them through to the next layer until an answer is reached.
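That layered, feed-forward idea can be sketched in a few lines of numpy. This is a toy illustration only, not Google’s actual network: the layer sizes, random weights and the “edge/shape/object” labels are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Toy 3-layer network: each layer transforms the previous layer's
# output, loosely analogous to edge -> shape -> object detectors.
weights = [rng.standard_normal((16, 8)),   # "edge" layer
           rng.standard_normal((8, 4)),    # "shape" layer
           rng.standard_normal((4, 2))]    # "object" layer (banana / not banana)

def forward(image_vector):
    activation = image_vector
    for w in weights:
        activation = relu(activation @ w)  # each layer feeds the next
    return activation

scores = forward(rng.standard_normal(16))
print(scores.shape)  # final layer: one score per possible answer
```

A real image classifier works the same way in outline, just with millions of learned weights and convolutional layers instead of these random matrices.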
One of the problems that researchers faced was understanding exactly what each layer of the network was doing, and what information it considered relevant from the test pictures. After all, going back to the banana example, it’s relevant that a banana is a yellow curved tube that tapers at the end; it isn’t relevant that it’s sitting in a fruit bowl or on the shelves in Tesco.
One way they found to see this was to turn the whole thing on its head: feed the network an image of semi-random noise, which has similar statistical features to real images, and ask it to tweak that image until it matched the item named. This produced some weird and wonderful results.
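The “tweaking” is gradient ascent on the input image: the network’s weights stay frozen, and the pixels are nudged, step by step, in whatever direction raises the score for the target class. Here is a deliberately minimal sketch of that loop, with a single random vector `w` standing in for the trained network’s response to the class; the real procedure differs in scale, not in shape.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fixed "class direction" standing in for the trained
# network's response to the target class (e.g. "banana").
w = rng.standard_normal(64)

def class_score(x):
    return float(w @ x)

# Start from low-amplitude noise and nudge it toward the class,
# with the network's weights frozen: gradient ascent on the *input*.
x = 0.01 * rng.standard_normal(64)
before = class_score(x)
for _ in range(100):
    grad = w            # d(score)/dx for this toy linear "network"
    x += 0.1 * grad     # tweak the image to raise the score
after = class_score(x)
print(before, after)    # the score rises as the image is tweaked
```

Run against a real network, the same loop slowly pulls banana-like texture and shape out of the noise, which is exactly what makes the resulting pictures so strange.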
[Images: starting from semi-random noise, the pictures the network produced]
It also showed cases where things were going wrong. With dumbbells, for instance, the network clearly thinks they always come with arms attached.
The results from the higher levels of the network, which are intended to identify more sophisticated features in images, are even stranger. Remember looking at clouds as a child and trying to see shapes in them? The network was told to do the same, adjusting and amplifying whatever it saw. If it saw a bird, for instance, it would adjust the image to be more birdlike; this was looped, with the image becoming more birdlike with each pass. Since the network was trained on animals, this is what it saw, but because of the abstracted nature of its information, what you ended up with was basically a “remix”.
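The amplification loop can be sketched the same way as before, except that instead of pushing toward one class, the image is pushed to make a chosen layer respond more strongly to whatever it already sees. Again this is a toy numpy stand-in, with one random ReLU layer in place of a deep network; only the structure of the loop matches the real technique.

```python
import numpy as np

rng = np.random.default_rng(2)

W = rng.standard_normal((32, 16))  # one toy "higher" layer

def layer(x):
    return np.maximum(0.0, W @ x)  # ReLU activations: what the layer "sees"

def amplify(x, steps=50, lr=0.005):
    # DeepDream-style loop: whatever the layer responds to,
    # push the image so it responds even more strongly.
    for _ in range(steps):
        a = layer(x)
        # gradient of 0.5 * ||a||^2 with respect to x
        # (units zeroed by the ReLU contribute nothing)
        grad = W.T @ a
        x = x + lr * grad
    return x

x0 = 0.01 * rng.standard_normal(16)  # start from faint noise
x1 = amplify(x0)
print(np.linalg.norm(layer(x0)), np.linalg.norm(layer(x1)))
```

Each pass makes the layer’s activations stronger, which is why a faint bird-like patch grows more birdlike on every iteration until the whole picture is overrun with it.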