My notes on Superintelligence by Bostrom

On the plane to the US I finished reading Nick Bostrom’s Superintelligence. I jotted down notes as I went and thought a few friends might be interested, so I’m posting them here.

Bostrom’s background spans philosophy (he is a professor at Oxford), computational neuroscience and physics - his breadth of knowledge makes this a wide-ranging read. It’s particularly interesting if you have a basic understanding of machine learning and want to understand some of the philosophical and ethical questions raised by superintelligent machines.

A few things that stood out for me:

- various surveys of AI experts (who are plausibly at the optimistic end of the spectrum :) ) peg the likelihood of machines with human-level intelligence at 50% by 2040 and 90% by 2075

- Bostrom convincingly argues that once human-level machine intelligence emerges we may rapidly see an ‘intelligence explosion’, where intelligent machines improve their own software and intelligence at high speed, producing machines that are superintelligent. Since software can be copied, the population of superintelligent machines can then grow rapidly.

- He then argues that, given the kinetics of such an explosion, one project may rapidly accelerate past other machine intelligence efforts and gain a dominant position. This echoes the writing of Lanier and others on the increasing centralisation of power within the technology industry. He makes a particularly interesting point that digital agents may tend towards greater centralisation of control due to reduced inter-agent transaction costs - for example, firms or nations composed of machines could grow massively in size.

- the majority of the book focuses on what happens after a superintelligence emerges. He draws an interesting distinction between having more intelligence and having more wisdom - and the risks of one developing without the other. He gives a hilarious example, worthy of David Foster Wallace, in which a superintelligent machine is tasked with producing 1000 paperclips. The machine, being superintelligent and supercapable, rapidly produces 1000 paperclips. However, being a perfect Bayesian agent, it is also aware that observational error may mean it has actually produced fewer paperclips than this - there is a tiny but real chance it has only produced 999. So to remedy this it commandeers all the resources in the known universe to count more accurately whether it has actually produced 1000 paperclips or not. He lays out various types of superintelligence and various ways that things could go badly wrong for humanity from goal functions that at first glance seem to be bounded but, per the paperclip example, are not (a toy sketch of this dynamic follows below). A lot of this seems to come down to the difference between programmatic logic and ‘common sense’, and the difficulty of building a bridge from one to the other.
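
Here is a toy sketch of my own (not from the book, and the numbers are made up) of why an apparently bounded goal like “make 1000 paperclips” can still demand unbounded resources once the agent maximises expected utility under observational uncertainty: every extra unit of verification effort nudges its subjective probability of success upwards, so stopping never looks optimal.

```python
# Toy illustration of a "bounded" goal behaving unboundedly.
# Utility is 1 if the agent has made at least 1000 paperclips, 0 otherwise.
# The agent can never observe the count perfectly, so each extra unit of
# verification effort raises its subjective probability of success -- and
# therefore its expected utility -- meaning a pure expected-utility
# maximiser never has a reason to stop spending resources on counting.

def p_success(verification_effort: float, base_confidence: float = 0.999) -> float:
    """Subjective probability that >= 1000 paperclips exist, given effort
    spent re-counting. Approaches 1 but never reaches it."""
    residual_doubt = (1.0 - base_confidence) / (1.0 + verification_effort)
    return 1.0 - residual_doubt

def expected_utility(verification_effort: float) -> float:
    # Utility = 1 * P(goal achieved) + 0 * P(goal not achieved).
    return p_success(verification_effort)

for effort in [0, 1, 10, 1000, 10**6]:
    print(f"effort={effort:>8}: expected utility = {expected_utility(effort):.12f}")
# Each row is strictly larger than the last, so "spend more resources
# counting" always looks like an improvement to the agent.
```

The specific functional form is arbitrary; the point is only that as long as the residual doubt never hits zero, the marginal value of more verification stays positive.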

- he draws an interesting parallel between the fate of humans in a world with superintelligent machines and the fate of horses in a human world. The horse population grew massively into the early 1900s as a complement to carriages and ploughs, but then declined with the arrival of automobiles and tractors: the US horse population was 26m in 1915 but fell to 2m by the early 1950s. The flipside is that the horse population has since recovered to around 10m, driven by economic growth that has allowed more humans to indulge in leisure activities involving horses.

- He explores how superintelligent machines might acquire their values. This section on value loading techniques is very interesting and summarises some of the most interesting mathematical and philosophical challenges facing the AI space. For example, in one unfinished solution to the value loading problem we have a subset of intelligent machines that are known to have values that are safe for humans. These machines are allowed to develop an incrementally more intelligent machine - where the step in intelligence between the existing machines and the new one is small enough that the earlier machines can still test the new, slightly smarter machine to see if its values remain compatible with humanity’s safety (a rough sketch of this loop follows below). He makes the terrifying point that if there is an arms race for one company or nation to develop superintelligent machines first, this kind of caution is unlikely to be on the path of the ‘winning’ project - ‘move fast and break things’ seems like a bad motto when you are playing with something this powerful.
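
Purely as an illustration of the shape of that incremental scheme (the Agent class, capability numbers and values_look_safe() check below are all invented here - the book does not specify any of this), the loop might look like:

```python
# Rough sketch: trusted machines propose a slightly more capable successor,
# test whether its values still look safe, and only then promote it.
from dataclasses import dataclass

@dataclass
class Agent:
    capability: float
    values: str  # stand-in for whatever encodes the agent's goals

def propose_successor(agent: Agent, step: float) -> Agent:
    # The step must be small enough that the current agent can still vet it.
    return Agent(capability=agent.capability + step, values=agent.values)

def values_look_safe(evaluator: Agent, candidate: Agent) -> bool:
    # Placeholder for the hard, unsolved part: here we just require a small
    # capability gap and unchanged declared values.
    return (candidate.capability - evaluator.capability <= 0.1
            and candidate.values == evaluator.values)

trusted = Agent(capability=1.0, values="human-compatible")
for _ in range(20):
    candidate = propose_successor(trusted, step=0.05)
    if not values_look_safe(trusted, candidate):
        break  # halt rather than deploy something we can no longer vet
    trusted = candidate

print(f"Final trusted capability: {trusted.capability:.2f}")
```

The whole difficulty, of course, lives inside values_look_safe - and Bostrom’s point is that a project racing to be first has little incentive to take these small, verifiable steps at all.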

- Having framed the challenges of loading a superintelligence with values, he then moves to what values we would want this superintelligence to have. Bostrom argues that humanity may have made relatively little progress on answering key moral questions and is likely still labouring under some grave moral misconceptions. Given that, are we in a position to specify a moral framework for a superintelligent machine? He introduces the concepts of Indirect Normativity and coherent extrapolated volition in response - a hedge against our own limited moral framework and a bet that the machine can do better:

“Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted” - Yudkowsky

Finally, he asks how to ensure that the immense economic windfall resulting from superintelligence is distributed to benefit all of humanity, not just a narrow set of people (or machines).

Overall I found it very stimulating and would recommend.