What is “AI”, anyway?
There seems to be a lot of discussion swirling around the Internet recently about AI art-generation tools, the ChatGPT chatbot, and the related ethical issues around using them. In the interest of splashing more fuel on the fire, I’m going to write a series of posts on the background of how these tools work, what I think the ethical implications of using them are, and where we might be headed next.
Today’s post is about what the term “AI” means, in theory and in current practice, along with a little bit of history about how we got here. This is going to be very high-level, and I will link out to more in-depth explanations where I can find good ones. There will be some jargon, but I’ll try to keep it as minimal as I can.
A very brief history
AI stands for Artificial Intelligence, and is generally the study of “how to get computers to do things that human brains are good at”. The history here goes right back to the very early days of practical electronic computers. People have been doing AI research since the 1950s, and a lot of different approaches have been tried. Some were more-or-less successful, but none really became a mainstream phenomenon until the relentless growth in computing power made one remarkably-flexible but computationally-expensive technique practical, to the extent that it’s essentially become synonymous with AI today.
Expert Systems – codifying decision-making
Early on (say, through the mid-80s), people thought it would be possible to teach computers to perform tasks like a human by asking human experts how they did their jobs, and codifying that into a set of rules that the computer could follow to perform the same task. Initially, this was accomplished by literally interviewing experts, then handing the results off to programmers. That was obviously not going to be practical, so they made programming languages optimized for rule-based decision-making, and then moved on to designing systems that could actually interrogate the experts and build rule inferences from there.
Ultimately, the expert system approach stumbled over “the natural language problem”. It turns out that human languages are very complex and imprecise, so much so that “natural language processing” became (and is) its own separate area of study in AI. Without a way for the computer to ask a human questions, and understand the answers, it’s very difficult to produce expert systems with any real level of “expertise”. Any system involving more than a few hundred “rules” is just extremely tedious to build.
One really great thing about expert systems is that it’s possible to backtrack through the decision-making process, and see how the decision was arrived at. This is even a built-in feature of systems like Prolog. Spoiler alert: This property, which is critical for accountability of AI-based systems, is not a feature of the types of AI in most common use today.
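The rule-following approach described above can be sketched in a few lines. This is a minimal toy, not any real expert system: the rules and fact names are invented for illustration, and the `trace` dictionary stands in for the kind of built-in decision backtracking that Prolog provides.

```python
# Hypothetical rules: each maps a set of premises to a conclusion.
rules = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "see_doctor"),
]

def infer(facts):
    """Apply rules until no new facts appear, recording why each fact was added."""
    facts = set(facts)
    trace = {}  # conclusion -> the premises that produced it
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                trace[conclusion] = premises
                changed = True
    return facts, trace

facts, trace = infer({"has_fever", "has_cough", "short_of_breath"})
# trace["see_doctor"] shows exactly which premises led to that conclusion.
```

The `trace` is the accountability feature: for any conclusion, you can walk backwards through it and see the full chain of reasoning.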
Neural Networks – an artificial brain?
An obvious approach to AI is to build a simulation of a human brain inside a computer. And this is something that’s been pursued for decades, too. You can easily model a single neuron, or a couple of neurons – each is just an on-off switch, with a bunch of connections of various strengths, and when a neuron fires, it either excites or inhibits the firing of the other neurons it’s connected to. It’s just a bit of simple math: you take the state of each connected neuron, multiply it by a weighting factor, add them all together, and then either fire the neuron, or not. Simple.
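That neuron model can be written out directly. A minimal sketch, with weights and a threshold I’ve picked arbitrarily just to show one excitatory/inhibitory mix:

```python
def neuron(inputs, weights, threshold):
    """One simulated neuron: weighted sum of connected states, then fire or not."""
    total = sum(state * weight for state, weight in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Two excitatory connections (positive weights) and one inhibitory (negative):
neuron([1, 1, 0], [0.6, 0.6, -0.5], 1.0)  # → 1 (0.6 + 0.6 = 1.2, over threshold)
neuron([1, 1, 1], [0.6, 0.6, -0.5], 1.0)  # → 0 (1.2 - 0.5 = 0.7, under threshold)
```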
But there is a fundamental problem – brains are immensely complicated. You have 100 billion neurons in your brain, and they have a staggeringly-large number of connections between them: 10^15 connections, which is 1 million billion. You’d need thousands of terabytes of data just to store the strength of the connections.
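The storage estimate works out like this (the two bytes per connection weight is my own assumption, just to make the arithmetic concrete):

```python
connections = 10 ** 15            # ~1 million billion synapses
bytes_per_weight = 2              # assume a 16-bit number per connection strength
total_terabytes = connections * bytes_per_weight / 10 ** 12
print(total_terabytes)            # → 2000.0, i.e. thousands of terabytes
```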
So, we had to be content with very simple neural networks for research back in the 80s & 90s. Even with dedicated hardware acceleration, you’d have been lucky to simulate hundreds of neurons at anything like reasonable speed. But then…
Why is AI suddenly everywhere?
Moore’s Law, GPUs, and massively-parallel processing
Computer power has increased, year over year, for basically the entire time I’ve been alive. In particular, Moore’s Law, named for Gordon Moore, states that the number of transistors on a chip will consistently double over a certain time frame (originally every year, now closer to 24 months). That means that in the last 40 years, the number of transistors on a chip has increased something like 2^20, or one million times. And that’s actually fairly close to accurate, depending on what you count as a “typical” chip in 1982 and 2022.
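The arithmetic behind that million-fold figure:

```python
years = 40
doubling_period = 2                       # years per doubling, per the revised law
growth = 2 ** (years // doubling_period)  # 20 doublings over 40 years
print(growth)                             # → 1048576, roughly a million-fold
```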
And for years, those higher transistor counts made for faster processors, in very close correlation with the number of transistors. They made the processors’ data widths wider, added more cache, increased the number of instructions that could be run at the same time, etc, etc. But then, towards the end of the 1990s, things started to stall out a bit – transistor budgets kept increasing, but the amount of performance to be extracted from a single processor core had somewhat plateaued, and slapping more cores on a CPU die had diminishing returns, due to lack of software support and the increased coordination burden between the processor cores.
But then, along came dedicated 3D graphics hardware for personal computers. These Graphics Processing Units, or GPUs, were radically-simpler designs than a general-purpose CPU. They were designed to do just the simple kinds of mathematical operations used in computer graphics, to do them quickly, and to be very small, in terms of transistor count.
Graphics processing is in the class of problems that computer scientists call “embarrassingly-parallel”, meaning that it takes no particular effort on the part of the programmer to run a bunch of these calculations at the same time. For example, if you’re rendering a scene in a video game, you can run “the same” set of calculations (with different data) for every single pixel on the screen.
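In code, that per-pixel independence looks something like this (the `shade` function is a made-up stand-in for a real shading calculation):

```python
WIDTH, HEIGHT = 4, 3

def shade(x, y):
    """Hypothetical stand-in for a real per-pixel shading calculation."""
    return (x + y) % 256

# No pixel's value depends on any other pixel, so every one of these calls
# could run simultaneously on its own GPU core:
image = [[shade(x, y) for x in range(WIDTH)] for y in range(HEIGHT)]
```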
GPU designs were able to scale very efficiently with increasing transistor count, since the individual elements are simple, and they work largely independently of each other. So, they got faster and faster at a tremendous pace.
Do you remember what else relies on a bunch of simple calculations, repeated an enormous number of times? That’s right – neural network simulation!
The Rise of Deep Learning
It turns out that GPUs were also really good for running neural network simulations, and they’ve since started to get optimized just for that use case. With a modern GPU, you can run much larger neural network simulations, much faster than you can on a conventional CPU.
And that means that you can start solving interesting problems with these extremely computationally intensive algorithms. I’ll go into greater detail in another post, but for purposes of this discussion, a neural network is “just” a set of simulated neurons, with weighted connections between them. There is some amount of pre-processing, then data flows through the neural network (possibly multiple times), then it gets post-processed to produce the final result.
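As a sketch, that data flow is just the single-neuron calculation repeated layer by layer. The weights and biases here are hand-picked for illustration (and real networks use smooth activation functions rather than this hard on/off step):

```python
def layer(inputs, weights, biases):
    """One fully-connected layer: a weighted sum per neuron, then fire or not."""
    return [
        1 if sum(x * w for x, w in zip(inputs, row)) + bias >= 0 else 0
        for row, bias in zip(weights, biases)
    ]

# Data flows through a hidden layer of two neurons, then one output neuron:
hidden = layer([1, 0], [[0.5, 0.5], [-0.5, -0.5]], [0.0, 0.4])
output = layer(hidden, [[1.0, 1.0]], [-1.5])
```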
This same basic architecture can be used to do all of the many things we do with modern “AI algorithms”, including facial recognition, automated image tagging, social media timeline curation, computer-generated poetry, insurance claim evaluation, virtual Zoom meeting backgrounds, voice recognition, and all of the other neat tricks you’ve seen and heard about.
AI really IS everywhere
It’s likely that almost any computer system that you interact with on a regular basis uses machine learning somewhere in its architecture. It is such a useful technique, generally applicable to so many problems, that there are very few places where it can’t add some value.
So that’s how we got to where we are today – “AI” is everywhere, and it’s essentially all based on the same basic technique – simulations of ever-larger neural networks, by massively-repeated application of very simple calculations.
How does Machine Learning work, in a practical sense? How do a bunch of numbers in a matrix turn into art? That’s the next article…