What is Artificial Intelligence?

An analysis of the most used term in 2024.

Artificial intelligence is a subject that has blown up in popularity in the last couple of years. It has become part of the everyday vocabulary of all generations, and is something the vast majority now relate to both in their private and professional lives.

But what exactly is artificial intelligence? If you were asked to appear on NRK and define AI for the entire population on live television, how confident would you be in your answer?

The truth is that “artificial intelligence” is a term with many meanings. Britannica defines artificial intelligence (or “AI”) as:

“Artificial intelligence (AI) [is] the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.”

SNL, however, defines artificial intelligence as:

“Artificial intelligence is information technology that adjusts its own activity and therefore appears to be intelligent.”

PwC, in turn, defines artificial intelligence as:

“Artificial intelligence (KI), or ‘Artificial Intelligence (AI)’ in English, is about developing computer systems that can learn from their own experiences and solve complex problems in different situations and environments. If a machine can solve problems, perform a task, or display other cognitive functions that a human can, then we can say that it has artificial intelligence.”

PwC, for its part, also refers to regjeringen.no, which has the following definition:

“Artificially intelligent systems perform actions, physical or digital, based on the interpretation and processing of structured or unstructured data, with the intention of achieving a given objective.”

All of these definitions are problematic. First of all, an artificial intelligence does not have to be able to learn and adjust its own behavior. For example, say we have an artificial intelligence that drives a car perfectly but has no ability to learn (from its mistakes). That deficiency will not disqualify it from being counted as artificial intelligence by anyone who sees its superior driving skills.

Obviously, there are also machines that can solve so-called human problems but which, at least in a historical context, do not count as artificial intelligence. A weaving machine, for example.

The government's definition is also problematic. There is little difference between the government's definition of “artificially intelligent systems” and an arbitrary definition of a computer program. It is also long and hard to remember, and why mention both structured and unstructured data when together they cover all data?

At the same time, this is precisely the problem of “artificial intelligence”: it is a broad term that encompasses a lot. It is thus difficult to define.

English Wikipedia's definition illustrates this well:

“Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems.”

Translated into Norwegian:

Artificial intelligence (KI), in its broadest sense, is the intelligence exhibited by machines, specifically computers.

In reality, we have just reduced the definition problem to defining “intelligence”. At the same time, that is a question we have a more intuitive understanding of. By accepting that “artificial intelligence” is intelligence in the context of machines, and perhaps specifically computers, we have a definition that is straightforward to relate to and easy to remember.

To substantiate our definition and convince ourselves that it makes sense, here are some examples I think we can agree are “artificial intelligence”:

  • ChatGPT.
  • Midjourney and Dall-E.
  • Self-driving cars.
  • Chess programs that easily beat the world's best chess players (sorry, Magnus). Or Go programs that crush the world's best Go players, if you prefer Go to chess.
  • Tesla's new Optimus robot.
  • Chatbots on websites, which never manage to help you with what you actually need.
  • Digital assistants of all kinds that perform tasks at your command.

History

Artificial Intelligence

Few disciplines have as clearly defined a starting point as artificial intelligence. In 1956, John McCarthy assembled a group of academics for what has become known as the Dartmouth Summer Research Project on Artificial Intelligence. The project's formal proposal, written by McCarthy and others, gives a good indication both of what AI was thought to be and of what it was meant to become:

We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.[1]

The term “Artificial Intelligence” was specifically coined by McCarthy for the occasion.

Turing and the Turing Test

Even before this, the mathematician Alan Turing had done a great deal of research on thinking machines. One of the things he is most famous for is the Turing test, introduced in the article Computing Machinery and Intelligence, published in 1950.

The Turing test indicates whether a machine possesses intelligence that is indistinguishable from that of a human. The idea is that a judge, a human, observes a natural-language conversation between a machine and a human. The judge knows that one of the participants is a machine (this is not entirely clear in Turing's original paper). If the judge fails to distinguish which is the human and which is the machine, the machine has passed the Turing test.

The Turing test has been, and remains, an important benchmark in the world of artificial intelligence. When ChatGPT was launched recently, it did not take long before people checked whether it passes the Turing test. It does[2].

Neural networks

Basic network

Much research in artificial intelligence deals with what is known as “neural networks”.

In the context of artificial intelligence, a neural network is a computer model that is inspired by biological neural networks in animals.

The networks work by having a multitude of nodes (which mimic neurons) and a plethora of links between the nodes (which mimic synapses).

Typically, a distinction is made between three layers in a classical neural network: the input layer, the hidden layer, and the output layer.

Numbers are sent through the neural network, first into the input layer. When a node receives numbers over its incoming links, it passes them through a formula called an activation function, which calculates a new number: the outgoing value the node should pass on. This value is sent along all of the node's outgoing links.
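To make this concrete, here is a minimal sketch (my own illustration, not part of the original article) of a commonly used activation function, the sigmoid, and of a node turning its incoming values into a single outgoing value. For simplicity, the weights on the links are ignored here; they are introduced in the next section.

import numpy as np

def sigmoid(x):
    # A common activation function: squashes any number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

# Values arriving at a node over its incoming links (made-up numbers)
incoming_values = np.array([0.5, -1.2, 0.8])

# The node combines its incoming values and passes the result through
# its activation function; this becomes the node's outgoing value.
outgoing_value = sigmoid(incoming_values.sum())
print(outgoing_value)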

This simulates how our brain works. In the brain, electrical impulses are sent between our neurons. Each neuron has a built-in activation function, which is more complex than the models we typically use in artificial neural networks. When a neuron receives an electrical impulse, whether or not it sends an impulse forward, and the strength of that impulse, will vary depending on the neuron's internal “activation function”.

Weighting

The links between the nodes (often called edges) all have a numeric value. This value is called a weight, and it indicates how strong the link between two nodes is. A number sent from one node to another is multiplied by the weight of the edge between them.
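As a small sketch of this (again my own illustration, with made-up numbers): the value sent along an edge is multiplied by that edge's weight, and the receiving node sums its weighted inputs before applying its activation function.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Outgoing values from three nodes in one layer
outputs = np.array([0.2, 0.7, 0.5])

# Weights on the edges from those three nodes to one node in the next layer
weights = np.array([0.9, -0.4, 1.3])

# Each value is multiplied by the weight of its edge, the results are summed,
# and the sum goes through the receiving node's activation function.
next_node_value = sigmoid(np.dot(outputs, weights))
print(next_node_value)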

When creating a neural network to solve a task there are, in practice, two jobs that one must do:

  1. Designing the network. How many nodes should we have, and how should they be connected to each other?
  2. Put weights on all edges.

In principle, this is all that is needed. The problem is that there is an enormous number of networks one can build, and an infinite number of different values one can assign to the weights.

So how does one choose the right weights?

Learning and Backward Propagation

To set proper weights in a neural network, one must train the network.

Say, for example, that you want a network that detects whether an image contains a cat. The input nodes of the network can be set up to receive the pixels of the images. You need one node per pixel, and if you have images of different sizes, you need to normalize them. There can be a single output node, interpreted so that the value 1 means there is a cat in the image and the value 0 means there is not.
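A rough sketch of how such an image could be turned into input and expected output for the network (my own illustration, with made-up dimensions):

import numpy as np

# A hypothetical 64x64 grayscale image, one number per pixel
image = np.random.rand(64, 64)

# The input layer needs one node per pixel, so the image is flattened
# into a vector of 64 * 64 = 4096 numbers.
input_vector = image.flatten()

# The expected value of the single output node: 1 means "cat", 0 means "no cat"
label = 1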

To train this network, one typically prepares thousands of pictures of cats (if not more) and thousands of pictures without cats. For each image, one has a sequence (vector) of numbers representing the pixels of the image, and a number, 0 or 1, indicating whether the image contains a cat. This is the training data for the network.

Then we run each image through our network, which typically starts out with completely random weights on the edges. On the other side, we get a value indicating whether the network thinks there is a cat in the image. At first, the network is likely to predict completely incorrectly. After each run we can perform a process called backward propagation. It takes the value that came out of the network (which is probably not exactly what we expected) and “propagates” the error backwards through the network, adjusting each edge so that the error should be smaller on the next run. How much each edge is adjusted on each run is called the learning rate of the training.
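The core of this adjustment can be sketched as a single weight update (a simplification of my own, with made-up numbers; the actual gradient depends on the network's structure and activation functions):

# Backward propagation computes, for each weight, how much the error changes
# when that weight changes (the gradient). The weight is then nudged in the
# direction that makes the error smaller.
learning_rate = 0.1   # how strongly each edge is adjusted per run
weight = 0.8          # the current value of one edge in the network
gradient = 0.25       # d(error)/d(weight), as computed by backward propagation

weight = weight - learning_rate * gradient
print(weight)  # 0.775: the edge is adjusted slightly, so the error should shrink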

Backward propagation is what allows us to train neural networks on large amounts of data, and thereby get them to do the most incredible things. Without backward propagation, every number in a network would have to be set manually, or in some other systematic way, by the engineers creating the network.

For ChatGPT-4, there are about 1,800 billion such numbers that need to be trained. In other words, this is something that would be completely infeasible to do manually.

Backward propagation was first published by Seppo Linnainmaa in 1970[3].

Universality

Neural networks are useful tools for solving complex problems. Since we can train the networks on large amounts of data, it is easy to imagine that all problems can be transformed into training data and trained on a network.

This idea also has some support in mathematics: neural networks are universal function approximators. Simply put, this means that neural networks can, in theory, mimic arbitrary mathematical functions. And since arbitrary mathematical functions can describe most of the problems we want to solve with computers, neural networks can, in theory, solve them as well.

This is called universality.

Important follow-up questions, however, are: A) How easy is it to approximate a given function (problem) with a neural network? B) Do we have enough training data? And C) Are there simpler ways to solve the problem?

The answers to these questions often point towards neural networks not being the right method for solving a particular problem.

Example

Here is an example of a very simple neural network, written (by ChatGPT) to mimic the XOR function:

import numpy as np

# Activation functions and their derivatives
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

# Training data (X: inputs, y: outputs)
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

y = np.array([[0], [1], [1], [0]])  # XOR problem

# Seed for reproducibility
np.random.seed(42)

# Initialize weights
input_size = 2
hidden_size = 2
output_size = 1

weights_input_hidden = np.random.rand(input_size, hidden_size)
weights_hidden_output = np.random.rand(hidden_size, output_size)

# Learning rate
lr = 0.1

# Training loop
for epoch in range(10000):
    # Forward pass
    hidden_input = np.dot(X, weights_input_hidden)
    hidden_output = sigmoid(hidden_input)

    final_input = np.dot(hidden_output, weights_hidden_output)
    final_output = sigmoid(final_input)

    # Calculate error
    error = y - final_output
    if epoch % 1000 == 0:
        print(f'Epoch {epoch}, Error: {np.mean(np.abs(error))}')

    # Backpropagation
    d_output = error * sigmoid_derivative(final_output)
    d_hidden = d_output.dot(weights_hidden_output.T) * sigmoid_derivative(hidden_output)

    # Update weights
    weights_hidden_output += hidden_output.T.dot(d_output) * lr
    weights_input_hidden += X.T.dot(d_hidden) * lr

# Test the network
print("\nTesting the network:")
for i, x in enumerate(X):
    hidden_output = sigmoid(np.dot(x, weights_input_hidden))
    final_output = sigmoid(np.dot(hidden_output, weights_hidden_output))
    print(f"Input: {x}, Predicted Output: {final_output}, Actual Output: {y[i]}")


Written by
Tormod Haugland
