The XOR (exclusive OR) problem is a foundational example in machine learning and artificial intelligence. It demonstrates the challenges of non-linear classification, where simple linear models fail to separate the data correctly. The XOR logic gate outputs 1 if and only if the two binary inputs are different; otherwise, it outputs 0. This problem serves as a classic benchmark for understanding the power of neural networks and their ability to handle non-linear decision boundaries.

Below is the dataset for the XOR problem. This dataset forms the basis for model training and evaluation.
| Input 1 | Input 2 | Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
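For readers who want to follow along in code, the same four examples can be written as a pair of NumPy arrays. This is a minimal sketch; the names `X` and `y` are just illustrative conventions.

```python
import numpy as np

# The four XOR examples from the table above.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=np.float32)              # target outputs
```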
The scatter plot below shows the XOR dataset. Points with output 0 are displayed in blue, and those with output 1 are in red.
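A plot like this one could be reproduced with Matplotlib, for example. This is a sketch of one way to draw it, not the page's own plotting code.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Blue for output 0, red for output 1, matching the description above.
colors = np.where(y == 0, "blue", "red")
plt.scatter(X[:, 0], X[:, 1], c=colors, s=100)
plt.xlabel("Input 1")
plt.ylabel("Input 2")
plt.title("XOR dataset")
plt.show()
```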
In this section, you can configure the architecture of the neural network for the XOR problem. Specify the number of units (neurons) and the activation functions for each layer. Layer 1 is the hidden layer, and Layer 2 is the output layer. The configuration directly affects the model's ability to learn and predict the XOR logic. Select appropriate values to experiment with different architectures.
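As a point of reference, a comparable two-layer architecture could be written in Keras roughly as follows. This is only a sketch assuming a 4-unit tanh hidden layer and a sigmoid output; it is not the code behind the interactive tool, and other unit counts and activations work as well.

```python
import tensorflow as tf

# Layer 1 (hidden): 4 units with tanh; Layer 2 (output): 1 unit with sigmoid.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                      # two binary inputs
    tf.keras.layers.Dense(4, activation="tanh"),     # Layer 1 (hidden)
    tf.keras.layers.Dense(1, activation="sigmoid"),  # Layer 2 (output)
])
model.summary()
```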
The following visualization represents the architecture of the neural network based on the selected configuration.
Configure the training parameters for the neural network model. Adjust the learning rate to control the speed of optimization during training. Choose the number of epochs to define how many times the model will iterate over the entire dataset. Click "Run" to start training the model and monitor the progress, including the loss and epoch count, in real time.
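In code, these two parameters map onto the optimizer's learning rate and the `epochs` argument of `fit`. The sketch below repeats the dataset and model so it runs on its own; the specific choices (Adam, learning rate 0.1, 500 epochs) are illustrative examples, not the tool's defaults.

```python
import numpy as np
import tensorflow as tf

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# The learning rate sets the size of each weight update; the epoch count
# sets how many times training passes over the 4-example dataset.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(X, y, epochs=500, verbose=0)
print("final loss:", history.history["loss"][-1])
```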
Finally, choose a pair of inputs to see the trained model's predicted XOR output.
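Continuing the training sketch above, a single prediction might look like the following. Rounding the sigmoid output at 0.5 is a common convention for binary classification, not something mandated by the tool.

```python
# Predict the XOR output for a chosen pair of inputs, e.g. (1, 0).
inputs = np.array([[1, 0]], dtype=np.float32)
probability = model.predict(inputs, verbose=0)[0, 0]  # sigmoid output in (0, 1)
prediction = int(probability >= 0.5)                  # threshold at 0.5
print(f"P(output=1) = {probability:.3f} -> predicted XOR = {prediction}")
```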
The XOR (Exclusive OR) problem is a classic challenge in machine learning that demonstrates why simple linear models cannot solve certain classification problems. XOR outputs 1 only when inputs are different (0,1 or 1,0) and outputs 0 when inputs are the same (0,0 or 1,1). This creates a non-linear decision boundary that requires a neural network with at least one hidden layer to solve.
A single perceptron can only create linear decision boundaries (straight lines in 2D space). The XOR problem requires separating points diagonally, which is impossible with a single straight line. This is why the XOR problem is called "linearly inseparable" and requires a multi-layer neural network. This limitation, analyzed by Marvin Minsky and Seymour Papert in their 1969 book Perceptrons, helped motivate the later development of multi-layer networks and modern deep learning.
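The inseparability claim can be checked directly. Suppose a single linear unit with weights w1, w2 and bias b predicted 1 exactly when w1·x1 + w2·x2 + b > 0; classifying all four XOR points correctly would then require:

```latex
\[
\begin{aligned}
(0,0)\mapsto 0 &: \quad b \le 0, \\
(1,1)\mapsto 0 &: \quad w_1 + w_2 + b \le 0, \\
(0,1)\mapsto 1 &: \quad w_2 + b > 0, \\
(1,0)\mapsto 1 &: \quad w_1 + b > 0.
\end{aligned}
\]
```

Adding the two strict inequalities gives w1 + w2 + 2b > 0, while adding the two non-strict ones gives w1 + w2 + 2b ≤ 0, a contradiction: no single line can separate the two classes.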
A neural network needs at least 3 layers to solve XOR: an input layer (2 neurons), one hidden layer (typically 2-4 neurons with non-linear activation like tanh or ReLU), and an output layer (1 neuron with sigmoid activation). The hidden layer creates the non-linear transformation needed to separate the XOR patterns. You can experiment with different architectures using our interactive tool above.
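To make the role of the hidden layer concrete, here is a hand-constructed 2-2-1 network that computes XOR exactly. It uses a hard step activation instead of tanh/sigmoid purely for readability, and the weights are chosen by hand rather than learned: the first hidden unit acts as OR, the second as AND, and the output fires when OR is true but AND is not.

```python
import numpy as np

def step(z):
    """Hard threshold activation: 1 if z > 0, else 0."""
    return (z > 0).astype(float)

# Hidden layer: unit 1 computes OR(x1, x2), unit 2 computes AND(x1, x2).
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])        # shape (2 inputs, 2 hidden units)
b_hidden = np.array([-0.5, -1.5])        # thresholds for OR and AND

# Output layer: fires when OR is on and AND is off, which is exactly XOR.
W_out = np.array([[1.0], [-1.0]])        # shape (2 hidden units, 1 output)
b_out = np.array([-0.5])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = step(X @ W_hidden + b_hidden)   # non-linear transformation
output = step(hidden @ W_out + b_out)
print(output.ravel())                    # -> [0. 1. 1. 0.]
```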
The XOR dataset consists of just 4 data points: [0,0]→0, [0,1]→1, [1,0]→1, [1,1]→0. Despite being the smallest possible classification dataset, it demonstrates fundamental non-linear classification challenges and serves as a critical test for neural network architectures. The dataset is shown in the visualization section above.
Non-linear activation functions like tanh, sigmoid, and ReLU all work well for solving XOR. The key requirement is non-linearity: linear activation functions will fail just like a single perceptron. In practice, tanh and ReLU are most commonly used for the hidden layer, while sigmoid is typically used for the output layer to produce binary classification outputs between 0 and 1.
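These activations are simple elementwise functions. As a quick reference, they could be written in NumPy like this (a sketch for illustration):

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into (0, 1); common for binary outputs."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squashes into (-1, 1); a common choice for small hidden layers."""
    return np.tanh(z)

def relu(z):
    """Passes positive values through and zeroes out negatives."""
    return np.maximum(0.0, z)
```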
The XOR problem is historically significant because it exposed the limitations of single-layer perceptrons in the late 1960s, contributing to the first "AI winter." It showed that neural networks need hidden layers and non-linear activation functions to solve linearly inseparable problems. Today, it serves as the simplest example for teaching students about non-linear decision boundaries, feature engineering, and the power of deep learning.