Technical Definition
Artificial neural networks are computational models inspired by biological neural networks found in brains. They consist of interconnected nodes (neurons) organized in layers, where each connection has an adjustable weight that enables the network to learn patterns from data.
Network Architecture
Basic Components
Neurons (Nodes)
- Receive weighted inputs
- Apply activation function
- Pass output to next layer
Weights
- Multiply input values
- Adjusted during training via backpropagation
- Store learned information
Biases
- Added to weighted sum
- Help shift activation function
- Improve model expressiveness
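Combining the three components above, a single neuron computes a weighted sum of its inputs, adds the bias, and applies an activation. A minimal sketch with made-up weights and inputs:

```python
import numpy as np

# Hypothetical single neuron with three inputs (all numbers made up)
x = np.array([0.5, -1.0, 2.0])   # inputs from the previous layer
w = np.array([0.4, 0.3, -0.2])   # learned weights, one per input
b = 0.1                          # bias shifts the activation threshold

z = np.dot(w, x) + b             # weighted sum plus bias
output = max(0.0, z)             # ReLU activation
```

Here z = 0.5·0.4 + (−1.0)·0.3 + 2.0·(−0.2) + 0.1 = −0.4, so the ReLU gate outputs 0.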
Activation Functions
- ReLU: max(0, x) - non-linearity for hidden layers
- Sigmoid: 1/(1+e^-x) - outputs between 0 and 1
- Tanh: (e^x - e^-x)/(e^x + e^-x) - outputs between -1 and 1
- Softmax: exponential normalization - for multi-class outputs
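The feed-forward example below implements ReLU and sigmoid; softmax can be sketched separately. Subtracting the maximum score before exponentiating is the standard trick to avoid overflow:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Exponential normalization; subtracting the max avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)   # probabilities that sum to 1, preserving the ranking
```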
Network Layers
Input Layer
↓
Hidden Layers (Feature Learning)
↓
Output Layer (Prediction)
- Input Layer: Receives raw features
- Hidden Layers: Extract features and patterns
- Output Layer: Produces predictions
Code Example: Feed-Forward Neural Network
```python
import numpy as np
from typing import List, Tuple


class NeuralNetwork:
    def __init__(self, layer_sizes: List[int], learning_rate: float = 0.01):
        """
        Initialize neural network with specified architecture.

        Args:
            layer_sizes: List of neuron counts per layer
            learning_rate: Learning rate for gradient descent
        """
        self.learning_rate = learning_rate
        self.weights = []
        self.biases = []
        self.layer_sizes = layer_sizes

        # Initialize weights (small random values) and biases (zeros)
        for i in range(len(layer_sizes) - 1):
            w = np.random.randn(layer_sizes[i], layer_sizes[i + 1]) * 0.01
            b = np.zeros((1, layer_sizes[i + 1]))
            self.weights.append(w)
            self.biases.append(b)

    def relu(self, x):
        """ReLU activation function"""
        return np.maximum(0, x)

    def relu_derivative(self, x):
        """ReLU derivative for backpropagation"""
        return (x > 0).astype(float)

    def sigmoid(self, x):
        """Sigmoid activation function (input clipped to avoid overflow)"""
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

    def sigmoid_derivative(self, a):
        """Sigmoid derivative, given the sigmoid *output* a (kept for
        reference; backward() uses the sigmoid + cross-entropy shortcut)"""
        return a * (1 - a)

    def forward(self, X: np.ndarray) -> Tuple[np.ndarray, List]:
        """
        Forward propagation through the network.

        Args:
            X: Input data (samples, features)

        Returns:
            Output predictions and cache for backpropagation
        """
        cache = []
        A = X
        for i in range(len(self.weights)):
            Z = np.dot(A, self.weights[i]) + self.biases[i]
            # ReLU for hidden layers, sigmoid for the output layer
            if i < len(self.weights) - 1:
                A = self.relu(Z)
            else:
                A = self.sigmoid(Z)
            cache.append((Z, A))
        return A, cache

    def backward(self, X: np.ndarray, y: np.ndarray,
                 output: np.ndarray, cache: List) -> None:
        """
        Backward propagation: compute gradients and update parameters.

        Args:
            X: Input data
            y: Target labels
            output: Network output
            cache: Cached (Z, A) pairs from the forward pass
        """
        m = X.shape[0]  # Number of samples
        # Sigmoid + binary cross-entropy shortcut: dL/dZ at the output layer
        dZ = (output - y) / m
        dA = None
        for i in reversed(range(len(self.weights))):
            Z, _ = cache[i]
            # Input to layer i is the previous layer's activation (or X)
            A_prev = cache[i - 1][1] if i > 0 else X
            if i < len(self.weights) - 1:
                dZ = dA * self.relu_derivative(Z)
            dW = np.dot(A_prev.T, dZ)
            dB = np.sum(dZ, axis=0, keepdims=True)
            # Gradient flowing into the previous layer's activations
            if i > 0:
                dA = np.dot(dZ, self.weights[i].T)
            # Update weights and biases
            self.weights[i] -= self.learning_rate * dW
            self.biases[i] -= self.learning_rate * dB

    def train(self, X: np.ndarray, y: np.ndarray,
              epochs: int = 100, batch_size: int = 32):
        """
        Train the neural network with mini-batch gradient descent.

        Args:
            X: Training data
            y: Training labels
            epochs: Number of passes over the data
            batch_size: Samples per batch
        """
        losses = []
        for epoch in range(epochs):
            epoch_loss = 0
            num_batches = X.shape[0] // batch_size
            for batch in range(num_batches):
                start_idx = batch * batch_size
                end_idx = start_idx + batch_size
                X_batch = X[start_idx:end_idx]
                y_batch = y[start_idx:end_idx]

                # Forward and backward pass
                output, cache = self.forward(X_batch)
                self.backward(X_batch, y_batch, output, cache)

                # Binary cross-entropy loss
                loss = -np.mean(y_batch * np.log(output + 1e-8) +
                                (1 - y_batch) * np.log(1 - output + 1e-8))
                epoch_loss += loss

            epoch_loss /= num_batches
            losses.append(epoch_loss)
            if (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch + 1}/{epochs}, Loss: {epoch_loss:.4f}")
        return losses

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Make binary predictions on new data"""
        output, _ = self.forward(X)
        return (output > 0.5).astype(int)


# Usage
X_train = np.random.randn(100, 10)
y_train = np.random.randint(0, 2, (100, 1))

nn = NeuralNetwork([10, 16, 8, 1], learning_rate=0.01)
losses = nn.train(X_train, y_train, epochs=100, batch_size=16)

y_pred = nn.predict(X_train)
accuracy = np.mean(y_pred == y_train)
print(f"Training Accuracy: {accuracy:.4f}")
```
Implementation Requirements
Hardware
- GPUs: NVIDIA CUDA-capable for faster training
- Memory: Sufficient RAM for model parameters and batch data
- Processing: Multi-core CPUs for parallelization
Software
- Python with TensorFlow or PyTorch
- NumPy for numerical operations
- Matplotlib/Seaborn for visualization
Data Requirements
- Normalized inputs (zero mean, unit variance)
- Sufficient training samples (rule of thumb: 10+ samples per parameter)
- Labeled data for supervised learning
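A minimal sketch of the normalization requirement, using synthetic data (the distribution parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))  # raw, unnormalized features

# Standardize each feature column to zero mean, unit variance
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / sigma
```

In practice, `mu` and `sigma` are computed on the training split only and reused to transform the validation and test splits.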
Technical Limitations
- Overfitting: Networks can memorize training data instead of generalizing
- Vanishing Gradients: Training deep networks with many layers becomes difficult
- Computational Cost: Training requires significant compute resources
- Hyperparameter Tuning: Many hyperparameters to optimize
- Black Box Nature: Difficult to interpret what network learned
- Data Requirements: Need large labeled datasets for good performance
Performance Considerations
Training Optimization
- Batch Normalization: Stabilize training with normalized layer inputs
- Dropout: Regularization technique to prevent overfitting
- Early Stopping: Stop training when validation loss stops improving
- Learning Rate Scheduling: Decay learning rate during training
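As one concrete example, learning rate scheduling can be as simple as step decay; the drop factor and interval below are arbitrary choices:

```python
def step_decay(initial_lr: float, epoch: int,
               drop: float = 0.5, every: int = 10) -> float:
    """Multiply the initial learning rate by `drop` once per `every` epochs."""
    return initial_lr * (drop ** (epoch // every))

# Learning rate over training: 0.01 for epochs 0-9, 0.005 for 10-19, ...
schedule = [step_decay(0.01, e) for e in range(0, 30, 10)]
```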
Inference Optimization
- Model Quantization: Reduce precision to lower memory and compute
- Pruning: Remove unimportant connections
- Knowledge Distillation: Train smaller model from larger one
- Caching: Store intermediate activations for common patterns
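A minimal sketch of post-training quantization, using symmetric per-tensor int8 scaling (one common scheme among several):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ≈ scale * q with int8 q."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 4)).astype(np.float32)   # a weight matrix
q, scale = quantize_int8(w)                      # 1 byte per weight
w_hat = dequantize(q, scale)                     # error bounded by scale / 2
```

The int8 tensor needs a quarter of the memory of float32, at the cost of a rounding error of at most half the scale per weight.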
Best Practices
- Data Preprocessing: Normalize and standardize inputs
- Train-Validation-Test Split: Typical 60-20-20 split
- Monitor Loss: Track both training and validation loss
- Regularization: Use dropout and L1/L2 to prevent overfitting
- Hyperparameter Search: Use grid or random search systematically
- Ensemble Methods: Combine multiple models for better performance
- Save Best Model: Checkpoint model during training
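The 60-20-20 split can be sketched with NumPy alone; the dataset here is a synthetic placeholder:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))           # placeholder features
y = rng.integers(0, 2, size=(100, 1))    # placeholder binary labels

# Shuffle once, then slice indices 60% / 20% / 20%
idx = rng.permutation(len(X))
n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_val, y_val = X[idx[n_train:n_train + n_val]], y[idx[n_train:n_train + n_val]]
X_test, y_test = X[idx[n_train + n_val:]], y[idx[n_train + n_val:]]
```

Shuffling before slicing matters when the data is ordered (e.g. by class or by time); for time series, a chronological split is used instead.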
References
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). "Deep Learning." Nature, 521(7553), 436-444.
- Glorot, X., & Bengio, Y. (2010). "Understanding the Difficulty of Training Deep Feedforward Neural Networks." AISTATS.
- He, K., Zhang, X., Ren, S., & Sun, J. (2015). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." ICCV.
Future Implications
Near Term: Advances in neural architecture search automating model design
Long Term: More interpretable neural networks and integration with symbolic reasoning
Tags
Deep Learning, Neural Networks, Machine Learning