
Abstract
This report surveys the evolution of deep learning in quantitative trading, from traditional econometric models to sophisticated neural architectures. We analyze MLPs, RNNs, LSTMs, CNNs, Autoencoders, DRL, GNNs, and Transformers—examining their unique properties, trading applications, and critical limitations in high-noise, non-stationary financial markets.
1. From Linear Econometrics to Non-Linear Machine Learning
Econometric Models
ARIMA, GARCH
✓ Interpretable, statistical rigor
✗ Linear assumptions fail
Classical ML
SVM, Random Forests
✓ Non-linear, feature importance
✗ No temporal awareness
MLPs
Multi-Layer Perceptrons
✓ Universal approximator
✗ Treats time as a "bag of features"
Key Insight:
MLPs solve non-linearity but ignore sequential order—critical flaw for time-series data.
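To make the "bag of features" point concrete, here is a minimal NumPy sketch (all prices invented) of how a series is flattened into fixed-length windows before an MLP sees it. The network receives each window as an unordered feature vector; nothing in the architecture encodes that column 0 precedes column 1.

```python
import numpy as np

# Hypothetical price series for illustration only.
prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1, 101.9, 104.2])

def make_windows(series, lookback):
    """Flatten a series into (lookback window -> next value) supervised pairs."""
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

X, y = make_windows(prices, lookback=3)
print(X.shape, y.shape)  # (4, 3) (4,)
```

Shuffling the columns of `X` would leave the MLP's capacity unchanged, which is precisely the flaw recurrent models address.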
2. Modeling Time: Recurrent Architectures
RNNs
Innovation: Hidden state "memory"
Problem: Vanishing gradients
Memory limited to a few time steps
LSTMs
Innovation: Gating mechanism (input, output, forget gates)
Advantage: Long-term dependencies
Dominant 2010s architecture
LSTM Limitations in Finance:
- Non-Stationarity: Memory becomes "obsolete" during regime shifts
- Overfitting: Memorizes noise in low-SNR markets
- Sequential Bottleneck: Cannot parallelize training
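For readers new to gating, a single LSTM step can be sketched in plain NumPy (toy sizes and random weights; a real model would learn W, U, and b by backpropagation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,);
    rows stacked as [input, forget, cell, output] gates."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])        # input gate: how much new info to write
    f = sigmoid(z[H:2*H])      # forget gate: how much old memory to keep
    g = np.tanh(z[2*H:3*H])    # candidate cell update
    o = sigmoid(z[3*H:4*H])    # output gate: how much memory to expose
    c = f * c_prev + i * g     # new cell state (long-term memory)
    h = o * np.tanh(c)         # new hidden state (short-term output)
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                    # toy input and hidden sizes
W, U, b = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # 5 time steps, processed one after another
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

The loop at the bottom is the sequential bottleneck in miniature: step t cannot start until step t-1 has produced h and c.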
3. Novel Data Representations
CNNs: Market-as-Image
Applications:
1. Chart pattern recognition (candlestick images)
2. "Factor pictures" (100 factors × 60 days as 2D image)
✗ Arbitrary representation, black box
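A "factor picture" is just a 2D array that a convolutional filter slides over. A minimal NumPy sketch (synthetic data, hand-rolled valid convolution) shows the mechanics:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D cross-correlation (no padding, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out

# Hypothetical "factor picture": 10 factors observed over 20 days.
rng = np.random.default_rng(1)
picture = rng.normal(size=(10, 20))
kernel = rng.normal(size=(3, 3))   # in a real CNN this filter is learned
feature_map = conv2d_valid(picture, kernel)
print(feature_map.shape)  # (8, 18)
```

The arbitrariness critique falls out of this sketch: the kernel mixes whichever factors happen to be adjacent rows, so the row ordering of the picture silently becomes a modeling choice.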
Autoencoders: Non-Linear PCA
Function: Encoder compresses → Latent space → Decoder reconstructs
Use: Feature extraction & denoising for downstream models
✓ Unsupervised learning, signal extraction
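The linear special case of an autoencoder recovers PCA, which makes the compress → latent → reconstruct pipeline easy to sketch with an SVD (synthetic returns and dimensions invented for illustration):

```python
import numpy as np

# Synthetic "returns" for 10 assets driven by 2 latent factors plus noise.
rng = np.random.default_rng(2)
factors = rng.normal(size=(200, 2))           # 2 hidden drivers
loadings = rng.normal(size=(2, 10))           # 10 observed assets
X = factors @ loadings + 0.01 * rng.normal(size=(200, 10))

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T        # "encoder": project onto a 2-dim latent space
X_hat = Z @ Vt[:k]       # "decoder": reconstruct from the latent codes
err = np.linalg.norm(Xc - X_hat) / np.linalg.norm(Xc)
print(err < 0.1)         # True: 2 codes capture almost all the variance
```

A deep autoencoder replaces these two matrix products with non-linear encoder/decoder networks, but the denoising logic is identical: whatever fails to survive the latent bottleneck is treated as noise.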
4. The New Frontier: Systems & Agents
Deep Reinforcement Learning
Paradigm: Market-as-Game
Agent learns a policy to maximize reward (PnL/Sharpe)
Critical Barrier: Sim-to-Real Gap
Requires a near-perfect market simulator, which is unrealistic
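A toy tabular Q-learning loop illustrates the market-as-game framing. Every dynamic below is invented (states, rewards, transitions), which is the sim-to-real problem in miniature: the agent is only as good as the simulator it learned in.

```python
import numpy as np

# Toy "market-as-game": states are a momentum signal (0 = down, 1 = up),
# actions are {0: flat, 1: long}. Reward is +1 for being long in an
# up-state, -1 for long in a down-state, 0 when flat.
rng = np.random.default_rng(3)
Q = np.zeros((2, 2))                 # Q[state, action]
alpha, gamma, eps = 0.1, 0.9, 0.1    # learning rate, discount, exploration

state = int(rng.integers(2))
for _ in range(5000):
    # epsilon-greedy action selection
    action = int(rng.integers(2)) if rng.random() < eps else int(Q[state].argmax())
    reward = (1 if state == 1 else -1) if action == 1 else 0
    next_state = int(rng.integers(2))  # momentum flips at random in this toy
    # temporal-difference update toward reward + discounted best future value
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                 - Q[state, action])
    state = next_state

print(Q.argmax(axis=1))  # learned policy: flat in down-state, long in up-state
```

The learned policy is trivially correct here because the simulator was trivially specified; real markets offer no such oracle for rewards and transitions.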
Graph Neural Networks
Paradigm: Market-as-System
Models relationships: Nodes = assets, Edges = dependencies
Killer App: Systemic Risk
Contagion modeling, relational alpha
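One graph-convolution layer reduces to a few matrix products. This NumPy sketch (invented 4-asset adjacency, random features and weights) shows a single round of message passing in which each asset's embedding mixes its neighbours':

```python
import numpy as np

# Hypothetical edges, e.g. sector co-membership among 4 assets.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))    # mean-aggregation normalizer
H = np.random.default_rng(4).normal(size=(4, 3))  # node features per asset
W = np.random.default_rng(5).normal(size=(3, 2))  # learnable layer weights

H_next = np.maximum(0.0, D_inv @ A_hat @ H @ W)   # aggregate neighbours + ReLU
print(H_next.shape)  # (4, 2)
```

The "graph construction" limitation from the summary table is visible here: `A` had to be asserted before any learning happened, and every downstream embedding inherits that choice.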
5. Current Apex: Transformers
Core Innovation: Self-Attention Mechanism
Direct access to all past time steps simultaneously—learns which events matter regardless of distance
✓ Parallelizable (solves LSTM bottleneck)
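Self-attention itself is compact. A single-head NumPy sketch (random weights, toy sizes) shows every time step attending to every other step in one matrix product, with no sequential loop:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])        # every step vs. every step
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(6)
T, D = 8, 4                       # 8 time steps, 4 features each
X = rng.normal(size=(T, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)                  # (8, 4)
```

Each row of `weights` is a learned, distance-agnostic mixture over all past (and here, future) steps; masking the upper triangle of `scores` would restrict attention to the past, as a causal forecasting model requires.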
Application 1: Time-Series
Temporal Fusion Transformer (TFT)
State-of-the-art, interpretable attention
Application 2: NLP Revolution
FinBERT for sentiment analysis
Unlocks alternative data (news, social media)
Advantages
- Superior long-range modeling
- Parallelizable training
- SOTA on NLP tasks
Limitations
- Extreme computational cost
- Black box (regulatory issues)
- Overfitting risk
6. Roadmap: Becoming a Deep Learning Quant
Domain 1: Quant Finance
- Probability, Statistics, Linear Algebra
- ARIMA, GARCH, Cointegration
- Portfolio Theory, Risk Management
Domain 2: ML/CS
- Python mastery (C++ for HFT)
- scikit-learn: RF, SVM, PCA
- PyTorch/TensorFlow: MLPs, LSTMs
Foundations
Math, finance theory, and an understanding of alpha & risk
Toolkit
Pandas, NumPy, scikit-learn, backtesting
Core DL
Implement MLP & LSTM, compare to ARIMA
Specialization
Choose: NLP (FinBERT), DRL (DQN), or GNNs
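One backtesting habit worth building from the Toolkit stage onward: evaluate on walk-forward splits rather than random cross-validation, since shuffled folds let the model train on the future it is later tested on. A minimal sketch (toy sizes):

```python
import numpy as np

def walk_forward_splits(n, train_size, test_size):
    """Yield (train_idx, test_idx) pairs that roll forward through time;
    every test block comes strictly after its training window."""
    start = 0
    while start + train_size + test_size <= n:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

splits = list(walk_forward_splits(n=10, train_size=4, test_size=2))
print(len(splits))    # 3 folds
print(splits[0][1])   # first test block: [4 5]
```

The same split generator works whether the model under test is ARIMA, an MLP, or an LSTM, which keeps the Core DL comparison above honest.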
Model Evolution Summary
| Model | Key Feature | Advantage | Limitation |
|---|---|---|---|
| ARIMA/GARCH | Linear models | Interpretable | No non-linearity |
| SVM/RF | Non-linear ML | Feature importance | No temporal awareness |
| MLP | Universal approximator | Models any function | Ignores sequence order |
| RNN | Hidden state memory | Sequential processing | Vanishing gradients |
| LSTM | Gating mechanism | Long-term memory | Sequential bottleneck |
| CNN | Spatial patterns | Factor interactions | Arbitrary representation |
| Autoencoder | Latent compression | Unsupervised denoising | Intermediate step only |
| DRL | Policy learning | Action-oriented | Sim-to-real gap |
| GNN | Graph relationships | Systemic modeling | Graph construction |
| Transformer | Self-attention | Parallelizable, NLP | Computational cost |
DL Frameworks Comparison
| Framework | Philosophy | Ease of Use | Production | Finance Adoption |
|---|---|---|---|---|
| TensorFlow | Production-first | Steeper curve | Excellent (TFX) | Widespread |
| PyTorch | Research-first | Intuitive | Good (TorchServe) | Very high |
| JAX | High-performance | High curve | Emerging | Niche (HPC) |