Google Summer of Code 2025

Bayesian Dynamic Factor Models

👨‍💻 Andrea Catelli 🏛️ PyMC 📅 May 2025 - August 2025

Project Overview

This page contains weekly updates on my work developing Bayesian Dynamic Factor Models for the PyMC library as part of Google Summer of Code 2025.

Abstract

State-space models (SSMs) provide a flexible framework for modeling dynamic systems where latent states evolve over time. In econometrics, Dynamic Factor Models (DFMs) are widely used to capture the co-movement of multiple time series by assuming that a small number of latent factors drive the observed variables. The PyMC library already includes implementations of SARIMAX, VARMAX, and structural state-space models, along with example notebooks for their usage. This project aims to extend the existing PyMC state-space module by implementing Dynamic Factor Models, aligning with the functionality available in Statsmodels.

Statsmodels, our reference for this project, provides two DFM implementations: DynamicFactor, which casts the model in state-space form and estimates its parameters by maximum likelihood using the Kalman filter, and DynamicFactorMQ, which is fitted with the Expectation-Maximization (EM) algorithm. Our implementation will instead follow a Bayesian approach, using PyMC’s probabilistic programming framework and PyTensor for computation. An accompanying example notebook will demonstrate estimation, forecasting, and causal analysis with the new model, so that it is accessible to users with varying levels of experience in Bayesian modeling.
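
As a reading aid, the model described above can be written in a common textbook form. The notation here is my own choice for illustration and is not taken from the PyMC or Statsmodels code: y_t is the vector of observed series, f_t the (much smaller) vector of latent factors, Λ the loading matrix, and A_i the autoregressive coefficient matrices of the factors.

```latex
\begin{aligned}
y_t &= \Lambda f_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \Sigma_\varepsilon), \\
f_t &= \sum_{i=1}^{p} A_i f_{t-i} + \eta_t, & \eta_t &\sim \mathcal{N}(0, \Sigma_\eta).
\end{aligned}
```

Stacking f_t, ..., f_{t-p+1} into a single state vector turns the second equation into a first-order transition, which is the state-space form that the Kalman filter operates on.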

Code references:

Stage 0

May 8 – June 1
  • Participated in community bonding on the PyMC Discord channel and had an initial video call with my mentor.
  • Created this GitHub blog page for sharing progress during the coding period.
  • Began studying the 'Statistical Rethinking' course on YouTube as suggested by my mentor.

Week 1

June 2 – June 6
  • Reviewed examples available in the PyMC library to continue familiarizing myself with the package.
  • Continued following the 'Statistical Rethinking' course on YouTube.

Week 2

June 9 – June 13

Week 3

June 16 – June 20

Week 4

June 23 – June 27
  • Studied the Kalman filter in detail by implementing it in PyTensor for a univariate structural time series model, and compared the results with PyMC’s built-in version. The notebook is available here.
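
For readers unfamiliar with the recursion, here is a minimal sketch of a Kalman filter for the simplest structural model, the local level model (y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t). It is written in plain Python rather than PyTensor and is not the code from the notebook; the function name `local_level_kalman` and its arguments are hypothetical, chosen only for this illustration.

```python
import math


def local_level_kalman(y, sigma2_eps, sigma2_eta, a0=0.0, p0=1e6):
    """Filter a local level model and return (filtered means, log-likelihood).

    a0 and p0 are the prior mean and variance of the initial level;
    a large p0 approximates a diffuse prior (an assumption of this sketch).
    """
    a, p = a0, p0                # predicted state mean and variance
    filtered, loglik = [], 0.0
    for obs in y:
        v = obs - a              # one-step-ahead prediction error
        f = p + sigma2_eps       # variance of the prediction error
        k = p / f                # Kalman gain
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        a_filt = a + k * v       # update step: filtered mean
        p_filt = p * (1.0 - k)   # update step: filtered variance
        filtered.append(a_filt)
        a = a_filt               # predict step: the level is a random walk
        p = p_filt + sigma2_eta
    return filtered, loglik


if __name__ == "__main__":
    levels, ll = local_level_kalman([1.0, 3.0, 2.5], sigma2_eps=1.0, sigma2_eta=0.1)
    print(levels, ll)
```

With sigma2_eta set to zero and a near-diffuse prior, the filtered level reduces to (approximately) the running mean of the observations, which is a convenient sanity check.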

Week 5

June 30 – July 4
  • Improved and extended the comparison between PyMC and Statsmodels implementations of the coincident index, focusing on model outputs, inference behavior, and model structure. In particular, I added a lagged dependence of the factor on one observed variable to better align with the Statsmodels extended model version. The updated notebook is available here.

Week 6

July 7 – July 11
  • Delivered a working DFM.py module implementing the Dynamic Factor Model, now available in this pull request. The implementation is functional and ready for review, with further testing and matrix-vectorization optimizations planned.

Week 7

July 14 – July 18
  • Optimized the DFM.py implementation through vectorization and block diagonal construction. Added support for measurement errors and heterogeneous autoregressive orders across factors.
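
To illustrate what block diagonal construction with heterogeneous autoregressive orders can look like, here is a small NumPy sketch. It assumes each factor follows its own independent AR process and builds one companion block per factor. This is a simplification for illustration: the helper names are hypothetical, and the state ordering is not necessarily the one used in DFM.py or in Statsmodels.

```python
import numpy as np


def companion(ar_coefs):
    """Companion matrix of a single AR(p) process (hypothetical helper)."""
    p = len(ar_coefs)
    block = np.zeros((p, p))
    block[0, :] = ar_coefs           # first row holds the AR coefficients
    block[1:, :-1] = np.eye(p - 1)   # sub-diagonal shifts the lags down
    return block


def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    start = 0
    for b in blocks:
        size = b.shape[0]
        out[start:start + size, start:start + size] = b
        start += size
    return out


# Factor 1 is AR(2), factor 2 is AR(1): the orders differ across factors.
ar_params = [[0.5, 0.2], [0.8]]
T = block_diag([companion(c) for c in ar_params])
print(T)
```

Writing each block into a preallocated matrix avoids any per-element Python loop over the state dimension, which is the spirit of the vectorization mentioned above.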

Week 8

July 21 – July 25
  • Began implementing pytest tests for the new BayesianDynamicFactor class. Focused on validating the correct construction of model matrices by comparing them against Statsmodels, and initiated a test for log-likelihood computation.

Week 9

July 28 – August 1
  • Adjusted and refined the DFM implementation, mainly the ordering of the state vector and the construction of the state-space matrices, so that they match the Statsmodels implementation (also in preparation for the tests).

Week 10

August 4 – August 8
  • Completed the tests that verify the correctness of the matrix construction by comparing the matrices with those produced by Statsmodels.

Week 11

August 11 – August 15
  • Implemented support for exogenous variables by extending the state, following pymc_extras/statespace/models/structural/components/regression.py. This approach differs from Statsmodels but provides greater flexibility. Added corresponding tests, including comparisons with Statsmodels and internal validations.
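
The idea of handling exogenous variables by extending the state can be sketched as follows: the regression coefficients are appended to the state vector with an identity transition (so they stay constant over time), and the exogenous data enter through a time-varying design matrix. This is a NumPy illustration under my own assumptions about layout, with one coefficient per observed series and regressor; the function names are hypothetical and the actual arrangement in pymc-extras may differ.

```python
import numpy as np


def augmented_transition(T, n_obs, n_exog):
    """Extend transition matrix T with an identity block for the coefficients."""
    k = T.shape[0]
    m = n_obs * n_exog               # one coefficient per (series, regressor)
    T_aug = np.zeros((k + m, k + m))
    T_aug[:k, :k] = T                # original factor dynamics
    T_aug[k:, k:] = np.eye(m)        # beta_t = beta_{t-1}
    return T_aug


def augmented_design(Z, x_t):
    """Design matrix at time t: factor loadings next to the exogenous data."""
    n_obs = Z.shape[0]
    # Row i picks up x_t only in the columns of series i's own coefficients.
    exog_block = np.kron(np.eye(n_obs), x_t[None, :])
    return np.hstack([Z, exog_block])


# Two observed series, one factor, two exogenous regressors.
T = np.array([[0.7]])
Z = np.array([[1.0], [0.5]])
x_t = np.array([2.0, 3.0])

T_aug = augmented_transition(T, n_obs=2, n_exog=2)
Z_t = augmented_design(Z, x_t)

state = np.array([1.0, 0.1, 0.2, 0.3, 0.4])  # [factor, betas of y1, betas of y2]
y_mean = Z_t @ state                          # factor part plus regression part
print(y_mean)
```

Because the coefficients live in the state, they are filtered and smoothed like any other state, and the identity block could later be given its own innovations to allow time-varying coefficients.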

Week 12

August 18 – August 22
  • Completed a notebook example demonstrating the Dynamic Factor Model (DFM) in PyMC, and compared its performance with Statsmodels on the construction of the coincident index, a standard benchmark for DFMs.

Week 13

August 25 – August 29
  • Performed final code refinements and corrections before merging into the main pymc-extras repository.

Theory Resources: