Interpolating Missing GPS Points with Kalman Filters

Interpolating missing GPS points with Kalman filters replaces naive straight-line assumptions with a recursive Bayesian estimator that models vehicle kinematics and sensor uncertainty. Unlike linear or cubic splines, a Kalman filter predicts the next spatial state (position and velocity), then corrects it when a new measurement arrives. This predict-update cycle runs continuously: during signal dropouts, only the prediction step executes, advancing the trajectory while explicitly inflating positional covariance. When a valid ping returns, the update step fuses prediction and observation, weighted by the relative confidence in the motion model versus receiver accuracy. The result is a statistically optimal, temporally continuous trajectory ready for downstream mobility analytics.

Why Kalman Filters Outperform Geometric Interpolation

Raw telemetry streams in urban mobility pipelines rarely arrive at fixed intervals. Multipath reflections, tunnel outages, and aggressive device power-saving modes create irregular gaps that break standard Temporal Aggregation & Window Mapping routines. Addressing these discontinuities through robust Gap Filling in Sparse Trajectories requires a state-space model that respects movement physics rather than geometric convenience.

Method	Assumption	Handles Noise?	Propagates Uncertainty?	Best For
Linear / Cubic Spline	Smooth, continuous path	❌ No	❌ No	Static mapping, visualization
Dead Reckoning	Constant heading/speed	️ Partial	❌ No	Short-term fallback
Kalman Filter	Noisy linear dynamics	✅ Yes	✅ Yes	Analytics, routing, ETA

State-Space Configuration for 2D Mobility

A 2D constant-velocity Kalman filter represents the system using five core components:

State vector (x): [lat, lon, v_lat, v_lon]ᵀ
State transition matrix (F): Propagates position using velocity over the elapsed time Δt
Observation matrix (H): Maps the full state to the measured dimensions (position only)
Process noise covariance (Q): Models unobserved dynamics like acceleration, braking, or turning
Measurement noise covariance (R): Represents GPS receiver variance (typically 4–25 m² for consumer-grade GNSS)

The filter executes two steps per timestamp. When data is present, it runs both prediction and correction. When a timestamp lacks a reading, it skips the correction step entirely. This asymmetry is what enables seamless interpolation without fabricating false confidence.

Recursive Update Cycle & Dropout Handling

The recursive update follows standard linear-Gaussian assumptions:

x̂ₖ⁻ = F x̂ₖ₋₁ Pₖ⁻ = F Pₖ₋₁ Fᵀ + Q

(If measurement zₖ exists) Kₖ = Pₖ⁻ Hᵀ (H Pₖ⁻ Hᵀ + R)⁻¹ x̂ₖ = x̂ₖ⁻ + Kₖ (zₖ − H x̂ₖ⁻) Pₖ = (I − Kₖ H) Pₖ⁻

During outages, zₖ is undefined. The algorithm bypasses the Kalman gain (Kₖ) and update equations, propagating only x̂ₖ⁻ and Pₖ⁻. Covariance P grows monotonically with Q, providing a mathematically sound uncertainty envelope that downstream routing or ETA models can consume.

Production-Ready Implementation

The following Python implementation uses filterpy for clarity while remaining fully compatible with standard scientific stacks. It handles irregular timestamps, skips missing measurements, and returns a dense trajectory aligned to the original index with explicit uncertainty bounds.

PYTHON

import numpy as np
from filterpy.kalman import KalmanFilter
from datetime import datetime

def interpolate_gps_kalman(
    timestamps: list[datetime],
    positions: list[tuple[float, float] | None],
    process_noise: float = 0.5,
    measurement_noise: float = 15.0
) -> dict[str, np.ndarray]:
    """
    Interpolates missing GPS points using a 2D constant-velocity Kalman filter.
    Handles irregular sampling and returns dense trajectories with uncertainty bounds.
    """
    n = len(timestamps)
    if n == 0:
        return {"lat": np.array([]), "lon": np.array([]), "uncertainty": np.array([])}

    # Initialize filter
    kf = KalmanFilter(dim_x=4, dim_z=2)
    kf.x = np.array([positions[0][0], positions[0][1], 0.0, 0.0])  # lat, lon, v_lat, v_lon
    kf.P *= 10.0  # Initial state uncertainty
    kf.R *= measurement_noise
    kf.Q *= process_noise
    kf.H = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0]])  # Map full state to position only

    out_lat = np.empty(n)
    out_lon = np.empty(n)
    out_unc = np.empty(n)

    for i in range(n):
        if i > 0:
            dt = (timestamps[i] - timestamps[i-1]).total_seconds()
            dt = max(dt, 0.1)  # Prevent degenerate matrices

            # Update transition matrix for variable dt
            kf.F = np.array([
                [1, 0, dt, 0],
                [0, 1, 0, dt],
                [0, 0, 1, 0],
                [0, 0, 0, 1]
            ])

        # Prediction step (always runs)
        kf.predict()

        # Update step (only if measurement exists)
        if positions[i] is not None:
            kf.update(np.array([positions[i][0], positions[i][1]]))

        # Store results
        out_lat[i] = kf.x[0]
        out_lon[i] = kf.x[1]
        out_unc[i] = np.sqrt(kf.P[0, 0] + kf.P[1, 1])  # Combined positional std dev

    return {"lat": out_lat, "lon": out_lon, "uncertainty": out_unc}

Tuning, Validation & Pipeline Integration

Tuning Q and R dictates the filter’s responsiveness. A low Q trusts the motion model heavily, smoothing out noise but lagging during sharp turns. A high Q tracks rapid maneuvers but amplifies measurement jitter. For urban mobility workloads, start with R derived from the receiver’s reported HDOP or CEP50 values, and scale Q empirically against known ground-truth routes.

When processing high-frequency logistics telemetry, consider upgrading from constant-velocity to a Constant Turn Rate and Velocity (CTRV) model. CTRV explicitly captures angular dynamics, reducing interpolation drift during highway merges or roundabouts. For a rigorous mathematical foundation on state estimation and noise covariance tuning, refer to the An Introduction to the Kalman Filter by Welch & Bishop, which remains the standard reference for linear-Gaussian systems.

Validate interpolated segments using root-mean-square error (RMSE) against held-out pings or by checking if uncertainty bounds remain within acceptable thresholds for your use case. In production pipelines, wrap the filter in a vectorized batch processor or compile it via numba to handle millions of device trajectories without Python loop overhead. The output uncertainty column should be fed directly into downstream confidence scoring, ensuring that analytics never treat a 30-second tunnel gap with the same weight as a 1-second high-accuracy ping.