Control & Stability Framework

01 — Quadrotor Dynamics

Translational & Rotational Equations

m v̇ = m g + R(q) f_T + f_wind
J ω̇ + ω × (J ω) = τ
The quadrotor's motion is governed by non-linear rigid body dynamics. Translational motion is driven by collective thrust mapped through the attitude rotation matrix, heavily perturbed by stochastic wind. Rotational motion accounts for complex gyroscopic coupling effects inherent to 3D rotation, requiring precise torque actuation to maintain stability.
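As a minimal sketch of these two equations, the step below integrates them with forward Euler; the mass, diagonal inertia values, and the assumption that thrust is already expressed in the world frame are illustrative, not values from the system above.

```python
# One Euler step of  m v̇ = m g + R(q) f_T + f_wind  and  J ω̇ + ω × (J ω) = τ.
# Assumptions: unit mass, diagonal inertia J, thrust pre-rotated into world frame.
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def dynamics_step(v, omega, f_thrust_world, f_wind, tau,
                  m=1.0, g=-9.81, J=(0.01, 0.01, 0.02), dt=0.01):
    # Translational: v̇ = g + (R(q) f_T + f_wind) / m
    a = [(f_thrust_world[i] + f_wind[i]) / m for i in range(3)]
    a[2] += g  # gravity acts along world z
    v_next = [v[i] + dt * a[i] for i in range(3)]

    # Rotational: ω̇ = J⁻¹ (τ − ω × (J ω)), with J diagonal here
    Jw = [J[i] * omega[i] for i in range(3)]
    gyro = cross(omega, Jw)
    omega_next = [omega[i] + dt * (tau[i] - gyro[i]) / J[i] for i in range(3)]
    return v_next, omega_next
```

At hover thrust (thrust exactly cancelling gravity, zero wind and torque), the step leaves velocity and body rate unchanged, which is a quick sanity check on the signs.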

02 — State & Control Formulation

State Vectors and Actions

s_t = [p_t, v_t, q_t, ω_t, w_t]
a_t = [T, τ_φ, τ_θ, τ_ψ]
We employ full-state feedback. The agent receives absolute position, velocity, and the attitude quaternion, alongside a latent representation of the turbulent wind state. It outputs continuous, bounded actions: collective thrust and body-frame roll, pitch, and yaw torques.
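A sketch of this interface: flattening the state into the observation s_t and clipping the bounded action a_t = [T, τ_φ, τ_θ, τ_ψ]. The thrust and torque bounds are assumptions for illustration.

```python
# Build the flat observation [p, v, q, ω, w] and clip the 4-D action.
# Bounds (t_max, tau_max) are illustrative assumptions, not system values.
def make_observation(p, v, q, omega, w):
    return list(p) + list(v) + list(q) + list(omega) + list(w)

def clip_action(a, t_max=20.0, tau_max=1.0):
    lo = [0.0, -tau_max, -tau_max, -tau_max]   # collective thrust is non-negative
    hi = [t_max, tau_max, tau_max, tau_max]
    return [min(max(a[i], lo[i]), hi[i]) for i in range(4)]
```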

03 — Optimization Engine

PPO Policy Structure

π_θ(a|s) = 𝒩( μ_θ(s), Σ_θ(s) )
The actor network outputs a Gaussian distribution for continuous exploration.
V_φ(s)
The critic network estimates the expected discounted return from a state, i.e. its future stabilization quality.
J(θ) = E[ Σ_t γ^t R_t ]
L^CLIP(θ) = E_t[ min( r_t(θ) A_t , clip( r_t(θ), 1−ε, 1+ε ) A_t ) ],  where r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t)
Proximal Policy Optimization maximizes the expected discounted return while clipping the probability ratio r_t(θ) to an approximate trust region [1−ε, 1+ε], preventing destructive policy updates during volatile aerodynamic transients.
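The clipped surrogate for a single timestep can be sketched as follows; the log-probabilities and advantage A_t are assumed to come from the actor and critic above.

```python
import math

# Clipped PPO surrogate for one (s_t, a_t) sample:
#   min( r_t(θ) A_t , clip(r_t(θ), 1−ε, 1+ε) A_t )
def ppo_clip_term(logp_new, logp_old, advantage, eps=0.2):
    ratio = math.exp(logp_new - logp_old)               # r_t(θ)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)     # clip(r_t, 1−ε, 1+ε)
    return min(ratio * advantage, clipped * advantage)  # pessimistic bound
```

With a positive advantage, a ratio of 2 is clipped to 1 + ε, so the incentive to move further in that direction vanishes once the update leaves the trust region.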

04 — Implicit CLF Design

Reward Function Design

R_t = −α₁ ||ω_t||² − α₂ ||e_p||² − α₃ ||e_att||² − α₄ ||a_t||²
E[R_t] ↑    Var(||ω||) ↓
“Training reduces instability and improves control smoothness.”
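The shaping reward can be sketched directly from the quadratic penalties above; the weights α₁..α₄ used here are illustrative assumptions.

```python
# Quadratic shaping reward R_t = −α₁||ω||² − α₂||e_p||² − α₃||e_att||² − α₄||a_t||².
# The weight vector `a` is an assumed example, not tuned values from the project.
def reward(omega, e_p, e_att, action, a=(1.0, 1.0, 0.5, 0.1)):
    sq = lambda v: sum(x * x for x in v)
    return -(a[0] * sq(omega) + a[1] * sq(e_p)
             + a[2] * sq(e_att) + a[3] * sq(action))
```

The reward is zero only at the equilibrium with zero actuation and strictly negative elsewhere, which is what makes the implicit-CLF reading in Section 08 plausible.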

05 — Parameter Tracking

Learning Dynamics Over 600K Steps

θ_{k+1} = θ_k + η ∇_θ L^CLIP(θ_k)
M(θ) = E[ ||ω||² + ||e_p||² ]
M(θ_600k) < M(θ_100k) < M(θ_0)
“Monotonic reduction in stabilization error over 600k gradient ascent steps.”

06 — Training Distribution

Domain Randomization

w_t ~ D_train
Calm
Smooth
Strong
Mixed
θ* = arg min_θ E_{w∼D_train}[ M(θ) ]
“Policy optimized for minimal expected error across all disturbance regimes simultaneously.”
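Sampling w_t ∼ D_train over the four regimes can be sketched as below; the per-regime wind magnitudes are illustrative assumptions.

```python
import math
import random

# Four disturbance regimes; magnitudes (m/s) are assumed example values.
# "mixed" draws a magnitude uniformly over the full range.
REGIMES = {"calm": 0.0, "smooth": 2.0, "strong": 8.0, "mixed": None}

def sample_wind(rng=random):
    regime = rng.choice(list(REGIMES))
    mag = REGIMES[regime]
    if mag is None:                       # mixed regime
        mag = rng.uniform(0.0, 8.0)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return regime, [mag * math.cos(theta), mag * math.sin(theta), 0.0]
```

Resampling the regime every episode is what exposes the policy to the full disturbance distribution rather than a single wind condition.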

07 — System Validation

Final Performance Metrics

S_i = (1/T) Σ_t ( ||ω_t||² + ||e_p,t||² )
S_calm
S_smooth
S_strong
S_mixed
S_strong^600k < S_strong^early
E[R_i^600k] > E[R_i^100k]
Var(R_i) ↓
“Significantly improved worst-case performance and reduced reward variability.”
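The per-episode score S_i is a time average of the same quadratic errors; a sketch over a trajectory of (ω_t, e_p,t) pairs:

```python
# Episode score S_i = (1/T) Σ_t (||ω_t||² + ||e_p,t||²).
# `trajectory` is assumed to be a list of (omega, e_p) vector pairs.
def episode_score(trajectory):
    sq = lambda v: sum(x * x for x in v)
    return sum(sq(om) + sq(ep) for om, ep in trajectory) / len(trajectory)
```

Computing this score per regime (S_calm, S_smooth, S_strong, S_mixed) and per checkpoint gives the comparisons S_strong^600k < S_strong^early quoted above.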

08 — Theoretical Guarantee

Lyapunov Stability Interpretation

e_p = p − p_ref | e_ω = ω | e_att = [ φ ; θ ]
V(e) = ½ e_pᵀ e_p + ½ e_ωᵀ J e_ω + ½ e_attᵀ e_att
V̇(e) = −e_ωᵀ K_ω e_ω − e_pᵀ K_p e_p − e_attᵀ K_att e_att + eᵀ f_wind
“PPO implicitly learns a nonlinear Lyapunov-stabilizing feedback controller.”
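A numerical sketch of the certificate: V(e) and the nominal (zero-wind) decrease rate V̇(e), assuming diagonal inertia and scalar gains K_p, K_ω, K_att, all of which are illustrative assumptions.

```python
# Candidate Lyapunov function V(e) = ½e_pᵀe_p + ½e_ωᵀJe_ω + ½e_attᵀe_att
# and its zero-wind decrease rate. J and the gains are assumed examples.
def V(e_p, e_omega, e_att, J=(0.01, 0.01, 0.02)):
    sq = lambda v: sum(x * x for x in v)
    return (0.5 * sq(e_p)
            + 0.5 * sum(J[i] * e_omega[i] ** 2 for i in range(3))
            + 0.5 * sq(e_att))

def V_dot(e_p, e_omega, e_att, k_p=1.0, k_omega=1.0, k_att=1.0):
    sq = lambda v: sum(x * x for x in v)
    return -(k_omega * sq(e_omega) + k_p * sq(e_p) + k_att * sq(e_att))
```

V is positive definite and V̇ is negative definite away from the equilibrium, so under the zero-wind assumption the error dynamics are asymptotically stable; the eᵀ f_wind term bounds this to practical stability under disturbance.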
Made by Team Aero-Controllers