
Self-orthogonalizing attractor neural networks emerging from the free energy principle

Appendix

Library imports for inline code.

import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
import scipy.integrate as integrate

sns.set(style="white")

Appendix 1

Continuous Bernoulli distribution

$$x \sim \mathcal{CB}(L) \iff P(x) \propto e^{Lx}, \quad x \in [-1, 1] \subset \mathbb{R}$$

Below we provide a Python implementation of the continuous Bernoulli distribution $\mathcal{CB}(L)$, parametrized by the log odds $L$ and adjusted to the $[-1, 1]$ interval.

def CB(x, b, logodds=True):
    """Continuous Bernoulli density on [-1, 1], parametrized by the log odds b."""
    if not logodds:
        b = np.log(b / (1 - b))  # convert a probability parameter to log odds
    if np.isclose(b, 0):
        return np.ones_like(x) / 2  # limit b -> 0: uniform density on [-1, 1]
    return b * np.exp(b * x) / (2 * np.sinh(b))

# Plot some examples
delta = 100
sigma = np.linspace(-1, 1, delta)
for b in np.linspace(-2, 2, 5):
    p_sigma = CB(sigma, b)
    sns.lineplot(x=sigma, y=p_sigma, label=b)
Figure: $\mathcal{CB}$ densities for log odds $b \in \{-2, -1, 0, 1, 2\}$.

Appendix 2

Continuous Bernoulli distribution full derivation

Derivation of the exponential form of the continuous Bernoulli distribution, parametrized by the log odds $L$ and adjusted to the $[-1, 1]$ interval.

$$P(x; L) = \frac{e^{Lx}}{\int_{-1}^{1} e^{Lx}\, dx} = \frac{e^{Lx}}{\frac{e^{L} - e^{-L}}{L}} = \frac{L\, e^{Lx}}{2\sinh(L)}$$
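As a quick sanity check, we can confirm numerically that the density integrates to one. This is a minimal sketch reusing the CB function from Appendix 1 and the already-imported scipy.integrate:

# Numerical sanity check: the CB density should integrate to 1 on [-1, 1]
for b in [-2.0, -0.5, 0.5, 2.0]:
    Z, _ = integrate.quad(CB, -1, 1, args=(b,))
    print(f"b={b:+.1f}: integral = {Z:.6f}")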

Appendix 3

Visualization of the likelihood function with various parameters, in Python.

delta = 100
s_i = np.linspace(-1, 1, delta)

# Likelihood of s_i under CB(w_i * mu), for several weights w_i and means mu
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for idx, mu in enumerate([0.1, 0.5, 1]):
    for w_i in np.linspace(-2, 2, 5):
        p_mu = CB(s_i, w_i * mu)
        sns.lineplot(x=s_i, y=p_mu, ax=axes[idx], label=w_i).set(title=f"$\\mu={mu}$")
Figure: likelihood curves under $\mathcal{CB}(w_i \mu)$ for $\mu \in \{0.1, 0.5, 1\}$ and $w_i$ between -2 and 2.

Appendix 4

Expected value of the $\mathcal{CB}$

$$\begin{align*} \mathbb{E}_{\mathcal{CB}(b)}[\sigma] &= \int_{-1}^{1} \sigma\, \frac{b\, e^{b \sigma}}{2\sinh(b)}\, d\sigma = \frac{b \left( \frac{(b-1)e^{b}}{b^2} + \frac{(b+1)e^{-b}}{b^2} \right)}{2\sinh(b)} \\ &= \coth(b) - \frac{1}{b} \end{align*}$$
bs = np.linspace(-10, 10, 100)
plt.figure(figsize=(4, 1))
sns.lineplot(x=bs, y=1/np.tanh(bs) - 1/bs)  # Langevin function: coth(b) - 1/b
plt.show()
Figure: the Langevin function $\coth(b) - 1/b$ over $b \in [-10, 10]$.
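The closed form can also be checked against direct numerical integration; the following is a small sketch reusing CB and integrate.quad from above:

# Compare the quadrature value of E[sigma] with the closed form coth(b) - 1/b
for b in [0.5, 1.0, 3.0]:
    num, _ = integrate.quad(lambda x: x * CB(x, b), -1, 1)
    closed = 1 / np.tanh(b) - 1 / b
    print(f"b={b}: quadrature={num:.6f}, coth(b) - 1/b={closed:.6f}")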

Appendix 5

Conservative dynamics

Let $x \in \mathbb{R}^n$ denote the system’s states (internal, blanket, and external states). The drift can be decomposed into a gradient part (from a potential $U$) and a solenoidal part $R$:

$$\dot{x} = -\nabla U(x) + R(x)$$

where $R = -R^{\mathsf{T}}$ is antisymmetric in state space.

The probability density $p(x, t)$ over states evolves according to the Fokker–Planck equation:

$$\partial_t p(x,t) = -\nabla \cdot \left[ (-\nabla U + R)\, p(x,t) \right] + \text{diffusion terms}.$$

To determine the stationary distribution $p_s(x)$, we set $\partial_t p(x,t) = 0$. The Fokker–Planck equation then implies that the net probability current $J_s(x) = (-\nabla U(x) + R(x))\, p_s(x) - D \nabla p_s(x)$ (assuming isotropic diffusion $D$) must be divergence-free: $\nabla \cdot J_s(x) = 0$. If we propose $p_s(x) = Z^{-1} \exp(-U(x)/D)$ (where $Z$ is a normalization constant), then the diffusive current $-D \nabla p_s(x)$ becomes $p_s(x) \nabla U(x)$.

Substituting this into $J_s(x)$, we get:

$$J_s(x) = (-\nabla U(x) + R(x))\, p_s(x) + p_s(x) \nabla U(x) = R(x)\, p_s(x).$$

Thus, $p_s(x) \propto \exp(-U(x)/D)$ is the stationary distribution if and only if the solenoidal component of the probability current, $J_R(x) = R(x)\, p_s(x)$, is itself divergence-free: $\nabla \cdot (R(x)\, p_s(x)) = 0$.

A key result in the study of non-equilibrium systems under the Free Energy Principle, particularly those possessing a Markov blanket and involving conservative particles, is that this condition $\nabla \cdot (R(x)\, p_s(x)) = 0$ holds (Friston et al., 2023; see also the related constructions in Ao, 2004; Xing, 2010). The conditional independence structure imposed by the Markov blanket, along with other FEP-specific assumptions (such as conservative particles), constrains the system’s dynamics such that solenoidal forces $R(x)$ do not alter the Boltzmann form of the stationary distribution. Instead, they drive persistent, divergence-free probability currents $J_R(x)$ along iso-potential surfaces.

In steady state ($\partial_t p = 0$), only the gradient term $-\nabla U$ influences $p(x)$. The stationary (nonequilibrium) distribution is

$$p(x) \propto \exp\{-U(x)\},$$

determined solely by the symmetric (potential) part $-\nabla U$. The antisymmetric $R$ does not reweight $p(x)$; it just circulates probability around isocontours of $U$.

In most NESS systems, antisymmetric flows do alter the stationary measure. Under particular-partition (Markov-blanket) constraints, however, the internal-external factorization ensures that solenoidal (antisymmetric) flows remain divergence-free, leaving the Boltzmann-like steady distribution intact. This is why, in these Markov-blanketed systems, one can have persistent solenoidal currents (nonequilibrium flows) yet preserve a stationary distribution that depends only on the symmetric part of the couplings.
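To illustrate this numerically, here is a minimal sketch (all parameters are assumed for illustration): a two-dimensional Ornstein–Uhlenbeck process with potential $U(x) = \frac{1}{2}\|x\|^2$ and an antisymmetric coupling $R$. Despite the persistent solenoidal circulation, the sample covariance stays close to the identity, i.e. the stationary density remains $p(x) \propto \exp(-U(x))$ regardless of the solenoidal strength:

# Euler-Maruyama simulation of dx = (-grad U(x) + R x) dt + sqrt(2) dW,
# with U(x) = |x|^2 / 2 and antisymmetric R (illustrative parameters)
rng = np.random.default_rng(0)
r = 1.5                               # solenoidal strength (arbitrary)
R = np.array([[0.0, r], [-r, 0.0]])   # R = -R^T
dt, n_steps, burn_in = 1e-3, 200_000, 20_000
x = np.zeros(2)
samples = np.empty((n_steps, 2))
noise = rng.standard_normal((n_steps, 2)) * np.sqrt(2 * dt)
for t in range(n_steps):
    x = x + (-x + R @ x) * dt + noise[t]  # drift = -grad U(x) + R x
    samples[t] = x
# The sample covariance is close to the identity for any r: R only
# circulates probability along the isocontours of U
print(np.cov(samples[burn_in:].T))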

Appendix 6

Detailed derivation of $\dfrac{\partial F}{\partial b_q}$

1. Substitute our parametrization into $F$

Let’s start by substituting our parametrization into eq. .

1a. Accuracy term

From the RBM marginalization (eq. ):

$$E(\bm{\sigma}) = \underbrace{-b_i\sigma_i - \sum_{j\neq i}J_{ij}\sigma_i\sigma_j}_{\text{Terms with } \sigma_i} \underbrace{-\sum_{j\neq i}b_j\sigma_j - \frac{1}{2}\sum_{j,k\neq i}J_{jk}\sigma_j\sigma_k}_{\text{Terms without } \sigma_i}$$

Here, $-b_i\sigma_i$ becomes constant, since $\sigma_i$ is fixed. So we get:

$$P(\sigma_{\backslash i}|\sigma_i) \propto \exp\left(\sum_{j\neq i}(b_j + J_{ij}\sigma_i)\sigma_j + \frac{1}{2}\sum_{j,k\neq i}J_{jk}\sigma_j\sigma_k\right)$$

Taking the expectation of $\ln P(\sigma_{\backslash i}|\sigma_i)$ under $q(\sigma_i)$:

$$\mathbb{E}_q[\ln P(\sigma_{\backslash i}|\sigma_i)] = \text{const} + \sum_{j\neq i}b_j\sigma_j + S(b_q)\sum_{j\neq i}J_{ij}\sigma_j + \frac{1}{2}\sum_{j,k\neq i}J_{jk}\sigma_j\sigma_k$$

where $S(b_q) = \mathbb{E}_q[\sigma_i] = \coth b_q - 1/b_q$ is the expected value of the $\mathcal{CB}$, a sigmoid function of the bias (see Appendix 4).

1b. Complexity term

The complexity term in eq. is simply the KL divergence between two $\mathcal{CB}$ distributions. For $\mathcal{CB}$ distributions:

$$q(x) = \frac{b_q}{2\sinh b_q}e^{b_q x},\quad p(x) = \frac{b}{2\sinh b}e^{b x}$$

• KL divergence definition:

$$D_{KL} = \int_{-1}^{1} q(x) \ln\frac{q(x)}{p(x)}\, dx = \mathbb{E}_q[\ln q(x) - \ln p(x)]$$

• Expand log terms:

$$\ln q(x) = \ln b_q - \ln(2\sinh b_q) + b_q x$$

$$\ln p(x) = \ln b - \ln(2\sinh b) + b x$$

• Subtract log terms:

$$\ln\frac{q(x)}{p(x)} = \ln\frac{b_q}{b} + \ln\frac{\sinh b}{\sinh b_q} + (b_q - b)\, x$$

• Take expectation under q(x):

$$D_{KL} = \ln\frac{b_q}{b} + \ln\frac{\sinh b}{\sinh b_q} + (b_q - b)\, \mathbb{E}_q[x]$$

• Compute the expectation $\mathbb{E}_q[x]$:

$$\mathbb{E}_q[x] = \int_{-1}^{1} x\, \frac{b_q e^{b_q x}}{2\sinh b_q}\, dx = \frac{b_q}{2\sinh b_q}\left[\frac{e^{b_q x}}{b_q^2}(b_q x - 1)\right]_{-1}^{1}$$

• Evaluate at bounds:

$$= \frac{b_q}{2\sinh b_q}\left(\frac{e^{b_q}(b_q - 1) - e^{-b_q}(-b_q - 1)}{b_q^2}\right)$$

• Simplify using hyperbolic identities:

$$= \frac{b_q \cosh b_q - \sinh b_q}{b_q \sinh b_q} = \coth b_q - \frac{1}{b_q}$$

• Final substitution for the complexity term:

$$D_{KL} = \ln\frac{b_q \sinh b}{b \sinh b_q} + (b_q - b)\left(\coth b_q - \frac{1}{b_q}\right)$$
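This closed form can be verified against a direct numerical evaluation of the KL integral. A minimal sketch, with kl_cb as a hypothetical helper name, reusing CB from Appendix 1:

# Numerical check of the closed-form KL between two CB distributions
def kl_cb(bq, b):
    S = 1 / np.tanh(bq) - 1 / bq  # E_q[x], the Langevin function
    return np.log(bq * np.sinh(b) / (b * np.sinh(bq))) + (bq - b) * S

bq, b = 1.5, 0.5
num, _ = integrate.quad(lambda x: CB(x, bq) * np.log(CB(x, bq) / CB(x, b)), -1, 1)
print(f"quadrature: {num:.6f}, closed form: {kl_cb(bq, b):.6f}")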

1c. Combining the two terms

Combining the two terms, we get the following expression for the free energy:

$$F = \ln\left(\frac{b_q}{b}\right) + \ln\left(\frac{\sinh(b)}{\sinh(b_q)}\right) + (b_q - b)\, S(b_q) - \sum_{j \ne i} \left( b_j + S(b_q)\, J_{ij} \right) \sigma_j - \frac{1}{2} \sum_{j \ne i} \sum_{k \ne i} J_{jk}\, \sigma_j \sigma_k + C$$

where $C$ denotes all constants in the equation that are independent of $\sigma$ or $b_q$.

2. Free Energy partial derivative calculation

• First, we differentiate the log terms:

$$\frac{\partial}{\partial b_q}\left[\ln\frac{b_q}{b} + \ln\frac{\sinh b}{\sinh b_q}\right] = \frac{1}{b_q} - \coth b_q$$

• Then, the KL core term:

$$\frac{\partial}{\partial b_q}\left[(b_q - b)\, S(b_q)\right] = S(b_q) + (b_q - b)\frac{dS}{db_q}$$

• The linear terms:

$$\frac{\partial}{\partial b_q}\left[-\sum_{j\neq i}(b_j + S(b_q)\, J_{ij})\sigma_j\right] = -\sum_{j\neq i}J_{ij}\sigma_j\frac{dS}{db_q}$$

• The constants vanish.

• Now, combining all terms:

$$\frac{\partial F}{\partial b_q} = \left(\frac{1}{b_q} - \coth b_q\right) + \left(S(b_q) + (b_q - b)\frac{dS}{db_q}\right) - \sum_{j\neq i}J_{ij}\sigma_j\frac{dS}{db_q}$$

• Substituting $S(b_q) = \coth b_q - 1/b_q$:

$$= \left(\frac{1}{b_q} - \coth b_q\right) + \left(\coth b_q - \frac{1}{b_q} + (b_q - b)\frac{dS}{db_q}\right) - \sum_{j\neq i}J_{ij}\sigma_j\frac{dS}{db_q}$$

• Cancel terms:

$$\cancel{\frac{1}{b_q}} \;\cancel{-\coth b_q} + \cancel{\coth b_q} \;\cancel{-\frac{1}{b_q}} + (b_q - b)\frac{dS}{db_q} - \sum_{j\neq i}J_{ij}\sigma_j\frac{dS}{db_q}$$

This gives us the final derivative:

$$\frac{\partial F}{\partial b_q} = \left(b_q - b - \sum_{j\neq i}J_{ij}\sigma_j\right)\frac{dS}{db_q}$$

where $\frac{dS}{db_q} = -\operatorname{csch}^2 b_q + \frac{1}{b_q^2}$.
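The result can be confirmed with a finite-difference check; in this sketch, the values of $b$, $J_{ij}$, and $\sigma_j$ are hypothetical:

# Finite-difference check of dF/db_q; constants in F are dropped, as they
# do not affect the derivative (b, J_i, sigma are hypothetical values)
b, J_i = 0.3, np.array([0.5, -0.2, 0.8])
sigma = np.array([0.9, -0.4, 0.1])

def S(bq):  # Langevin function: E_q[sigma_i]
    return 1 / np.tanh(bq) - 1 / bq

def F(bq):  # free energy up to b_q-independent constants
    return (np.log(bq / b) + np.log(np.sinh(b) / np.sinh(bq))
            + (bq - b) * S(bq) - S(bq) * np.sum(J_i * sigma))

def dF(bq):  # closed-form derivative from the derivation above
    dS = -1 / np.sinh(bq)**2 + 1 / bq**2
    return (bq - b - np.sum(J_i * sigma)) * dS

bq, eps = 1.2, 1e-6
print((F(bq + eps) - F(bq - eps)) / (2 * eps), dF(bq))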

Setting the derivative to zero and solving for $b_q$ (noting that $\frac{dS}{db_q} > 0$ everywhere, so the derivative vanishes only when the first factor does), we get:

$$b_q = b + \sum_{j \ne i} J_{ij} \sigma_j$$

Now recall that the expected value of the $\mathcal{CB}$ is the Langevin function of its bias, $\mathbb{E}(x) = \coth(b) - 1/b$ (Appendix 4). For simplicity, we will denote it as $L(x)$. Now we can write:

$$\mathbb{E}_{q}[\sigma_i] = L(b_q) = L\left( b + \sum_{j \ne i} J_{ij} \sigma_j \right)$$

Q.E.D.
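As an illustration of the resulting update rule, the following sketch iterates the mean-field fixed point on a small random network; the weights and biases here are hypothetical, not taken from the simulations in the main text:

# Mean-field fixed-point iteration on a small, hypothetical network:
# each unit's expected state is the Langevin function of its total input
rng = np.random.default_rng(1)
n = 8
J = rng.normal(0, 0.3, (n, n))
J = (J + J.T) / 2          # symmetric couplings
np.fill_diagonal(J, 0)     # no self-coupling (j != i)
b = rng.normal(0, 0.5, n)  # prior biases

def L(x):
    """Langevin function coth(x) - 1/x, with the limit L(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = ~np.isclose(x, 0)
    out[nz] = 1 / np.tanh(x[nz]) - 1 / x[nz]
    return out

sigma = rng.uniform(-1, 1, n)
for _ in range(200):
    sigma = L(b + J @ sigma)  # E_q[sigma_i] = L(b_i + sum_j J_ij sigma_j)
print(sigma)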

References
  1. Friston, K., Da Costa, L., Sakthivadivel, D. A. R., Heins, C., Pavliotis, G. A., Ramstead, M., & Parr, T. (2023). Path integrals, particular kinds, and strange things. Physics of Life Reviews, 47, 35–62. 10.1016/j.plrev.2023.08.016
  2. Ao, P. (2004). Potential in stochastic differential equations: novel construction. Journal of Physics A: Mathematical and General, 37(3), L25–L30. 10.1088/0305-4470/37/3/l01
  3. Xing, J. (2010). Mapping between dissipative and Hamiltonian systems. Journal of Physics A: Mathematical and Theoretical, 43(37), 375003. 10.1088/1751-8113/43/37/375003