
Supplementary Figures

Figure 1: Eigenstructure and projector tests of the fcANN.
A: Eigenvalue spectra of the empirical coupling matrix J (left) and null model 1 (coupling matrix based on phase-randomized timeseries data, recalculated for each permutation) (right). B: Eigenvector–attractor alignment calculated from the empirical (left) and phase-randomized data (right). Attractors were obtained by deterministic relaxation from random initial states (collapsing sign-duplicates); alignment is the absolute cosine between collapsed attractor vectors and the top eigenvectors of J. C: Weight correspondence between J and its Kanter–Sompolinsky (K–S) analog. From the measured attractors we formed Σ (columns are attractors) and C=ΣᵀΣ/N, then computed the pseudo-inverse projector J_KS=(1/N)ΣC⁻¹Σᵀ. Similarity was quantified as the cosine between the off-diagonal elements of J and J_KS. The gray histogram shows the null distribution from null model 2 (symmetry-preserving permutations of J; see the source notebook for similar results with null model 1, i.e. phase-randomized timeseries data); for each of the 1000 permutations p we recomputed the surrogate network's own attractors and the corresponding J_KS^(p). The red dashed line marks the empirical value; the one-sided p-value is the fraction of null cosines ≥ the empirical cosine. The empirical network shows stronger eigenvector–attractor alignment and substantially higher J↔J_KS off-diagonal correspondence than the null, consistent with approximate K–S projector behavior.
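
As an illustration of the panel-C statistic, a minimal numpy sketch is given below; `J` (the empirical coupling matrix) and `attractors` (an N×m array whose columns are the sign-collapsed attractor vectors) are assumed to come from the accompanying notebooks, and the surrogate/permutation loop of null model 2 is omitted.

```python
import numpy as np

def ks_projector(attractors):
    """Kanter–Sompolinsky pseudo-inverse projector: J_KS = (1/N) Σ C⁻¹ Σᵀ,
    where Σ holds the attractor vectors as columns and C = ΣᵀΣ / N."""
    sigma = np.asarray(attractors, dtype=float)          # shape (N, m)
    n = sigma.shape[0]
    overlap = sigma.T @ sigma / n                         # C (m × m)
    return sigma @ np.linalg.pinv(overlap) @ sigma.T / n

def offdiag_cosine(a, b):
    """Cosine similarity between the off-diagonal elements of two square matrices."""
    mask = ~np.eye(a.shape[0], dtype=bool)
    x, y = a[mask], b[mask]
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Empirical statistic (J and attractors are produced by the fcANN notebooks):
# emp_cos = offdiag_cosine(J, ks_projector(attractors))
# One-sided p-value against cosines from surrogate networks:
# p_value = np.mean(np.asarray(null_cosines) >= emp_cos)
```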

Figure 2: Schematic representation of the fcANN projection. The fcANN projection is a 2-dimensional visualization of the fcANN dynamics, based on the first two principal components (PCs) of the states sampled from the stochastic relaxation procedure. The first two PCs yield a clear separation of the attractor states, with the two symmetric pairs of attractor states located at the extremes of the first and second PC. To map the attractors' basins onto the space spanned by the first two PCs, we obtained the attractor state of each point visited during the stochastic relaxation and fit a multinomial logistic regression model to predict the attractor state from the first two PCs. The resulting model accurately predicted the attractor states of arbitrary brain activity patterns, achieving a cross-validated accuracy of 96.5% (permutation-based p<0.001). The attractor basins were visualized using the decision boundaries obtained from this model. We propose the 2-dimensional fcANN projection as a simplified visual representation of brain dynamics and use it as the basis for all subsequent analyses in this work.
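
The projection and basin-mapping steps can be sketched in a few lines with scikit-learn (a simplified illustration; `states` is assumed to be a samples × regions array of activity patterns visited during stochastic relaxation and `attractor_labels` the index of the attractor each state converges to under deterministic relaxation):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fcann_projection(states, attractor_labels, cv=10):
    """Project sampled states onto their first two PCs and map attractor basins
    with a multinomial logistic regression on the 2D coordinates."""
    pca = PCA(n_components=2).fit(states)
    pc = pca.transform(states)                              # fcANN projection coordinates
    clf = LogisticRegression(max_iter=1000)                 # multinomial for multi-class labels
    accuracy = cross_val_score(clf, pc, attractor_labels, cv=cv).mean()
    clf.fit(pc, attractor_labels)                           # decision boundaries ≈ basin map
    return pca, clf, accuracy
```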

Figure 3: Explained variance in state energy by the first two principal components. See supplemental_material.ipynb for details.

Figure 4: Cross-validated classification accuracy of the fcANN when predicting the attractor state from state activation. See supplemental_material.ipynb for details.

Figure 5: Parameter sweep of the fcANN parameters threshold and beta. The number of attractor states is color-coded. See supplemental_material.ipynb for details.

Figure 6: fcANNs initialized with the empirical connectome have better convergence properties than permutation-based null models. We investigated the convergence properties of functional connectome-based ANNs in study 1 by contrasting the number of iterations needed to reach convergence with a permutation-based null model. In more detail, the null model was constructed by randomly permuting the upper triangle of the original connectome and filling in the lower triangle to obtain a symmetric network (symmetry of the weight matrix is a general requirement for convergence). This procedure was repeated 1000 times. In each repetition, we initialized both the original and the permuted fcANN with the same random input and counted the number of iterations until convergence. Each point on the plot shows an iteration number; the lines connect iteration numbers corresponding to the original and permuted matrices initialized with the same input. Statistical significance of the faster convergence with the empirical connectome was assessed via a one-sided Wilcoxon signed-rank test (i.e., a non-parametric paired test) on the paired iteration counts (1,000 pairs), with the alternative hypothesis that the empirical connectome converges in fewer iterations than the permuted connectome. The whole procedure was repeated with β = 0.3, 0.35, 0.4, 0.5 and 0.6 (yielding 2–8 attractor states). See convergence-analysis.ipynb for details.
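
The null-model construction and the paired test can be sketched as follows; `n_iterations(J, x0, beta)` is a hypothetical placeholder for the fcANN relaxation loop in convergence-analysis.ipynb, returning the number of iterations until convergence from initial state `x0`.

```python
import numpy as np
from scipy.stats import wilcoxon

def permuted_connectome(J, rng):
    """Null model: permute the upper triangle and mirror it to the lower triangle,
    keeping the weight matrix symmetric (a requirement for convergence)."""
    n = J.shape[0]
    iu = np.triu_indices(n, k=1)
    J_perm = np.zeros_like(J)
    J_perm[iu] = rng.permutation(J[iu])
    return J_perm + J_perm.T

def convergence_test(J, n_iterations, beta=0.4, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    iters_emp, iters_perm = [], []
    for _ in range(n_perm):
        x0 = rng.uniform(-1, 1, size=J.shape[0])          # same random input for both networks
        iters_emp.append(n_iterations(J, x0, beta))
        iters_perm.append(n_iterations(permuted_connectome(J, rng), x0, beta))
    # one-sided paired test: empirical connectome converges in fewer iterations
    return wilcoxon(iters_emp, iters_perm, alternative="less")
```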

Figure 7: Statistical inference of the fcANN state occupancy prediction with different null models. A: Results with a spatial autocorrelation-preserving null model for the empirical activity patterns. See null_models.ipynb for more details. B: Results where simulated samples are randomly sampled from a multivariate normal distribution, with the functional connectome as the covariance matrix, and compared to the fcANN performance. See supplemental_material.ipynb for details.
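
The surrogate samples for panel B can be generated with a short numpy sketch; `fc` is assumed to be the (positive semi-definite) region × region functional connectome used as the covariance matrix.

```python
import numpy as np

def mvn_null_samples(fc, n_samples=10_000, seed=0):
    """Surrogate activity patterns drawn from N(0, fc); their state occupancies
    can then be compared against the fcANN-based prediction."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(fc.shape[0]), fc, size=n_samples)
```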

Figure 8: The fcANN can reconstruct the pain “ghost attractor”. Signal-to-noise values range from 0.003 to 0.009. The asterisk denotes the location of the simulated “ghost attractor”. P-values are based on permutation testing, randomly permuting the condition labels on a per-participant basis. See main_analyses.ipynb for more details.

Figure 9: The fcANN can reconstruct the changes in brain dynamics caused by the voluntary downregulation of pain (as contrasted to upregulation). Signal-to-noise values range from 0.001 to 0.005. P-values are based on permutation testing, randomly permuting the condition labels on a per-participant basis. See main_analyses.ipynb for more details.

Figure 10: Robustness of the fcANN weights to noise. We set the temperature of the fcANN so that two attractor states emerge and then iteratively add noise to the connectome. To account for the resulting change in dynamics, we adjust the temperature (beta) of the noisy fcANN so that exactly two states emerge. We then plot the decrease in nodal strength of the noisy connectome (the fcANN weights), as a reference metric, against the correlation of the attractor states that emerge from the noisy connectome. See supplemental_material.ipynb for details.

Figure 11: All significant differences from the mean state activation analysis on the ABIDE dataset; labels denote regions in the BASC122 atlas. See supplemental_material.ipynb for details.

Supplementary Tables

Table 1: MRI acquisition parameters. TR: repetition time; TE: echo time; FA: flip angle; FOV: field of view; EPI: echo-planar imaging; SPGR: spoiled gradient recalled echo; SENSE/GRAPPA/ASSET: parallel imaging factors. Studies 5–7 are meta-analyses or multi-center studies with varying acquisition parameters; sequence parameters for these studies are available in the respective publications.

| Parameter | Study 1 | Study 2 | Study 3 | Study 4 |
| --- | --- | --- | --- | --- |
| Scanner / head coil | Philips Achieva X 3T; 32-ch | Siemens Magnetom Skyra 3T; 32-ch | GE Discovery MR750w 3T; 20-ch | Philips Achieva TX 3T; head coil per site |
| Anatomical sequence | T1 MPRAGE | T1 MPRAGE | T1 3D IR-FSPGR | T1 SPGR (high-resolution) |
| Anatomical TR / TE | 8500 ms / 3.9 ms | 2300 ms / 2.07 ms | 5.3 ms / 2.1 ms | — / — |
| Anatomical resolution / FOV | 1×1×1 mm³; 256×256×220 mm³ | 1×1×1 mm³; 256×256×192 mm³ | 1×1×1 mm³; 256×256×172 | — |
| Resting-state EPI TR / TE / FA | 2500 ms / 35 ms / 90° | 2520 ms / 35 ms / 90° | 2500 ms / 27 ms / 81° | 2000 ms / 20 ms / — |
| Phase enc. | COL | A>>P | A>>P | — |
| FOV (voxels × slices) | 240×240×132; 40 slices | 230×230×132; 38 slices | 96×96×44; 44 slices | 64×64; 42 slices |
| Slice thickness / gap / order | 3 mm / 0.3 mm / interleaved | 3 mm / 0.48 mm / interleaved | 3 mm / 0 mm / interleaved | 3 mm / — / interleaved |
| Acceleration / fat suppression | SENSE 3× / SPIR | GRAPPA 2× / Fat sat. | ASSET 2× / Fat sat. | SENSE 1.5× / — |
| Volumes / dummies / scan time | 200 / 5 / 8 min 37 s | 290 / 5 / 12 min 11 s | 240 / 0 / 10 min | — / — / — |

Table 2: Neurosynth meta-analyses. The table includes details about the term used for the automated meta-analyses, as well as the number of studies included in the meta-analysis, the total number of reported activations, and the maximal Z-statistic from the meta-analysis.

| search term | num. studies | num. activations | max. Z |
| --- | --- | --- | --- |
| pain | 516 | 23295 | 14.8 |
| motor | 2565 | 109491 | 22.5 |
| auditory | 1252 | 46557 | 25.3 |
| visual | 3110 | 115726 | 15.4 |
| face | 896 | 31842 | 26.8 |
| autobiographical | 143 | 7251 | 15.7 |
| theory of mind | 181 | 7761 | 15.1 |
| sentences | 356 | 13461 | 16.5 |

Supplementary Methods

Study 4 instructions for upregulation. “During this scan, we are going to ask you to try to imagine as hard as you can that the thermal stimulations are more painful than they are. Try to focus on how unpleasant the pain is, for instance, how strongly you would like to remove your arm from it. Pay attention to the burning, stinging and shooting sensations. You can use your mind to turn up the dial of the pain, much like turning up the volume dial on a stereo. As you feel the pain rise in intensity, imagine it rising faster and faster and going higher and higher. Picture your skin being held up against a glowing hot metal or fire. Think of how disturbing it is to be burned, and visualize your skin sizzling, melting and bubbling as a result of the intense heat.”

Study 4 instructions for downregulation. “During this scan, we are going to ask you to try to imagine as hard as you can that the thermal stimulations are less painful than they are. Focus on the part of the sensation that is pleasantly warm, like a blanket on a cold day. You can use your mind to turn down the dial of your pain sensation, much like turning down the volume dial on a stereo. As you feel the stimulation rise, let it numb your arm, so any pain you feel simply fades away. Imagine your skin is very cool, from being outside, and think of how good the stimulation feels as it warms you up.”

Supplementary Information 1

Derivation of the joint steady-state probability of free-energy-minimizing attractor networks. See Spisak & Friston (2025) for details on the whole framework.

Setup: Continuous–Bernoulli kernel

We assume that the single-site conditional distribution is the Continuous–Bernoulli on $[-1,1]$ with canonical parameter

$$\kappa_i(\boldsymbol{\sigma}_{-i}) = b_i + \sum_{j\neq i} J_{ij}\sigma_j$$

and density (for $x\in[-1,1]$)

$$K(x\mid \kappa) = h(x)\, e^{\kappa x - A(\kappa)}, \qquad x\in[-1,1]$$

with $h(x)=\tfrac12 \mathbf{1}_{[-1,1]}(x)$ and $A(\kappa)=\log(\sinh\kappa)-\log\kappa$ (so the $\kappa\to0$ limit is uniform on $[-1,1]$).

The conditional mean is:

$$L(\kappa) = \mathbb{E}_{K(\cdot\mid\kappa)}[x] = \coth\kappa - \frac{1}{\kappa}$$
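
For reference, these kernel quantities can be written out in a few lines of numpy (a sketch; the threshold used for the $\kappa\to0$ limit is an arbitrary choice):

```python
import numpy as np

def log_partition(kappa):
    """A(κ) = log(sinh κ) − log κ, with A(0) = 0 (uniform limit)."""
    k = np.atleast_1d(np.asarray(kappa, dtype=float))
    out = np.zeros_like(k)
    nz = np.abs(k) > 1e-12
    out[nz] = np.log(np.sinh(k[nz]) / k[nz])
    return out if out.size > 1 else float(out[0])

def langevin(kappa):
    """Conditional mean L(κ) = coth κ − 1/κ, with L(0) = 0."""
    k = np.atleast_1d(np.asarray(kappa, dtype=float))
    out = np.zeros_like(k)
    nz = np.abs(k) > 1e-12
    out[nz] = 1.0 / np.tanh(k[nz]) - 1.0 / k[nz]
    return out if out.size > 1 else float(out[0])

def sample_cb(kappa, size=None, rng=None):
    """Inverse-CDF sampling from K(x|κ) ∝ e^{κx} on [-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=size)
    if abs(kappa) < 1e-12:
        return 2.0 * u - 1.0
    return np.log(u * np.exp(kappa) + (1.0 - u) * np.exp(-kappa)) / kappa

# sanity check: the sample mean approaches L(κ)
# rng = np.random.default_rng(0)
# print(sample_cb(2.0, size=100_000, rng=rng).mean(), langevin(2.0))
```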

1. Master equation

Let’s consider a precise formalization of the asynchronous update procedure: each site $i$ has an independent Poisson clock of rate $\gamma_i$. When it rings, $\sigma_i$ is replaced by a draw $x\sim K(\cdot\mid\kappa_i(\sigma_{-i}))$.

With $\sigma^{(i,x)}$ the configuration equal to $\sigma$ but with coordinate $i$ set to $x$, the transition density is

$$w_i(\sigma^{(i,x)}\mid\sigma) = \gamma_i\, K(x\mid\kappa_i(\sigma_{-i}))$$

The master equation for $p(\sigma,t)$ is

$$\partial_t p(\sigma,t) = \sum_{i=1}^N\int_{-1}^{1} \Big[ \underbrace{w_i(\sigma\mid\sigma^{(i,x)})\, p(\sigma^{(i,x)},t)}_{\text{inflow}} - \underbrace{w_i(\sigma^{(i,x)}\mid\sigma)\, p(\sigma,t)}_{\text{outflow}}\Big]\, dx$$
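
This process can be simulated directly (a Gillespie-style sketch under the assumption of equal clock rates $\gamma_i$, so the next site to fire is uniform; the one-draw sampler repeats the inverse-CDF form from the previous sketch):

```python
import numpy as np

def cb_draw(kappa, rng):
    """Single inverse-CDF draw from K(·|κ) on [-1, 1]."""
    u = rng.uniform()
    if abs(kappa) < 1e-12:
        return 2.0 * u - 1.0
    return np.log(u * np.exp(kappa) + (1.0 - u) * np.exp(-kappa)) / kappa

def async_dynamics(J, b, n_steps=100_000, seed=0):
    """Asynchronous Poisson-clock dynamics with equal rates: at each event a
    uniformly chosen site is resampled from K(·|κ_i(σ_{-i}))."""
    rng = np.random.default_rng(seed)
    sigma = rng.uniform(-1, 1, size=J.shape[0])
    for _ in range(n_steps):
        i = rng.integers(J.shape[0])
        kappa_i = b[i] + J[i] @ sigma - J[i, i] * sigma[i]   # exclude self-coupling
        sigma[i] = cb_draw(kappa_i, rng)
    return sigma
```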

2. Probability currents

Let’s denote the site-wise flux density as:

$$F_i(\sigma \to x) := w_i(\sigma^{(i,x)}\mid\sigma)\, p^\ast(\sigma)$$

At steady state $p^\ast$, by definition we have:

$$\sum_i \int_{-1}^1 \Big(F_i(\sigma^{(i,x)}\to\sigma)-F_i(\sigma\to x)\Big)\, dx = 0$$

3. Detailed balance condition

A special (and perhaps the most intuitive) way to satisfy the previous equation is the case of detailed balance, or equilibrium. In this case, every transition is balanced:

$$F_i(\sigma^{(i,x)}\to\sigma) = F_i(\sigma\to x)$$

It can be shown that:

$$w_i(\sigma^{(i,x)}\mid\sigma)\, p^\ast(\sigma) = w_i(\sigma\mid\sigma^{(i,x)})\, p^\ast(\sigma^{(i,x)}) \iff \partial_{\sigma_i}\Phi(\sigma) = \kappa_i(\sigma_{-i}) = b_i + \sum_{k\neq i} J_{ik} \sigma_k$$

I.e., the log-density $\Phi$ changes according to the local slope at $i$ (see detailed derivation below).

Hence for $j\neq i$:

$$\partial_{\sigma_j}\partial_{\sigma_i}\Phi = J_{ij}, \qquad \partial_{\sigma_i}\partial_{\sigma_j}\Phi = J_{ji}$$

Mixed partials commute, thus:

$$J_{ij}=J_{ji} \quad \iff \quad \textit{detailed balance}$$

Therefore, equilibrium (detailed balance) is possible only when the coupling is symmetric.

4. Non-equilibrium Steady State (NESS)

Detailed balance (equilibrium) is only one specific way the steady-state condition can hold:

$$\sum_i \int_{-1}^1 \Big(F_i(\sigma^{(i,x)}\to\sigma)-F_i(\sigma\to x)\Big)\, dx = 0$$

Let’s consider the general case: an arbitrary (possibly asymmetric) coupling $J$. $J$ can always be decomposed into:

$$J = J^{\mathrm{S}} + J^{\mathrm{A}},\qquad J^{\mathrm{S}} := \tfrac12(J+J^\top), \quad J^{\mathrm{A}} := \tfrac12(J-J^\top)$$

This leads to a decomposition of the edge currents on each directed edge $(\sigma,\sigma^{(i,x)})$:

$$F_i^{\mathrm{sym}}(\sigma\to x) := \tfrac12\Big(F_i(\sigma\to x)+F_i(\sigma^{(i,x)}\to\sigma)\Big),\qquad F_i^{\mathrm{sol}}(\sigma\to x) := \tfrac12\Big(F_i(\sigma\to x)-F_i(\sigma^{(i,x)}\to\sigma)\Big)$$

so that $F_i = F_i^{\mathrm{sym}} + F_i^{\mathrm{sol}}$ and $F_i^{\mathrm{sym}}(\sigma\to x)=F_i^{\mathrm{sym}}(\sigma^{(i,x)}\to\sigma)$.

Since $F_i(\sigma\to x) = e^{\Phi(\sigma)} w_i(\sigma\to x)$, define rates accordingly:

$$w_i^{\mathrm{sym}}(\sigma\to x) := e^{-\Phi(\sigma)} F_i^{\mathrm{sym}}(\sigma\to x),\qquad w_i^{\mathrm{sol}}(\sigma\to x) := e^{-\Phi(\sigma)} F_i^{\mathrm{sol}}(\sigma\to x)$$

so $w_i = w_i^{\mathrm{sym}} + w_i^{\mathrm{sol}}$ and $F_i^{\mathrm{sym}} = e^{\Phi}w_i^{\mathrm{sym}}$, $F_i^{\mathrm{sol}} = e^{\Phi}w_i^{\mathrm{sol}}$.

We proceed in two steps: (i) the antisymmetric component is divergence-free and never changes the steady state; (ii) the remaining symmetric component yields exactly the same $p^\ast$ as the equilibrium distribution obtained by setting $J=J^{\mathrm{S}}$. Finally, as a bonus, we show that the induced circulating flow is indeed tangential to the level sets of $\Phi$, in accordance with the Helmholtz decomposition.

i) Antisymmetric component is divergence-free (does not change $p^\ast$).

Starting again from stationarity of the full current,

$$\sum_i \int_{-1}^1 \Big(F_i(\sigma^{(i,x)}\to\sigma)-F_i(\sigma\to x)\Big)\, dx = 0$$

subtract the same identity written with $F^{\mathrm{sym}}$ in place of $F$. Because $F^{\mathrm{sym}}(\sigma\to x)=F^{\mathrm{sym}}(\sigma^{(i,x)}\to\sigma)$ on every edge, its contribution vanishes pairwise, giving

$$\sum_i \int_{-1}^1 \Big(F_i^{\mathrm{sol}}(\sigma^{(i,x)}\to\sigma)-F_i^{\mathrm{sol}}(\sigma\to x)\Big)\, dx = 0$$

Thus $F^{\mathrm{sol}}$ has zero divergence and does not change $p^\ast$.

ii) Symmetric component determines $p^\ast$ and matches the equilibrium solution with $J=J^{\mathrm{S}}$.

From Section 3, only $J^{\mathrm{S}}$ can appear in a potential gradient (commuting mixed partials). Therefore

$$\partial_{\sigma_i}\Phi(\sigma) = b_i + \sum_{k\neq i} J^{\mathrm{S}}_{ik}\,\sigma_k$$

which integrates to the same $\Phi=\log p^\ast$ you obtain by enforcing detailed balance with the symmetric coupling $J=J^{\mathrm{S}}$ (i.e., in the absence of $J^{\mathrm{A}}$). Equivalently, the symmetric current $F^{\mathrm{sym}}$ is reversible with respect to $p^\ast$ and alone reproduces the same steady state.

Note: Tangency to level sets of $\Phi$ (no work against the potential).

Multiply the preceding identity by a test function $\psi(\sigma)$ and integrate over $\sigma$ (“summation by parts”) to obtain

$$\int d\sigma\, \sum_i\int_{-1}^1 F_i^{\mathrm{sol}}(\sigma\to x)\,\big[\psi(\sigma^{(i,x)})-\psi(\sigma)\big]\, dx = 0$$

Taking $\psi=\Phi$ yields

$$-\int d\sigma\, \sum_i\int_{-1}^1 F_i^{\mathrm{sol}}(\sigma\to x)\,\big[\Phi(\sigma^{(i,x)})-\Phi(\sigma)\big]\, dx = 0$$

so the solenoidal current is orthogonal to discrete gradients and flows tangent to the isocontours of $\Phi$ (the level sets of $p^\ast$).

Link to $J^{\mathrm{A}}$: under the linear Continuous–Bernoulli parametrization, $J^{\mathrm{A}}$ cannot contribute to any scalar potential (mixed partials must commute). Therefore, varying $J^{\mathrm{A}}$ at fixed $J^{\mathrm{S}}$ leaves $\Phi$ and $p^\ast$ unchanged; it only affects the circulating, divergence-free component $F^{\mathrm{sol}}$.

Conclusion: explicit steady-state $p^\ast$ for arbitrary $J$

By the results above, only the symmetric component $J^{\mathrm{S}} := \tfrac12(J+J^\top)$ can shape the potential $\Phi=\log p^\ast$. Therefore, for arbitrary (possibly asymmetric) $J$, the non-equilibrium steady-state density takes the Boltzmann-like form:

$$p^\ast(\sigma) = \frac{1}{Z} \exp\Bigg( \sum_i b_i\,\sigma_i + \sum_{i,j} J^{\mathrm{S}}_{ij}\,\sigma_i\,\sigma_j \Bigg)$$

with $J^{\mathrm{S}}_{ii}=0$ and partition function

$$Z = \int_{[-1,1]^N} \exp\Bigg( \sum_i b_i\,\sigma_i + \sum_{i,j} J^{\mathrm{S}}_{ij}\,\sigma_i\,\sigma_j \Bigg)\, d\sigma$$

The antisymmetric component $J^{\mathrm{A}} := \tfrac12(J-J^\top)$ contributes only to divergence-free (solenoidal) probability currents tangent to the level sets of $\Phi$ and thus does not alter $p^\ast$. This expression matches the manuscript’s stationary density, with $J^{\mathrm{S}}$ playing the role of the effective (symmetric) interaction matrix.
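
In practical terms, the (unnormalized) log steady-state density can therefore be evaluated after symmetrizing an arbitrary coupling matrix; a minimal sketch:

```python
import numpy as np

def log_p_star_unnormalized(sigma, J, b):
    """log p*(σ) + log Z = Σ_i b_i σ_i + Σ_{i,j} J^S_ij σ_i σ_j, using only the
    symmetric part of J (the antisymmetric part does not enter)."""
    J_sym = 0.5 * (J + J.T)
    np.fill_diagonal(J_sym, 0.0)                       # J^S_ii = 0
    return float(b @ sigma + sigma @ J_sym @ sigma)
```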

Supplementary Information 2

Here we provide a concise derivation of how $\sigma$ and $J$ change as a consequence of free energy minimization, yielding an inference (node update) rule and a local, incremental learning rule, respectively.

For more background and a detailed derivation, see Spisak & Friston, 2025.

Let the instantaneous net input to node $\sigma_i$ be

$$u_i := b_i + \sum_{j\ne i} J_{ij}\,\sigma_j,$$

and let $q(\sigma_i)\propto e^{b_q\sigma_i}$ be the variational marginal with mean $L(b_q)$, where $L$ is the Langevin function (the mean of the Continuous–Bernoulli).

Inference (Hopfield-style)

We start by writing up the local variational free energy from the point of view of a single node $\sigma_i$:

$$F = \mathbb{E}_q[\ln q(\sigma_i)] - \mathbb{E}_q[\ln p(\sigma_{\setminus i}\mid \sigma_i)]$$

Next we express:

$$\ln q(\sigma_i) = b_q\,\sigma_i - A(b_q) + \ln h(\sigma_i)$$

with mean $\mathbb{E}_q[\sigma_i]=L(b_q)=A'(b_q)$,

and

$$\ln p(\sigma_{\setminus i}\mid \sigma_i) = \text{const} + \sigma_i\,\sum_{j\ne i} J_{ij}\sigma_j.$$

As it depends on $\sigma_i$ only through the linear slope $\sum_{j\ne i}J_{ij}\sigma_j$:

$$\mathbb{E}_q[\,\ln p(\sigma_{\setminus i}\mid \sigma_i)\,] = \text{const} + L(b_q)\,\sum_{j\ne i} J_{ij}\sigma_j$$

Now, we assemble the local free energy (dropping constants independent of $b_q$):

$$F = (b_q - b_i)\,L(b_q) - \big[A(b_q)-A(b_i)\big] - L(b_q)\sum_{j\ne i}J_{ij}\sigma_j.$$

Equivalently, using $u_i=b_i+\sum_{j\ne i}J_{ij}\sigma_j$ and $A'(b_q)=L(b_q)$,

$$F = (b_q - u_i)\,L(b_q) - A(b_q) + \text{const}.$$

Differentiating w.r.t. $b_q$ and using $A'(b_q)=L(b_q)$ to cancel terms:

$$\frac{\partial F}{\partial b_q} = (b_q-u_i)\,L'(b_q) + L(b_q) - A'(b_q) = (b_q-u_i)\,L'(b_q).$$

Setting the derivative to zero gives $b_q^\star=u_i$ and therefore the node-wise update

$$\boxed{\;\mathbb{E}_q[\sigma_i] \;=\; L(b_q^\star) \;=\; L\!\left(b_i + \sum_{j\ne i} J_{ij}\,\sigma_j\right)\;}$$

which is the stochastic Hopfield/Boltzmann-style activation with the Langevin nonlinearity.
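
This update can be run as an asynchronous, deterministic relaxation toward an attractor (a sketch; the `langevin` helper repeats the Continuous–Bernoulli mean from the Supplementary Information 1 sketch, and the sweep count is arbitrary):

```python
import numpy as np

def langevin(kappa):
    """Continuous–Bernoulli mean: L(κ) = coth κ − 1/κ, with L(0) = 0."""
    k = np.atleast_1d(np.asarray(kappa, dtype=float))
    out = np.zeros_like(k)
    nz = np.abs(k) > 1e-12
    out[nz] = 1.0 / np.tanh(k[nz]) - 1.0 / k[nz]
    return out if out.size > 1 else float(out[0])

def relax(J, b, n_sweeps=100, seed=0):
    """Asynchronous node-wise updates E_q[σ_i] = L(b_i + Σ_{j≠i} J_ij σ_j),
    iterated until the state settles into an attractor."""
    rng = np.random.default_rng(seed)
    sigma = rng.uniform(-1, 1, size=J.shape[0])
    for _ in range(n_sweeps):
        for i in rng.permutation(J.shape[0]):
            u_i = b[i] + J[i] @ sigma - J[i, i] * sigma[i]   # net input, no self-coupling
            sigma[i] = langevin(u_i)
    return sigma
```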

Learning (Hebb − anti-Hebb)

Make the dependence on $u_i$ explicit by adding and subtracting the CB log-partition $A(u_i)$ (using $\ln p(\sigma_i\mid\sigma_{\setminus i})=u_i\,\sigma_i - A(u_i)+\ln h(\sigma_i)$ and Bayes’ rule):

$$F = \mathbb{E}_q\big[(b_q-u_i)\,\sigma_i\big] + A(u_i) - A(b_q) + \text{const}.$$

Now

$$\frac{\partial F}{\partial u_i} = -\mathbb{E}_q[\sigma_i] + A'(u_i) = -\mathbb{E}_q[\sigma_i] + L(u_i),$$

and by the chain rule with $\partial u_i/\partial J_{ij}=\sigma_j$ we obtain

$$\frac{\partial F}{\partial J_{ij}} = \big[\,L(u_i) - \mathbb{E}_q[\sigma_i]\,\big]\,\sigma_j.$$

Using a stochastic (sample-based) estimate $\mathbb{E}_q[\sigma_i]\approx\sigma_i$ and descending $F$ gives the local update

$$\boxed{\;\Delta J_{ij} \;\propto\; \underbrace{\sigma_i\sigma_j}_{\text{Hebbian (observed)}} \;-\; \underbrace{L\!\left(b_i+\sum_{k\ne i} J_{ik}\sigma_k\right)\sigma_j}_{\text{anti-Hebbian (predicted)}}\;}$$

This is a predictive coding-based learning rule that strengthens observed correlations and subtracts those already predicted by the current model.
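
A single gradient step on F with respect to J then takes the following form (a sketch; `langevin` is the helper defined in the inference sketch above, and the learning rate is an arbitrary choice):

```python
import numpy as np

def learning_step(sigma, J, b, lr=0.01):
    """ΔJ_ij ∝ σ_i σ_j − L(u_i) σ_j: a Hebbian term minus the activity already
    predicted by the current model (anti-Hebbian term).
    `langevin` is assumed to be the Continuous–Bernoulli mean defined above."""
    u = b + J @ sigma - np.diag(J) * sigma             # u_i = b_i + Σ_{j≠i} J_ij σ_j
    predicted = langevin(u)                            # L(u_i)
    delta = np.outer(sigma - predicted, sigma)         # (observed − predicted) outer σ
    np.fill_diagonal(delta, 0.0)                       # no self-coupling updates
    return J + lr * delta
```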

Supplementary Information 3

Self-orthogonalization of attractor states

To illustrate how the learning rule in free energy minimizing attractor networks gives rise to efficient, (approximately) orthogonal representations of the external states, suppose the network has already learned a pattern $\mathbf{s}^{(1)}$, whose neural representation is the attractor $\boldsymbol{\sigma}^{(1)}$ and associated weights $\mathbf{J}^{(1)}$. When a new pattern $\mathbf{s}^{(2)}$ is presented that is correlated with $\mathbf{s}^{(1)}$, the network’s prediction for $\sigma_i^{(2)}$ will be $\hat{\sigma}_i = L(b_i + \sum_{k \neq i} J_{ik}^{(1)} \sigma_k)$. Because inference with $\mathbf{J}^{(1)}$ converges to $\boldsymbol{\sigma}^{(1)}$ and $\boldsymbol{\sigma}^{(2)}$ is correlated with $\boldsymbol{\sigma}^{(1)}$, the prediction $\hat{\boldsymbol{\sigma}}$ will capture variance in $\boldsymbol{\sigma}^{(2)}$ that is ‘explained’ by $\boldsymbol{\sigma}^{(1)}$. The learning rule updates the weights based only on the unexplained (residual) component of the variance, the prediction error. In other words, $\hat{\boldsymbol{\sigma}}$ approximates the projection of $\boldsymbol{\sigma}^{(2)}$ onto the subspace already spanned by $\boldsymbol{\sigma}^{(1)}$. Therefore, the weight update primarily strengthens weights corresponding to the component of $\boldsymbol{\sigma}^{(2)}$ that is orthogonal to $\boldsymbol{\sigma}^{(1)}$. Thus, the learning effectively encodes this residual, $\boldsymbol{\sigma}^{(2)}_{\perp}$, ensuring that the new attractor components being formed tend to be orthogonal to those already established.

For more details on self-orthogonalization in these networks, including an empirical demonstration, see Spisak & Friston, 2025.

Supplementary Information 4

Reconstruction of the attractor network from the activation timeseries.

We start from Eq. (1), written in matrix notation:
$$
\begin{aligned}
E_{HN}(\boldsymbol{\sigma}) &= -\frac{1}{2}\, \sigma^\top J \sigma + \sigma^\top b \\
&= -\frac{1}{2}\, \sigma^\top J \sigma + \sigma^\top J J^{-1} b \\
&= -\frac{1}{2}\, \sigma^\top J \sigma + \sigma^\top J J^{-1} b - \frac{1}{2}\, b^\top J^{-1} b + \frac{1}{2}\, b^\top J^{-1} b
\end{aligned}
$$

To complete the square, we first recognize that $\sigma^\top b = \sigma^\top J J^{-1} b$ and then add and subtract $\frac{1}{2}\, b^\top J^{-1} b$:

$$
\begin{aligned}
&= -\frac{1}{2} \left[\sigma^\top J \sigma - 2 \sigma^\top J J^{-1} b + b^\top J^{-1} b \right] + \frac{1}{2}\, b^\top J^{-1} b \\
&= -\frac{1}{2} \left[\sigma - J^{-1} b \right]^\top J \left[\sigma - J^{-1} b\right] + \frac{1}{2}\, b^\top J^{-1} b
\end{aligned}
$$

Given that $P(\boldsymbol{\sigma}) = \exp(-E_{HN}(\boldsymbol{\sigma}))$, the expression simplifies to:

$$P(\sigma) \propto \exp\left(\frac{1}{2} (\sigma - J^{-1} b)^\top J (\sigma - J^{-1} b) \right) = \exp\left(-\frac{1}{2} (\sigma - J^{-1} b)^\top (-J) (\sigma - J^{-1} b) \right)$$

Note that the term $\frac{1}{2}\, b^\top J^{-1} b$ is independent of $\sigma$; we have absorbed it into the normalization constant.
This is exactly the exponent of a multivariate Gaussian distribution with mean $J^{-1} b$ and precision matrix $\Lambda = -J$ (covariance $\Sigma = -J^{-1}$), meaning that the weight matrix of the attractor network can be reconstructed from the inverse covariance matrix of the activation timeseries of the lower-level nodes:

$$\mathbf{J} = -\Lambda = -\Sigma^{-1}$$
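
A corresponding numpy sketch (assuming `timeseries` is a regions × timepoints activation array; zeroing the diagonal, i.e. discarding self-couplings, is our own convention here rather than part of the derivation):

```python
import numpy as np

def reconstruct_coupling(timeseries):
    """Recover the coupling matrix from activation timeseries via J = −Λ = −Σ⁻¹."""
    cov = np.cov(timeseries)                           # Σ, regions × regions
    J = -np.linalg.pinv(cov)                           # negative (pseudo-)precision
    np.fill_diagonal(J, 0.0)                           # discard self-couplings
    return J
```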

References
  1. Spisak, T., & Friston, K. (2025). Self-orthogonalizing attractor neural networks emerging from the free energy principle. arXiv. 10.48550/ARXIV.2505.22749