Bo Zhu^{1,2,3}, Jeremiah Liu^{4}, Neha Koonjoo^{1,2,3}, Bruce R Rosen^{1,2}, and Matthew S Rosen^{1,2,3}

Although the macroscopic equations of motion for nuclear magnetic resonance have been described and modeled for decades by the Bloch equations, limited human intuition of their nonlinear dynamics is an obstacle to fully exploiting the vast parameter space of MR pulse sequences. Here we recast the general problem of pulse sequence development as a game of perfect information, and propose an approach to optimize game play with a Bayesian derivative of reinforcement learning within a MRI physics simulation environment. We demonstrate an AI agent learning a canonical pulse sequence (gradient echo) and generating non-intuitive pulse sequences approximating Fourier spatial encoding.

Although the macroscopic equations of motion for nuclear magnetic resonance have been described and modeled for decades by the Bloch equations, limited human intuition of their nonlinear dynamics, particularly over long periods of non-steady-state time evolution or in regimes such as off-resonance excitation, is an obstacle to fully exploiting the vast parameter space of MR pulse sequence control to optimize imaging protocols or develop novel contrast mechanisms.

Complex dynamics
arising from series of behaviors governed by simple rules have also been a
central subject of study in game theory^{1}; great advances in
understanding and computer control of games of perfect information such as
Chess and Go have been made possible with reinforcement learning agents^{2,3}
exploring these fully observable game environments with learned reward
signals. Because the actions,
environmental dynamics, and feedback of these perfect information games can be
simulated, artificial systems can be developed to repeatedly play these games *in silico* to rapidly improve agents’
performance.

Here we recast the general problem of pulse sequence development as a game of perfect information, and propose an approach to optimize game play with a Bayesian derivative of reinforcement learning within a MRI physics simulation environment. In this work, we perform preliminary proof-of-principle 1-D experiments and demonstrate our agent learning a canonical pulse sequence (gradient echo) and also generating non-intuitive pulse sequences that can produce signals approximating Fourier spatial encoding.

Our reinforcement
learning framework is composed of an artificial intelligence **agent** which generates pulse sequence **actions** that interact with its **environment**, the time-evolution of the
imaging samples’ nuclear magnetization governed by the Bloch equations, which
produces **rewards** based on
reconstructed image error, that guide the update and refinement of the agent’s
actions (Figure 1).

Our Bayesian approach to model this system is composed of pulse
sequence actions $$$X(t) = [x_{1}(t),..., x_{D}(t)]$$$ that are generated from a
distribution $$$p(X)$$$. In order to effectively
capture the dynamic interplay between pulse sequences and their physical effect
on the nuclear spin environment, we modeled $$$p(X)$$$ using a dependent Gaussian
process with sparseness-inducing priors^{4}. The value function $$$f$$$ which maps the generated
action $$$X$$$ to a predicted image score $$$y$$$ is modeled using a Bayesian
neural network. In order to balance exploitation
and exploration in a principled manner, the updated model posterior $$$p(f|y_{t}, X_{t+1})$$$ proposes the next set of
pulse sequence actions $$$X^{*}$$$ by maximizing an
acquisition function $$$u_{t}(X)$$$. In the current
implementation, we chose $$$u_{t}(X)$$$ to be the Expected
Improvement (EI)^{5}.

The MRI physics simulator
developed here was inspired by an open-source GPU-based MRI simulation tool,
MRILAB^{6}. A MRI simulation wrapper was developed for Python and the
simulation computational kernel was implemented entirely within the Python
wrapper for Nvidia CUDA – Pycuda. The kernel was executed in parallel within
the GPU environment since the Bloch equations can be solved independently for
every voxel of the virtual objects^{7}.

The inputs of the
simulation kernel were 1) a set of 3D objects (e.g. cylinders, spheres, cubes…)
with matrix = 132 ×
132 × 30, resolution = 1 mm, T_{1}
= 2100ms, T_{2} = 160ms, T_{2}^{*} = 140ms, 1 spin per
voxel, and spin density, r = 1; 2)
hardware specifications (the imaging static magnetic field gradients (G_{x},
G_{y}, G_{z}) and the transmitting RF field, B_{1}) and
3) the MRI pulse sequence generated by the agent. The output of the simulation kernel was the
signal with the real and imaginary components. For each iteration, each
component of the signal was subtracted from a typical gradient-echo
sequence-based signal, then squared and summed together to give the mean
squared error, used as the scoring parameter.

We performed two 1-D experiments with different constraints upon the pulse sequence generator. For each experiment, a 90º hard RF pulse was set at the beginning of the pulse sequence.

The first experiment constrained the gradient Gx to be composed of two constant-amplitude blocks, each with any amplitude, start time and duration (blocks allowed to overlap). As seen in Figure 2, in this experiment the agent converged at low MSE upon “discovering” a form of the gradient echo.

The second experiment loosened the constraints upon Gx and modeled it using Haar wavelets with degree 5, with the wavelet weights generated by the $$$p(X)$$$ distribution. The Haar wavelet model was chosen as a parsimonious parameterization of the pulse sequence. Figures 3 and 4 show two examples of non-intuitive agent-generated pulse sequences that approximate Fourier encoding.

Our future work includes the use of more sophisticated reward functions that optimize gradient smoothness, minimize total gradient moment, maximize signal per unit time, as well as including explicit functional forms for shaped RF pulses, and extension to 2- and 3-dimensional imaging.

[1] Nisan, Noam, et al., eds. Algorithmic game theory. Vol. 1. Cambridge: Cambridge University Press, 2007.

[2] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.

[3] Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore. "Reinforcement learning: A survey." Journal of artificial intelligence research 4 (1996): 237-285.

[4] Ren, Boyu, et al. "Bayesian nonparametric ordination for the analysis of microbial communities." Journal of the American Statistical Association just-accepted (2017).

[5] Shahriari, Bobak, et al. "Taking the human out of the loop: A review of bayesian optimization." Proceedings of the IEEE 104.1 (2016): 148-175.

[6] Liu, Fang, et al. "Fast Realistic MRI Simulations Based on Generalized Multi-Pool Exchange Tissue Model." IEEE transactions on medical imaging 36.2 (2017): 527-537.

[7] Xanthis, Christos G., et al. "MRISIMUL: a GPU-based parallel approach to MRI simulations." IEEE transactions on medical imaging 33.3 (2014): 607-617.

Figure 1: Schematic of the AUTOSEQ reinforcement learning framework composed
of an artificial intelligence agent
which generates pulse sequence actions that
interact with its environment, the
time-evolution of the imaging samples’ nuclear magnetization governed by the
Bloch equations, which produces rewards
based on reconstructed image error, that guide the update and refinement of the
agent’s actions.

Figure 2: (A, Top) Explicitly coded gradient echo sequence, indicating RF, Gx, and readout window. (A, Middle) Phantom Object A simulated time-domain signal (left) from pulse sequence A, and its 1-D FT (right) forming a 1-D projection of Object A. (A, Bottom) As in Figure 2A-Middle, but for Object B. (B, Top) AUTOSEQ agent converged at low MSE upon “discovering” a form of the gradient echo. (B, Middle) Phantom Object B simulated time-domain signal (left) from pulse sequence B, and its 1-D FT (right) forming a 1-D projection of Object B. (B, Bottom) As in Figure 2B-Middle, but for Object B.

Figure 3: (A, Top) Explicitly coded gradient echo sequence, indicating RF, Gx, and readout window. (A, Middle) Phantom Object A simulated time-domain signal (left) from pulse sequence A, and its 1-D FT (right) forming a 1-D projection of Object A. (A, Bottom) As in Figure 3A-Middle, but for Object B. (B, Top) Pulse sequence discovered from parsimonious parameterization with Harr wavelet model. (B, Middle) Phantom Object B simulated time-domain signal (left) from pulse sequence B, and its 1-D FT (right) forming a 1-D projection of Object B. (B, Bottom) As in Figure 3B-Middle, but for Object B.

Figure 4: (A, Top) Explicitly coded gradient echo sequence, indicating RF, Gx, and readout window. (A, Middle) Phantom Object A simulated time-domain signal (left) from pulse sequence A, and its 1-D FT (right) forming a 1-D projection of Object A. (A, Bottom) As in Figure 4A-Middle, but for Object B. (B, Top) Another pulse sequence discovered from parsimonious parameterization with Harr wavelet model. (B, Middle) Phantom Object B simulated time-domain signal (left) from pulse sequence B, and its 1-D FT (right) forming a 1-D projection of Object B. (B, Bottom) As in Figure 4B-Middle, but for Object B.