Weiyi Chen^{1}, Yongwan Lim^{1}, Yannick Bliesener^{1}, Shrikanth S. Narayanan^{1}, and Krishna S. Nayak^{1}

Real-time MRI (RT-MRI) has revolutionized the study of human speech production. Two state-of-the-art reconstruction techniques have been adopted by different groups to accelerate real-time imaging: constrained SENSE and regularized nonlinear inversion (NLINV). In this study, we describe our best-performing implementations of both classes of reconstruction and compare their performance on common data from spiral RT-MRI of human speech at 1.5 T.

Speech RT-MRI data were collected on a GE Signa Excite 1.5 T scanner with a custom 8-channel upper airway receiver coil, from 2 healthy adults during fluent speech at a normal pace. Imaging parameters: spiral GRE, golden angle (GA) increment, FOV: 20 cm^{2}, FA: 15°, slice thickness: 5 mm, TR: 6.004 ms. In-plane spatial resolution was 2.4 mm^{2}, with 13 spiral interleaves required to meet the Nyquist sampling criterion. Both reconstruction methods were implemented in MATLAB. SENSE with a temporal finite difference constraint (SENSE-FD) is described fully by Lingala et al. ^{2}. For the NLINV class of methods, we achieved the best performance using extended NLINV (ENLIVE) combined with phase constraints ^{7}. This used the Iteratively Regularized Gauss-Newton Method (IRGNM) to solve the dynamic imaging problem as

$$\min_{\rho^i,\, c_j^i}\sum_{j=1}^{N_c} \left\lVert y_j-\mathcal{PF} \left( \sum_{i=1}^K \rho^i c_j^i \right) \right\rVert_2^2 + \alpha \sum_{i=1}^K \left( \sum_{j=1}^{N_c} \left\lVert Wc_j^i \right\rVert_2^2 + \left\lVert \rho^i-{\rho'}^{i} \right\rVert_2^2 \right) $$

where $$$\rho^i$$$ denotes the $$$i$$$^{th} image and $$$c_j^i$$$ is the corresponding $$$j$$$^{th} coil profile. $$$\mathcal{P}$$$ denotes the projection onto the trajectory and $$$\mathcal{F}$$$ is the Fourier operator. $$$W$$$ is a weighting matrix that enforces smoothness in the coil profiles by weighting each element according to its distance from the k-space center, $$$\left(1+a \left\lVert k \right\rVert_2^2 \right) ^b$$$. $$$\rho'$$$ is the image at the previous time frame. Three sets of maps ($$$K=3$$$) were generated to improve robustness in the presence of data inconsistency ^{7,8}. Virtual conjugate coils ^{9} were generated to improve the conditioning of the parallel imaging problem for this method (VCC-ENLIVE). Four parameters were tuned empirically: (1, 2) $$$\alpha_0$$$ and $$$q$$$, which set the decreasing regularization parameter in the n^{th} Newton step, $$$\alpha = \alpha_0 q^{n-1}$$$, with $$$\alpha_0=1$$$ and $$$q=0.8$$$; (3, 4) the parameters of the $$$W$$$ matrix, with $$$a=220$$$ and $$$b=40$$$. Between 6 and 10 iterations were used, chosen based on visual appearance.
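The tuned parameters above can be made concrete with a short numerical sketch. This is only an illustration and not the MATLAB implementation used in this study: the function names `irgnm_alphas` and `coil_weights` are hypothetical, the choice of 8 Newton steps is one value from the stated 6-10 range, and the normalization of $$$k$$$ to [-0.5, 0.5) cycles per pixel on a Cartesian grid is an assumption (the abstract does not state the k-space units).

```python
import numpy as np

def irgnm_alphas(alpha0=1.0, q=0.8, n_steps=8):
    """Regularization weights alpha_n = alpha0 * q**(n-1) for Newton steps n = 1..n_steps."""
    n = np.arange(1, n_steps + 1)
    return alpha0 * q ** (n - 1)

def coil_weights(shape, a=220.0, b=40.0):
    """Diagonal of W on a Cartesian grid: (1 + a * ||k||^2) ** b.

    Assumption: k is normalized to [-0.5, 0.5) per axis, with the
    k-space center at index (ny // 2, nx // 2) after fftshift.
    """
    ky = np.fft.fftshift(np.fft.fftfreq(shape[0]))
    kx = np.fft.fftshift(np.fft.fftfreq(shape[1]))
    k2 = ky[:, None] ** 2 + kx[None, :] ** 2
    return (1.0 + a * k2) ** b

alphas = irgnm_alphas()          # 1.0, 0.8, 0.64, ... (geometric decay)
weights = coil_weights((8, 8))   # equals 1 at the k-space center, grows steeply outward
```

Because the penalty $$$\lVert Wc_j^i \rVert_2^2$$$ is evaluated on the k-space coefficients of each coil profile, the steep growth of the weights away from the center pushes the estimated profiles toward low spatial frequencies, that is, toward smooth maps.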

**Figure 1** compares results from VCC-ENLIVE and SENSE-FD ^{2}. SENSE-FD with 2-TR temporal resolution (3rd row) is the current state-of-the-art at our institution and has been extensively used in speech research. VCC-ENLIVE shows higher SNR at the same acceleration rate. However, it exhibits more severe temporal blurring (see intensity-time plots), possibly due to the $$$\ell_2$$$-norm regularization toward the previous time frame. The intensity-time plots also show that VCC-ENLIVE delivers more temporally consistent signal intensity in the velum. This may be due to the real-time estimation of the sensitivity maps.
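The 1-TR and 2-TR settings can be related to temporal resolution and acceleration rate with simple arithmetic. The sketch below assumes one spiral interleaf is acquired per TR and that frames are formed from non-overlapping groups of interleaves (no view sharing); the helper name `frame_timing` is hypothetical, and only the TR and the 13-interleaf Nyquist requirement are taken from the imaging parameters above.

```python
TR_MS = 6.004               # repetition time in ms, from the imaging parameters
NYQUIST_INTERLEAVES = 13    # interleaves needed for Nyquist sampling

def frame_timing(interleaves_per_frame):
    """Return (temporal resolution in ms, acceleration rate R, frames per second).

    Assumption: one interleaf per TR, frames built from disjoint interleaves.
    """
    temporal_resolution_ms = interleaves_per_frame * TR_MS
    acceleration = NYQUIST_INTERLEAVES / interleaves_per_frame
    frames_per_second = 1000.0 / temporal_resolution_ms
    return temporal_resolution_ms, acceleration, frames_per_second

for n in (1, 2):
    res_ms, accel, fps = frame_timing(n)
    print(f"{n}-TR: {res_ms:.3f} ms/frame, R = {accel:g}, {fps:.1f} frames/s")
```

Under these assumptions, 1-TR corresponds to R = 13 at about 6 ms per frame, and 2-TR to R = 6.5 at about 12 ms per frame.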

**Figure 2** shows five representative frames from both methods with 1-TR temporal resolution (R=13). VCC-ENLIVE reduces noise amplification by combining a phase constraint with parallel imaging. This allows for better visualization of fine structures such as the velum and hard palate. However, the temporal blurring of VCC-ENLIVE degrades temporal fidelity in spite of the high acceleration rate (1-TR, R=13).
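The virtual conjugate coil idea ^{9} behind the phase constraint can be illustrated on Cartesian data. This is a simplified sketch and not the spiral implementation used here: it assumes gridded k-space stored in NumPy's native FFT ordering (DC at index 0), and the function name `add_virtual_conjugate_coils` is hypothetical. Each virtual coil is the complex conjugate of the measured data mirrored about the k-space center, $$$\overline{y_j(-k)}$$$, which is the signal that would be produced by the conjugate of the coil-weighted image.

```python
import numpy as np

def add_virtual_conjugate_coils(ksp):
    """Append virtual conjugate coils to Cartesian multi-coil k-space.

    ksp: complex array of shape (n_coils, ny, nx) in native np.fft
    ordering (DC at index 0). Returns shape (2 * n_coils, ny, nx):
    the original coils followed by conj(y(-k)) for each coil.
    """
    # y(-k): reverse both k axes, then roll by one sample so that
    # index 0 (DC) maps onto itself and index i maps to (N - i) % N.
    mirrored = np.roll(np.flip(ksp, axis=(-2, -1)), shift=1, axis=(-2, -1))
    return np.concatenate([ksp, np.conj(mirrored)], axis=0)

# Toy check: one coil seeing a complex-valued image.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
ksp = np.fft.fft2(image)[None]            # shape (1, 8, 8)
stacked = add_virtual_conjugate_coils(ksp)
# The virtual coil matches the k-space of the conjugated image.
print(np.allclose(stacked[1], np.fft.fft2(np.conj(image))))
```

Doubling the coil count in this way adds encoding information only when the coil-weighted images carry nontrivial phase; for non-Cartesian data such as the spiral acquisition in this work, the same relation would have to be expressed through the mirrored trajectory rather than by index flipping.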

1. Lingala, S. G., Sutton, B. P., Miquel, M. E. & Nayak, K. S. Recommendations for real-time speech MRI. Journal of Magnetic Resonance Imaging 43, 28–44 (2016).
2. Lingala, S. G. et al. A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magn. Reson. Med. 77(1):112-125 (2016).
3. Fu, M. et al. High-resolution dynamic speech imaging with joint low-rank and sparsity constraints. Magn. Reson. Med. 73, 1820–1832 (2015).
4. Niebergall, A. et al. Real-time MRI of speaking at a resolution of 33 ms: Undersampled radial FLASH with nonlinear inverse reconstruction. Magn. Reson. Med. 69, 477–485 (2013).
5. Freitas, A. C., Wylezinska, M., Birch, M. J., Petersen, S. E. & Miquel, M. E. Comparison of Cartesian and Non-Cartesian Real-Time MRI Sequences at 1.5T to Assess Velar Motion and Velopharyngeal Closure during Speech. PLoS One 11, e0153322 (2016).
6. Uecker, M. et al. Real-time MRI at a resolution of 20 ms. NMR Biomed. 23, 986–994 (2010).
7. Holme, H. C. M. et al. ENLIVE: An Efficient Nonlinear Method for Calibrationless and Robust Parallel Imaging. arXiv:1706.09780 (2017).
8. Uecker, M. et al. ESPIRiT-an eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA. Magn. Reson. Med. 71(3):990-1001 (2014).
9. Blaimer, M. et al. Virtual coil concept for improved parallel MRI employing conjugate symmetric signals. Magn. Reson. Med. 61, 93–102 (2009).
10. Knoll, F., Clason, C., Bredies, K., Uecker, M. & Stollberger, R. Parallel imaging with nonlinear reconstruction using variational penalties. Magn. Reson. Med. 67, 34–41 (2012).

Figure 1. Comparison between SENSE-FD and VCC-ENLIVE with 1-TR and 2-TR temporal resolution. VCC-ENLIVE shows higher SNR but more severe temporal blurring (yellow arrow) at the same acceleration rate. The intensity-time plots show that VCC-ENLIVE provides more temporally consistent signal intensity in several crucial articulators, such as the velum (white arrow).

Figure 2. Five representative frames of SENSE-FD and VCC-ENLIVE (1-TR, acceleration rate R=13), from the time span marked by the blue shading in the intensity-time plot. VCC-ENLIVE reduces noise amplification by combining a phase constraint with parallel imaging, which yields higher SNR. This benefits the visualization of fine structures such as the hard palate (red arrow) and velum (yellow arrow) during speech. SENSE with a temporal finite difference constraint provides crisper intensity-time profiles and improved depiction of rapidly moving articulators (such as the tongue tip and lips).