Comparison of leading reconstruction techniques for real-time speech MRI
Weiyi Chen1, Yongwan Lim1, Yannick Bliesener1, Shrikanth S. Narayanan1, and Krishna S. Nayak1

1Electrical Engineering, University of Southern California, Los Angeles, CA, United States


Real-time MRI (RT-MRI) has revolutionized the study of human speech production. Two state-of-the-art reconstruction techniques have been adopted by different groups to accelerate real-time imaging: constrained SENSE and regularized nonlinear inversion. In this study, we describe our best-performing implementations of both classes of reconstruction and compare their performance on common data from spiral RT-MRI of human speech at 1.5T.


Real-time MRI (RT-MRI) has revolutionized the study of human speech production 1. MRI is intrinsically challenged by trade-offs between spatiotemporal resolution, signal-to-noise ratio, and spatial coverage. Rapid MRI methods based on parallel imaging, constrained reconstruction, or both have been applied to improve this trade-off 2–5. In recent years, two classes of methods have been adopted by groups around the world. One is SENSE with temporal finite difference constraints 2, which reliably achieves 2.4 mm/12 ms spatiotemporal resolution. The other is regularized nonlinear inversion (NLINV), a self-calibrated parallel imaging technique with low latency 4,6, which reliably achieves 1.5 mm/33.3 ms spatiotemporal resolution 4. To date, these two classes of methods have not been compared. In this study, we make our best effort to optimize each technique on common data and compare the resulting image quality.


Speech RT-MRI data were collected on a GE Signa Excite 1.5 T scanner with a custom 8-channel upper airway receiver coil, from 2 healthy adults during fluent speech at a normal pace. Imaging parameters: spiral GRE, golden angle (GA) increment, FOV: 20 × 20 cm², FA: 15°, slice thickness: 5 mm, TR: 6.004 ms. In-plane spatial resolution was 2.4 × 2.4 mm², with 13 spiral interleaves required to meet the Nyquist sampling criterion. Both reconstruction methods were implemented in MATLAB. SENSE with a temporal finite difference constraint (SENSE-FD) is described fully by Lingala et al. 2. For the NLINV class of methods, we achieved the best performance using extended NLINV combined with phase constraints 7. This used the Iteratively Regularized Gauss-Newton Method (IRGNM) to solve the dynamic imaging problem as

$$\min_{\rho^i, c_j^i}\sum_{j=1}^{N_c} \left\lVert y_j-\mathcal{PF} \left( \sum_{i=1}^K \rho^i c_j^i \right) \right\rVert_2^2 + \alpha \sum_{i=1}^K \left( \sum_{j=1}^{N_c} \left\lVert Wc_j^i \right\rVert_2^2 + \left\lVert \rho^i-\rho'^i \right\rVert_2^2 \right) $$

where $$$\rho^i$$$ denotes the $$$i$$$th image, $$$c_j^i$$$ is the corresponding $$$j$$$th coil profile, and $$$y_j$$$ is the k-space data for the $$$j$$$th of $$$N_c$$$ coils. $$$\mathcal{P}$$$ denotes the projection onto the trajectory and $$$\mathcal{F}$$$ is the Fourier operator. $$$W$$$ is a weighting matrix that enforces smoothness in the coil profiles; its elements are set according to the distance from the k-space center as $$$\left(1+a \left\lVert k \right\rVert_2^2 \right) ^b$$$. $$$\rho'^i$$$ is the image at the previous time frame. Three sets of maps ($$$K=3$$$) were generated to improve robustness in the presence of data inconsistency 7,8. Virtual conjugate coils 9 were generated to improve the conditioning of parallel imaging for this method (VCC-ENLIVE). Four parameters were tuned empirically: (1,2) the decreasing regularization parameter in the $$$n$$$th Newton step, $$$\alpha = \alpha_0 q^{n-1}$$$, where $$$\alpha_0=1$$$ and $$$q=0.8$$$; (3,4) the parameters of the $$$W$$$ matrix, where $$$a$$$ was set to 220 and $$$b$$$ was set to 40. Between 6 and 10 iterations were used, chosen based on visual appearance.
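
The tuned parameters can be made concrete with a short numerical sketch. The Python below is illustrative only (the reconstructions in this work were implemented in MATLAB); the function names are hypothetical, and the k-space coordinates are assumed to be normalized to the range [-0.5, 0.5], which the text does not specify.

```python
def newton_alphas(alpha0=1.0, q=0.8, n_steps=10):
    """Regularization weight alpha_n = alpha0 * q**(n - 1) for Newton steps n = 1..n_steps."""
    return [alpha0 * q ** (n - 1) for n in range(1, n_steps + 1)]


def coil_weight(kx, ky, a=220.0, b=40.0):
    """Diagonal entry of W at k-space location (kx, ky): (1 + a * |k|^2) ** b.

    Assumption (not stated in the text): k is normalized to [-0.5, 0.5].
    """
    return (1.0 + a * (kx * kx + ky * ky)) ** b


# Regularization schedule over the maximum of 10 Newton steps.
alphas = newton_alphas()

# Penalty weights at increasing distance from the k-space center.
weights = [coil_weight(k, 0.0) for k in (0.0, 0.01, 0.05, 0.1)]
```

The schedule relaxes the regularization geometrically with each Newton step, while the weights grow steeply away from the k-space center, so high spatial frequencies in the coil profiles are penalized far more than low ones.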


Figure 1 compares results from VCC-ENLIVE and SENSE-FD 2. SENSE-FD with 2-TR temporal resolution (3rd row) is the current state of the art at our institution and has been used extensively in speech research. At the same acceleration rate, VCC-ENLIVE shows higher SNR. However, it exhibits more severe temporal blurring (see intensity-time plots), possibly due to the l2-norm regularization toward the previous time frame. The intensity-time plots also show that VCC-ENLIVE delivers more temporally consistent signal intensity in the velum, which may be due to the real-time estimation of the sensitivity maps.
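
The suspected link between l2-norm regularization toward the previous frame and temporal blurring can be illustrated with a toy model. The sketch below is not the reconstruction used in this work: it assumes a single pixel, an identity forward operator, and a fixed regularization weight, and the function name `damped_frames` is hypothetical. Under those assumptions each frame has a closed-form solution that behaves like a recursive low-pass filter in time.

```python
def damped_frames(measurements, alpha):
    """Frame-by-frame solution of min_x (y - x)^2 + alpha * (x - x_prev)^2.

    Setting the derivative to zero gives x = (y + alpha * x_prev) / (1 + alpha),
    which is a first-order recursive low-pass filter over time.
    """
    frames = []
    x_prev = measurements[0]  # initialize with the first measurement
    for y in measurements:
        x = (y + alpha * x_prev) / (1.0 + alpha)
        frames.append(x)
        x_prev = x
    return frames


# A step edge in time, e.g. an articulator moving into a pixel.
signal = [0.0] * 5 + [1.0] * 10
weak = damped_frames(signal, alpha=0.1)    # light damping, sharper edge
strong = damped_frames(signal, alpha=1.0)  # heavy damping, smeared edge
```

In this toy setting a larger weight makes the reconstructed intensity approach the new value more slowly after the step, which is the kind of smearing visible in an intensity-time plot.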

Figure 2 shows five representative frames from both methods at 1-TR temporal resolution (R=13). VCC-ENLIVE reduces noise amplification by combining phase constraints with parallel imaging, which allows better visualization of fine structures such as the velum and hard palate. However, its temporal blurring degrades temporal fidelity despite the high acceleration rate (1-TR, R=13).
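
The relationship between interleaves per frame, temporal resolution, and acceleration rate follows directly from the acquisition parameters. The sketch below assumes that each frame is formed from consecutive spiral interleaves without view sharing; the function name `frame_timing` is hypothetical.

```python
TR_MS = 6.004             # repetition time from the imaging protocol
NYQUIST_INTERLEAVES = 13  # spiral interleaves needed for full sampling


def frame_timing(interleaves_per_frame):
    """Return (temporal resolution in ms, acceleration rate R) for one frame."""
    temporal_resolution_ms = interleaves_per_frame * TR_MS
    acceleration = NYQUIST_INTERLEAVES / interleaves_per_frame
    return temporal_resolution_ms, acceleration


one_tr = frame_timing(1)                   # 1-TR reconstruction
two_tr = frame_timing(2)                   # 2-TR reconstruction
full = frame_timing(NYQUIST_INTERLEAVES)   # fully sampled frame
```

Under these assumptions, 1-TR frames correspond to 6.004 ms and R=13, 2-TR frames to 12.008 ms and R=6.5, and a fully sampled frame would take about 78 ms.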

Discussion and Conclusion

In a direct comparison, VCC-ENLIVE provides higher SNR and signal stability, whereas SENSE-FD provides superior temporal fidelity. Both methods depend on tuning several hyperparameters by visual inspection of the reconstructed images. This work was limited to spiral-trajectory data at 1.5T; further investigation with more data sets, acquisition methods, and field strengths will be of interest. It is also possible to combine nonlinear inversion with variational constraints 10, which could improve image quality by pairing better coil sensitivity estimation with a crisp intensity-time profile. This remains future work.


This work was supported by the National Institutes of Health under NIH-R01-DC007124 and the National Science Foundation under NSF-1514544.


  1. Lingala, S. G., Sutton, B. P., Miquel, M. E. & Nayak, K. S. Recommendations for real-time speech MRI. J. Magn. Reson. Imaging 43, 28–44 (2016).
  2. Lingala, S. G. et al. A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magn. Reson. Med. 77, 112–125 (2016).
  3. Fu, M. et al. High-resolution dynamic speech imaging with joint low-rank and sparsity constraints. Magn. Reson. Med. 73, 1820–1832 (2015).
  4. Niebergall, A. et al. Real-time MRI of speaking at a resolution of 33 ms: Undersampled radial FLASH with nonlinear inverse reconstruction. Magn. Reson. Med. 69, 477–485 (2013).
  5. Freitas, A. C., Wylezinska, M., Birch, M. J., Petersen, S. E. & Miquel, M. E. Comparison of Cartesian and Non-Cartesian Real-Time MRI Sequences at 1.5T to Assess Velar Motion and Velopharyngeal Closure during Speech. PLoS One 11, e0153322 (2016).
  6. Uecker, M. et al. Real-time MRI at a resolution of 20 ms. NMR Biomed. 23, 986–994 (2010).
  7. Holme, H. C. M. et al. ENLIVE: An Efficient Nonlinear Method for Calibrationless and Robust Parallel Imaging. arXiv:1706.09780 (2017).
  8. Uecker, M. et al. ESPIRiT-an eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA. Magn. Reson. Med. 71, 990–1001 (2014).
  9. Blaimer, M. et al. Virtual coil concept for improved parallel MRI employing conjugate symmetric signals. Magn. Reson. Med. 61, 93–102 (2009).
  10. Knoll, F., Clason, C., Bredies, K., Uecker, M. & Stollberger, R. Parallel imaging with nonlinear reconstruction using variational penalties. Magn. Reson. Med. 67, 34–41 (2012).


Figure 1. Comparison between SENSE-FD and VCC-ENLIVE with 1-TR and 2-TR temporal resolution. At the same acceleration rate, VCC-ENLIVE shows higher SNR but more severe temporal blurring (yellow arrow). The intensity-time plots show that VCC-ENLIVE provides more temporally consistent signal intensity in several crucial articulators, such as the velum (white arrow).

Figure 2. Five representative frames of SENSE-FD and VCC-ENLIVE (1-TR, acceleration rate R=13), from the time span marked by the blue shade in the intensity-time plot. VCC-ENLIVE reduces noise amplification by combining phase constraints with parallel imaging, enabling higher SNR. This benefits the visualization of fine structures such as the hard palate (red arrow) and velum (yellow arrow) during speech. SENSE with a temporal finite difference constraint provides a crisper intensity-time profile and improved depiction of rapidly moving articulators (such as the tongue tip and lips).

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)