Ultrafast Speech Imaging at High Spatial Resolution using Model-Consistency Condition Reconstruction with Progressive Temporal Basis Learning
Julia Velikina1, Andrew Alexander1, Joseph Salmons1, Eric Raimy1, Thomas Purnell1, Steven Kecskemeti1, and Alexey Samsonov1

1University of Wisconsin - Madison, Madison, WI, United States


Dynamic MRI holds high potential for real-time imaging of the upper airway, which can provide insights into questions of speech science and also has important clinical applications. However, speech imaging places increased demands on spatial and temporal resolution, necessitating image reconstruction from severely undersampled data. Previously reported methods use low-rank constraints with spiral navigators to enable temporal basis estimation, which is otherwise infeasible with standard learning methods. We propose an alternative solution based on a novel concept of progressive learning, which does not require separate specialized pulse sequences for navigator acquisitions, while providing a high temporal resolution of 7.4 ms and a spatial resolution of 1.25x1.25x8 mm.


Dynamic MRI holds potential for real-time upper airway imaging, providing insights into articulation, phonetics, language acquisition and disorders, and supporting important clinical applications (e.g., velopharyngeal insufficiency). Unlike other MRI applications, speech imaging places increased demands on spatial (1-4mm) and temporal (<40-70ms) resolution to visualize small structures (velum, tongue border) and to study velopharyngeal closure and tongue movements1,2. This necessitates image reconstruction from highly undersampled data even for optimized acquisitions. Recently, low-rank constraints were used to achieve 9.8ms temporal and 2.2x2.2x6.5mm spatial resolution3 by interleaving single-view data collection with low-resolution spiral navigators, which enable estimation of a temporal basis that cannot otherwise be learned from such limited data. Here, we propose an alternative solution based on a novel concept of progressive learning, which does not require separate training data collection and, hence, specialized pulse sequences, while attaining even higher temporal and spatial resolution.


We demonstrate the new method using the MOdel Consistency COndition (MOCCO) approach4, which efficiently reconstructs undersampled dynamic data given an estimate of the temporal model basis. The standard approach, low-resolution learning (LRL), constructs the basis from the first several principal components (PCs) of the singular value decomposition (SVD) of a low-resolution image series estimated from the fully sampled k-space center. However, for extreme undersampling, too few samples are available in the k-space center for robust LRL. Recently, low-rank matrix completion was proposed for basis learning5,6. However, it becomes computationally prohibitive for non-Cartesian sampling. Therefore, we propose progressive learning (PL), which determines PCs by imposing a regularity constraint on their temporal evolution. The image series at target resolution $$$\Delta t$$$ is obtained as follows: (1) initialize the image series from data grouped at a coarse temporal resolution $$$\Delta t_1$$$, ensuring sufficient k-space center sampling for SVD; (2) interpolate between frames to produce an approximation of the image series at a finer resolution $$$\Delta t_2$$$; (3) apply SVD to learn PCs at $$$\Delta t_2$$$; (4) perform MOCCO reconstruction of the image series from data grouped at the finer resolution $$$\Delta t_2$$$; (5) repeat steps (2-4) to refine the resolution further.
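
The control flow of steps (1)-(5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the synthetic data, the intermediate stage of 10 projections/frame, and in particular the `project_onto_basis` stand-in for step (4) are hypothetical. A real MOCCO step would reconstruct from the k-space data regrouped at the finer resolution; here the interpolated series is simply projected onto the learned PCs so that the loop runs end to end.

```python
import numpy as np
from scipy.interpolate import CubicSpline

TR_MS = 3.7  # repetition time of the sagittal scan described in the text


def frame_times(n_proj_total, proj_per_frame, tr_ms=TR_MS):
    """Centre time (ms) of each frame when consecutive projections are grouped."""
    n_frames = n_proj_total // proj_per_frame
    starts = np.arange(n_frames) * proj_per_frame
    return (starts + (proj_per_frame - 1) / 2.0) * tr_ms


def learn_temporal_basis(series, n_pcs):
    """First n_pcs temporal PCs (rows) of a (pixels, frames) matrix via SVD."""
    _, _, vh = np.linalg.svd(series, full_matrices=False)
    return vh[:n_pcs]


def interpolate_in_time(series, t_coarse, t_fine):
    """Cubic-spline interpolation of every pixel time course onto a finer grid."""
    return CubicSpline(t_coarse, series, axis=1)(t_fine)


def project_onto_basis(basis, proj_per_frame, initial_series):
    """Hypothetical stand-in for step (4); NOT a MOCCO reconstruction.

    It only projects the interpolated series onto the span of the PCs and
    ignores proj_per_frame, which a real reconstruction would use to regroup
    the acquired projections.
    """
    return initial_series @ basis.T @ basis


def progressive_basis(coarse_series, n_proj_total, schedule, n_pcs, reconstruct):
    """Repeat steps (2)-(4) along a schedule of projections per frame."""
    series, basis = coarse_series, None
    for p_old, p_new in zip(schedule[:-1], schedule[1:]):
        t_old = frame_times(n_proj_total, p_old)
        t_new = frame_times(n_proj_total, p_new)
        approx = interpolate_in_time(series, t_old, t_new)  # step (2)
        basis = learn_temporal_basis(approx, n_pcs)         # step (3)
        series = reconstruct(basis, p_new, approx)          # step (4)
    return basis, series


# Synthetic stand-in for the coarse series of step (1): 64 pixels, 20 frames.
rng = np.random.default_rng(0)
n_proj_total = 400
t0 = frame_times(n_proj_total, 20)
dynamics = np.stack([np.ones_like(t0), np.sin(t0 / 150.0), np.cos(t0 / 90.0)])
coarse = rng.standard_normal((64, 3)) @ dynamics
coarse += 0.01 * rng.standard_normal(coarse.shape)

# Assumed schedule: 20 -> 10 -> 5 -> 2 projections per frame (three repetitions).
basis, series = progressive_basis(
    coarse, n_proj_total, (20, 10, 5, 2), n_pcs=7, reconstruct=project_onto_basis
)
print(basis.shape, series.shape)
```

Each pass of the loop doubles or more the number of frames while the basis is always learned from an image series that is already free of the severe aliasing a direct reconstruction at the finer resolution would contain, which is the point of the progressive scheme.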


Data were acquired in healthy volunteer scans on a 3T scanner (MR750, GE Healthcare) using a 6-channel carotid coil. To ensure compatibility with the proposed PL, a radial pulse sequence with golden angle ordering was used7, as it allows flexible selection of temporal resolution due to nearly uniform k-space coverage for any subset of consecutive projections. During the first scan (sagittal orientation, TR/TE=3.7ms/1.4ms, 8mm slice thickness, FOV=240mm, 192x192 matrix), the volunteer pronounced “LaLa Land. Moonlight”. During the second scan, performed in axial orientation to image the arytenoid cartilage (TR/TE=3.5ms/1.3ms, 6mm slice thickness, FOV=180mm, 140x140 matrix), the volunteer alternated the sounds /z/ and /s/. In both cases, acquired k-space data were grouped into 20 projections/frame to form a coarse temporal sequence for initializing the method, and reconstructed with SENSE8. During PC refinement, image series were interpolated by cubic splines9. The PL procedure was repeated three times in the first case to obtain an image series with 2 projections/frame ($$$\Delta t=$$$7.4ms), and twice in the second case, leading to an image series with 5 projections/frame ($$$\Delta t=$$$17.5ms). The results were compared to MOCCO reconstruction with LRL PCs at the highest targeted temporal resolution, and with an alternative approach imposing a regularity constraint in the temporal dimension using first difference regularization10.
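
The acquisition arithmetic above can be checked with a short script. It is a sketch with hypothetical helper names; the golden angle increment of 180°/φ (about 111.25°) is the standard value for golden angle radial ordering, and the frame durations and in-plane pixel sizes follow directly from the TR, FOV, and matrix values quoted in the text.

```python
import math

GOLDEN_RATIO = (1.0 + math.sqrt(5.0)) / 2.0
GOLDEN_ANGLE_DEG = 180.0 / GOLDEN_RATIO  # about 111.25 degrees


def projection_angles_deg(first_index, count):
    """Angles (mod 180 degrees) of `count` consecutive golden angle projections."""
    return [(n * GOLDEN_ANGLE_DEG) % 180.0
            for n in range(first_index, first_index + count)]


def angular_gaps_deg(angles):
    """Gaps between neighbouring projections, including the wrap-around gap."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(180.0 - ordered[-1] + ordered[0])
    return gaps


def frame_duration_ms(proj_per_frame, tr_ms):
    """Temporal footprint of one frame built from consecutive projections."""
    return proj_per_frame * tr_ms


def pixel_size_mm(fov_mm, matrix):
    """In-plane pixel size for a square field of view."""
    return fov_mm / matrix


# Any window of consecutive projections covers the half circle fairly evenly:
# the window below starts at an arbitrary index and has few distinct gap sizes.
gaps = angular_gaps_deg(projection_angles_deg(first_index=137, count=20))
print(sorted({round(g, 3) for g in gaps}))

print(frame_duration_ms(2, 3.7))   # sagittal scan, final PL stage
print(frame_duration_ms(5, 3.7))   # sagittal scan, preceding PL stage
print(frame_duration_ms(5, 3.5))   # axial scan, final PL stage
print(pixel_size_mm(240.0, 192))   # sagittal in-plane pixel size
print(pixel_size_mm(180.0, 140))   # axial in-plane pixel size
```

Because the coverage property holds for any starting index and any window length, the same acquired data can be regrouped at 20, 5, or 2 projections/frame without re-scanning, which is what makes the ordering compatible with PL.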


Representative time frames (Fig. 1) demonstrate that PL-MOCCO provides consistent image quality at different temporal resolutions, while MOCCO with an equivalent number of LRL PCs (seven) exhibits poor image quality that degrades further at higher temporal resolution. Although visual image quality improves with two LRL PCs because fewer PCs constrain the reconstruction more strongly, this model is insufficient to resolve complex image dynamics (arrows). Temporal profiles (Fig. 2) further illustrate the need for high temporal resolution to resolve rapid lip/tongue movements. The difference in PCs obtained by the different approaches for axial speech imaging at $$$\Delta t=$$$17.5ms is illustrated in Fig. 3. Note the rapid oscillations of PCs learned in the standard/alternative ways due to residual aliasing in the training data, which lead to poor image quality in the reconstructions based on them (Fig. 4a-b), while PL produces images without residual artifacts (Fig. 4c).
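
The model-order side of this trade-off can be illustrated on synthetic data. The sketch below is not derived from the study data: it builds a hypothetical image series from five temporal behaviours, two of them brief events, and measures how well the best rank-2 and rank-7 approximations represent it. It shows only why too few PCs cannot capture complex dynamics; it does not model the aliasing that degrades LRL PCs.

```python
import numpy as np


def truncation_error(series, n_pcs):
    """Relative Frobenius error of the best rank-n_pcs approximation (via SVD)."""
    u, s, vh = np.linalg.svd(series, full_matrices=False)
    approx = (u[:, :n_pcs] * s[:n_pcs]) @ vh[:n_pcs]
    return np.linalg.norm(series - approx) / np.linalg.norm(series)


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)

# Five hypothetical temporal behaviours: baseline, two slow oscillations,
# and two brief events standing in for rapid articulator movements.
dynamics = np.stack([
    np.ones_like(t),
    np.sin(2 * np.pi * 2 * t),
    np.cos(2 * np.pi * 3 * t),
    np.exp(-((t - 0.5) / 0.01) ** 2),
    np.exp(-((t - 0.8) / 0.02) ** 2),
])
spatial_maps = rng.standard_normal((256, 5))
series = spatial_maps @ dynamics  # (pixels, frames) matrix of rank 5

err_2 = truncation_error(series, 2)
err_7 = truncation_error(series, 7)
print(err_2, err_7)
```

With seven PCs the rank-5 series is represented to numerical precision, whereas two PCs leave a clearly nonzero residual, consistent with the observation that a two-PC model cannot follow all of the dynamics.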


In this work, we proposed a new approach for dynamic imaging with very high temporal and spatial resolution. The new method combines the advantages of several components, including golden angle sampling and novel progressive learning of the temporal model. The new approach was able to perform dynamic speech imaging using only two projections per frame, which brings it near the limit of the maximum possible temporal resolution with radial acquisition. In the context of speech imaging, the obtained spatial (1.25x1.25x8mm) and temporal (7.4ms) resolutions may be beneficial for visualizing velopharyngeal insufficiency for speech therapy and surgical interventions, and may allow for multi-slice imaging at high resolution. Additionally, the use of radial sampling avoids the off-resonance blurring/distortion common for alternative sampling schemes such as spirals.


The authors gratefully acknowledge support from NIH R21 EB018483, R01NS066982, and University of Wisconsin - Madison fund UW2020.


1. Lingala SG, Sutton BP, Miquel ME, Nayak KS. Recommendations for real-time speech MRI. J. Magn. Reson. Imaging 2016;43:28–44.

2. Bae Y, Kuehn DP, Conway CA, Sutton BP. Real-time magnetic resonance imaging of velopharyngeal activities with simultaneous speech recordings. Cleft Palate Craniofac J 2011;48:695–707.

3. Fu M, Zhao B, Carignan C, Shosted RK, Perry JL, Kuehn DP, Liang ZP, Sutton BP. High-resolution dynamic speech imaging with joint low-rank and sparsity constraints. Magn. Reson. Med. 2015;73:1820–1832.

4. Velikina JV, Samsonov AA. Reconstruction of dynamic image series from undersampled MRI data using data-driven model consistency condition (MOCCO). Magn. Reson. Med. 2015;74:1279–1290.

5. Lyu J, Nakarmi U, Zhang C, Ying L. Highly accelerated cardiac cine parallel MRI using low-rank matrix completion and partial separability model, Proc. SPIE Compressive Sensing V: From Diverse Modalities to Big Data Analytics, 2016:9857.

6. Balachandrasekaran A, Ongie G, Jacob M. Accelerated dynamic MRI using structured low rank matrix completion, Proc. IEEE International Conference on Image Processing 2016.

7. Winkelmann S, Schaeffter T, Koehler T, Eggers H, Doessel O. An optimal radial profile order based on the Golden Ratio for time-resolved MRI. IEEE Trans Med Imaging 2007;26:68–76.

8. Pruessmann KP, Weiger M, Scheidegger MB, Boesiger P. SENSE: Sensitivity encoding for fast MRI. Magn. Reson. Med. 1999;42:952–962.

9. de Boor C. A Practical Guide to Splines. Springer-Verlag New York, 1978.

10. Adluru G, Awate SP, Tasdizen T, Whitaker RT, DiBella EVR. Temporally constrained reconstruction of dynamic cardiac perfusion MRI. Magn. Reson. Med. 2007;57:1027–1036.


Fig. 1. Representative time frames corresponding to /la/ sound formation from the proposed PL-MOCCO and standard MOCCO at two different temporal resolutions. Significant temporal blurring is observed for MOCCO with two PCs (arrows) due to the inability of this model to represent complex temporal dynamics (see also Fig. 2).

Fig. 2. Temporal profiles through the bottom of the upper lip corresponding to the dashed line in Fig. 1: (a) PL-MOCCO with 7 PCs at 18.5 ms/frame; (b) PL-MOCCO with 7 PCs at 7.4 ms/frame; (c) MOCCO with 2 PCs at 7.4 ms/frame; (d) MOCCO with 7 PCs at 7.4 ms/frame. Standard MOCCO reconstruction fails due to the inability of LRL to learn PCs from limited data. At the same time, PL-MOCCO successfully restores temporal dynamics. Note that reconstructing at higher temporal resolution (7.4 ms/frame) with PL-MOCCO resolves rapid lip movement (arrow), which is blurred in the preceding reconstruction stage (18.5 ms/frame).

Fig. 3. Comparison of temporal bases obtained for 17.5 ms temporal resolution from low-resolution images reconstructed at the given temporal resolution (standard LRL method), from images reconstructed with temporal difference regularization (alternative approach), and with the proposed PL approach. Note that most PCs learned with the standard and alternative approaches suffer from rapid oscillations due to unresolved aliasing in the training data.

Fig. 4. Effect of temporal basis learning type on the reconstruction quality of axial data (5 PCs, shown in Fig. 3). (a) MOCCO with LRL; (b) MOCCO with PCs learned from temporally constrained reconstruction; (c) proposed MOCCO with PL PCs. Note the gradual improvement of image quality towards the PL-MOCCO case.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)