Realistic dynamic speech numerical phantom for the evaluation of real-time MRI acquisition and reconstruction methods
Joseph Martin¹,², Redha Boubertakh¹,³, Matthieu Ruthven¹, and Marc E Miquel¹,³

¹Clinical Physics, Barts Health NHS Trust, London, United Kingdom; ²Medical Physics and Engineering, King's College London, London, United Kingdom; ³William Harvey Research Institute, Queen Mary University of London, London, United Kingdom


Real-time MRI (rtMRI) of human speech is an active field of research, with a particular clinical focus on the assessment of speech disorders. In this work, a numerical phantom is developed so that rtMRI acquisition and reconstruction schemes can be compared against a ‘gold standard’. Previously acquired 2D rtMRI images of speech were used to create anatomical masks of the main speech organs. An interpolation method was then used to create a continuous-time model of the moving structures, which forms the dynamic phantom. The model was then tested using Cartesian, radial and spiral k-space sampling schemes.


Real-time MR imaging (rtMRI) of the upper vocal tract during speech is an active field of research [1] with some important clinical applications, in particular the assessment of velar function in patients with repaired cleft palate [2]. Numerous acquisition and reconstruction schemes have been explored to improve temporal resolution. However, despite a call for them in a recent recommendation paper [3], there are still few studies comparing acquisition and/or reconstruction schemes [4-7]. These studies relied on volunteer data, so comparison is made difficult by the absence of a gold standard. Realistic numerical simulations (phantoms) can provide such a gold standard, as has been shown in cardiac MRI [8]. In this work, such a numerical phantom for speech rtMRI is developed.


The computational phantom was developed using a prototyping software development framework created in MATLAB (R2016b, MathWorks, Natick, MA, USA). The development process was split into two stages: (I) Phantom Development and (II) Implementation and Testing. A flow diagram of the overall process is shown in Figure 1.

Stage I. Phantom Development: Previously acquired 2D rtMR images of a volunteer phonating a standard speech sample were used. Images were acquired at 3T (Achieva Tx, Philips Medical Systems, Best, The Netherlands) with a temporal resolution of 15 frames per second (fps) and an in-plane resolution of 1.719 x 1.719 mm², sufficient to capture the motion of the velum [5,9]. These images were edge-enhanced using the Canny method [10], and the relevant speech organs and structures were segmented using a bespoke semi-automatic thresholding tool. The segmentations were used to create binary masks, which were processed with morphological operators to make them more uniform, resulting in six anatomical masks: ‘Mandible’, ‘Maxilla’, ‘Epiglottis’, ‘Velum’, ‘Tongue’ and ‘Head’, the last representing the parts of the head not included in the other masks. A continuous-time motion model was then created by linearly interpolating between successive masks in the time series. Finally, the 2D k-space phantom data were derived as a time series using an FFT for Cartesian sampling trajectories and a non-uniform fast Fourier transform (NUFFT) for non-Cartesian ones. The phantom has a simulated symmetrical FOV of 30 cm, an image matrix size of 256 x 256, a k-space matrix size of 256 x 256, a spatial resolution of 1.719 x 1.719 mm², a temporal resolution of 30 fps and a slice thickness of 10 mm.
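The interpolation and k-space derivation steps can be sketched in Python with NumPy. This is a minimal sketch, not the authors' MATLAB code: it assumes that linear interpolation between masks means pixel-wise intensity blending of the two neighbouring frames, and the function names `interpolate_frames`, `image_to_kspace` and `kspace_to_image` are hypothetical. Resampling a 15 fps mask series at 30 fps places one blended frame midway between each acquired pair.

```python
import numpy as np

def interpolate_frames(masks, t, fps_in=15.0):
    """Return the mask image at time t (s) by linear interpolation.

    masks: array (n_frames, ny, nx) sampled at fps_in frames per second.
    Assumption: pixel-wise intensity blending of the two neighbouring frames.
    """
    pos = t * fps_in                                              # fractional frame index
    i0 = min(max(int(np.floor(pos)), 0), masks.shape[0] - 2)      # keep i0 + 1 in range
    w = float(np.clip(pos - i0, 0.0, 1.0))                        # weight of the later frame
    return (1.0 - w) * masks[i0] + w * masks[i0 + 1]

def image_to_kspace(image):
    """Centred 2D FFT (DC component at index [n//2, n//2])."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))

def kspace_to_image(kspace):
    """Centred inverse 2D FFT, the exact inverse of image_to_kspace."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))

# Toy example: two 8 x 8 binary masks of a structure moving down by two rows,
# "acquired" at 15 fps and resampled at 30 fps.
a = np.zeros((8, 8)); a[2:4, 2:6] = 1.0
b = np.zeros((8, 8)); b[4:6, 2:6] = 1.0
masks = np.stack([a, b])
frames = [interpolate_frames(masks, k / 30.0) for k in range(3)]  # t = 0, 1/30, 2/30 s
kspace = [image_to_kspace(f) for f in frames]
```

Pixel-wise blending produces intermediate grey values rather than a moved binary edge; a shape-based interpolation (for example of signed distance maps) would be an alternative if binary masks are required at every time point.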

Stage II. Implementation and Testing: The phantom was used to simulate and reconstruct a range of k-space sampling trajectories. The dynamic k-space phantom (two spatial-frequency dimensions, kx and ky, and one temporal dimension, t) was sampled to simulate Cartesian, radial and spiral trajectories, with optional added Gaussian noise. An inverse FFT and an inverse NUFFT were used to reconstruct simulated images for Cartesian and non-Cartesian sampling trajectories respectively.
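A minimal sketch of the sampling and reconstruction stage, assuming trajectory coordinates in units of 1/FOV and a small image matrix. The authors used a NUFFT; to stay self-contained, this sketch substitutes a direct non-uniform DFT, which computes the same sums exactly but costs O(N²M) for M samples and is only practical for small matrices. The ramp density compensation and the constant-angular-velocity Archimedean spiral are common textbook simplifications, and all function names are hypothetical.

```python
import numpy as np

def radial_trajectory(n_spokes, n_readout):
    """Radial spokes through the k-space centre, angles uniform over [0, pi).

    Coordinates run from -n_readout/2 to n_readout/2 - 1 (units of 1/FOV).
    """
    angles = np.arange(n_spokes) * np.pi / n_spokes
    r = np.arange(n_readout) - n_readout / 2
    kx = np.outer(np.cos(angles), r).ravel()   # spoke-major ordering
    ky = np.outer(np.sin(angles), r).ravel()
    return kx, ky

def radial_density_weights(n_spokes, n_readout):
    """Ramp (|k|) density compensation, a common approximation for radial data."""
    r = np.arange(n_readout) - n_readout / 2
    return np.tile(np.abs(r), n_spokes)

def spiral_trajectory(n_interleaves, n_points, n, n_turns):
    """Archimedean spiral interleaves with radius growing linearly towards n/2.

    Illustrative only: scanner spirals are designed under gradient amplitude
    and slew-rate limits, which this sketch ignores.
    """
    t = np.arange(n_points) / n_points
    radius = t * n / 2
    kx, ky = [], []
    for i in range(n_interleaves):
        theta = 2 * np.pi * (n_turns * t + i / n_interleaves)   # rotate each interleave
        kx.append(radius * np.cos(theta))
        ky.append(radius * np.sin(theta))
    return np.concatenate(kx), np.concatenate(ky)

def _pixel_positions(n):
    x = np.arange(n) - n / 2
    X, Y = np.meshgrid(x, x)          # X: column position, Y: row position
    return X.ravel(), Y.ravel()

def nudft_forward(image, kx, ky):
    """Slow, exact non-uniform DFT standing in for a NUFFT (small images only)."""
    n = image.shape[0]
    X, Y = _pixel_positions(n)
    E = np.exp(-2j * np.pi * (np.outer(kx, X) + np.outer(ky, Y)) / n)
    return E @ image.ravel()

def nudft_adjoint(samples, kx, ky, n, weights):
    """Density-compensated adjoint, i.e. a direct-summation gridding reconstruction."""
    X, Y = _pixel_positions(n)
    E_h = np.exp(2j * np.pi * (np.outer(X, kx) + np.outer(Y, ky)) / n)
    return (E_h @ (weights * samples)).reshape(n, n)

def add_gaussian_noise(samples, sigma, rng):
    """Add i.i.d. Gaussian noise to the real and imaginary parts of k-space."""
    noise = rng.normal(0.0, sigma, samples.shape) + 1j * rng.normal(0.0, sigma, samples.shape)
    return samples + noise
```

On a Cartesian grid (integer kx, ky from -N/2 to N/2-1), `nudft_forward` reproduces the centred FFT, which is a convenient check that the Cartesian and non-Cartesian paths of such a phantom agree. A radial reconstruction would be `nudft_adjoint(add_gaussian_noise(nudft_forward(img, kx, ky), sigma, rng), kx, ky, n, radial_density_weights(n_spokes, n))`, up to a global intensity scale.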


Images were first produced using simulated Cartesian, radial and spiral trajectories. Figure 2 shows a dynamic series of fully sampled Cartesian images of the phantom during velopharyngeal closure and opening. Figure 3 shows images produced using radial k-space sampling with varying undersampling factors, together with the normalised root-mean-square (RMS) reconstruction error relative to the phantom, whilst Figure 4 shows an image produced using a spiral trajectory with and without added Gaussian noise.
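The normalised RMS error can be computed as below. The abstract does not state the normalisation, so this sketch assumes the RMS of the difference divided by the RMS of the reference magnitude image; normalising by the reference intensity range is another common choice and gives different numbers. Because an uncalibrated reconstruction may differ from the phantom by a global scale factor, the hypothetical `fit_scale` option first applies a least-squares intensity scaling.

```python
import numpy as np

def nrmse(recon, reference, fit_scale=False):
    """Normalised RMS error between magnitude images.

    Assumed definition: RMS(recon - reference) / RMS(reference).
    If fit_scale is True, recon is first multiplied by the least-squares
    factor alpha = <recon, reference> / <recon, recon> (hypothetical option).
    """
    recon = np.abs(np.asarray(recon, dtype=complex))
    reference = np.abs(np.asarray(reference, dtype=complex))
    if fit_scale:
        alpha = np.sum(recon * reference) / np.sum(recon * recon)
        recon = alpha * recon
    rms_diff = np.sqrt(np.mean((recon - reference) ** 2))
    return rms_diff / np.sqrt(np.mean(reference ** 2))

# Example: a reconstruction that is uniformly 10% too bright.
ref = np.ones((4, 4))
print(nrmse(1.1 * ref, ref))                  # close to 0.1
print(nrmse(1.1 * ref, ref, fit_scale=True))  # close to 0 once the scale is fitted
```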


A 2D speech MRI phantom has been developed that can be used to simulate k-space data sampled along arbitrary trajectories. It has been tested with Cartesian, radial and spiral trajectories. In the radial reconstructions (Figure 3), for example, streaking artefacts and RMS error increase as the k-space undersampling factor is increased. The phantom will allow sampling trajectories to be optimised whilst ensuring the images remain diagnostically useful, which in the case of velopharyngeal insufficiency (VPI) means they can still be used to assess velar function. Future work may include optional tissue contrast parameters (T1, T2) for each mask, and the phantom could be used to test more advanced dynamic reconstruction techniques that exploit correlations across time, such as k-t GRAPPA [11]. Ultimately, a graphical user interface will be produced to allow end users to enter and alter imaging parameters, making the optimisation process more interactive.
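One common way to relate a radial undersampling factor R to the number of spokes follows from the azimuthal Nyquist criterion; the abstract does not define R, so this is an assumption. With N_s spokes spanning the full k-space diameter over an angle of π, the arc spacing between adjacent spokes at the edge of k-space, k_max = N/(2·FOV), must not exceed 1/FOV:

```latex
\Delta k_\theta = \frac{\pi}{N_s}\,k_{\max}
               = \frac{\pi}{N_s}\cdot\frac{N}{2\,\mathrm{FOV}}
               \le \frac{1}{\mathrm{FOV}}
\quad\Longrightarrow\quad
N_s \ge \frac{\pi N}{2},
\qquad
R = \frac{\pi N / 2}{N_s}.
```

For the 256 x 256 phantom matrix, full radial sampling under this definition needs about πN/2 ≈ 402 spokes per frame, so R = 4 corresponds to roughly 101 spokes.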


The first iteration of the phantom has been completed, and simulated speech MRI images have been produced using radial, spiral and Cartesian k-space sampling trajectories. With realistic MRI contrast added, such a numerical phantom could become a ‘gold standard’ for future comparative speech rtMRI studies.




[1] Scott AD, Wylezinska M, Birch MJ, Miquel ME (2014) Speech MRI: Morphology and function. Physica Medica 30:604-18.

[2] Scott AD, Boubertakh R, Birch MJ, Miquel ME (2012) Towards clinical assessment of velopharyngeal closure using MRI: Evaluation of real-time MRI sequences at 1.5T and 3T. British Journal of Radiology 85:e1083-92.

[3] Lingala SG, Sutton BP, Miquel ME, Nayak KS (2016) Recommendations for real-time speech MRI. Journal of Magnetic Resonance Imaging 43:28-44.

[4] Burdumy M, Traser L, Richter B, Echternach M, Korvink JG, Hennig J, Zaitsev M (2015) Acceleration of MRI of the vocal tract provides additional insight into articulator modifications. Journal of Magnetic Resonance Imaging 42:925-35. doi:10.1002/jmri.24857

[5] Lingala SG, Zhu Y, Kim Y-C, Toutios A, Narayanan S, Nayak KS (2017) A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magnetic Resonance in Medicine 77:112-25. doi:10.1002/mrm.26090

[6] Freitas AC, Wylezinska M, Birch MJ, Petersen SE, Miquel ME (2016) Comparison of Cartesian and non-Cartesian real-time MRI sequences at 1.5T to assess velar motion and velopharyngeal closure during speech. PLOS ONE. doi:10.1371/journal.pone.0153322

[7] Freitas AC, Ruthven M, Boubertakh R, Miquel ME (2017) Real-time speech MRI: commercial Cartesian and non-Cartesian sequences at 3T and feasibility of offline TGV reconstruction to visualise velopharyngeal motion. Physica Medica

[8] Wissmann L, Santelli C, Segars WP, Kozerke S (2014) MRXCAT: Realistic numerical phantoms for cardiovascular magnetic resonance. Journal of Cardiovascular Magnetic Resonance 16:63.

[9] Ruthven M, Freitas A, Keevil S, Miquel M (2016) Real-time speech MRI: What is the optimal temporal resolution for clinical velopharyngeal closure assessment? Proc. Intl. Soc. Mag. Reson. Med. 24:3208.

[10] Canny J (1986) A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8(6):679-98.

[11] Huang F, Akao J, Vijayakumar S, Duensing GR, Limkeman M (2005) k-t GRAPPA: A k-space implementation for dynamic MRI with high reduction factor. Magnetic Resonance in Medicine 54(5):1172-84.


Figure 1 MRI numerical speech phantom prototype software development framework: There are five steps (1-5) in the development of the phantom (Stage I) and two steps (6-7) in its implementation and testing (Stage II).

Figure 2 Dynamic MR speech phantom images of velopharyngeal closure and opening: From left to right, consecutive dynamic images (30 fps) from the 2D MRI speech phantom show velopharyngeal closure and opening.

Figure 3 Simulated images of a speech MRI phantom produced using radial k-space sampling trajectories: The leftmost image shows the initial “true” image of the phantom, and the images to its right were generated using radial sampling trajectories with increasing undersampling factors (indicated above each image). The normalised root-mean-square (RMS) error of pixel intensities is indicated below each image.

Figure 4 MRI speech phantom reconstructions using spiral k-space sampling trajectories: The left-hand image shows the phantom at a given timepoint, the centre image shows a reconstruction of the same timepoint using a spiral trajectory, and the right-hand image shows a spiral reconstruction with added Gaussian noise.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)