Xi Peng^{1,2}, Fan Lam^{1}, Yudu Li^{1,3}, Bryan Clifford^{1,3}, Brad Sutton^{1,4}, and Zhi-Pei Liang^{1,3}

Regularization is widely used for solving ill-posed image reconstruction problems, and appropriate selection of the regularization parameter is critical to ensuring high-quality reconstructions. While many methods have been proposed to address this problem, selecting a regularization parameter that yields optimal performance (under a specific metric) in a computationally efficient manner remains an open problem. We propose here a novel deep learning based method for regularization parameter selection. Specifically, a convolutional neural network is designed to predict the optimal parameter from an "arbitrary" initial parameter choice. The proposed method has been evaluated using experimental data, demonstrating its capability to learn the optimal parameter for two different L_{1}-regularized reconstruction problems.

A key assumption of the proposed method is that, given a regularized reconstruction formulation and a specific performance metric $$$g$$$, the nonlinear function $$$g(\lambda)$$$ is usually smooth with a simple shape, and that the collection of $$$g(\lambda)$$$ curves for different datasets has a simple topology in a certain feature space (e.g., the smooth hypersurface illustrated in Fig. 1). Therefore, this nonlinear function can be learned from training data.
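This smoothness is easy to reproduce on a toy problem. The sketch below is purely illustrative (it is not the reconstruction used in this work): it sweeps $$$\lambda$$$ on a log-spaced grid for an L_{1}-regularized denoising problem, whose closed-form solution is soft-thresholding, and evaluates $$$g(\lambda)$$$ as the MSE against the ground truth. The resulting curve is smooth with a single well-defined minimum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "reconstruction": L1-regularized denoising of a sparse signal.
# The closed-form solution is soft-thresholding of the noisy data.
x_true = np.zeros(256)
x_true[rng.choice(256, 10, replace=False)] = rng.normal(0, 1, 10)
y = x_true + 0.05 * rng.normal(size=256)

def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Sweep lambda on a log grid and evaluate the metric g(lambda) = MSE.
lams = np.logspace(-3, 0, 50)
g = np.array([np.mean((soft_threshold(y, lam) - x_true) ** 2) for lam in lams])

# g(lambda) varies smoothly and attains its minimum in the interior
# of the grid, away from both endpoints.
lam_opt = lams[np.argmin(g)]
```

Here the "oracle" $$$\lambda_0$$$ is simply `lam_opt`; for a real reconstruction, each evaluation of $$$g(\lambda)$$$ would require a full iterative reconstruction, which is what motivates learning the curve instead of sweeping it.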

To achieve this, we propose a deep learning based method to capture such a nonlinear relationship and predict the "oracle" parameter $$$\lambda_0$$$ (corresponding to the reconstruction with the optimal metric value) from an initial guess $$$\lambda'$$$. Specifically, we designed a convolutional neural network (CNN) that takes two input images, $$$X_1$$$ and $$$X_2$$$, and outputs a distance measure between $$$\lambda_0$$$ and $$$\lambda'$$$ (illustrated in Fig. 2). $$$X_1$$$ is an initial reconstruction using $$$\lambda'$$$ (serving as a "reference" point) and $$$X_2$$$ is reconstructed with $$$\lambda=0$$$, representing a data-consistent reconstruction. A CNN is chosen because it is a powerful feature extraction tool, particularly for capturing the key features evaluated by a metric. These features are then used to determine the distance between $$$\lambda'$$$ and $$$\lambda_0$$$, such that the latter can be predicted for new data.
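The abstract does not specify the network architecture, so the following numpy sketch only illustrates the input/output contract of Fig. 2: the two images (reconstruction at $$$\lambda'$$$ and data-consistent reconstruction at $$$\lambda=0$$$) enter as two channels, convolutional filters extract features, and a global-average-pooled linear layer emits the scalar distance $$$d$$$. All weights here are random placeholders; a real implementation would train them in a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, k):
    """Naive valid-mode 2-D convolution of one channel with one kernel."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def tiny_cnn(x1, x2, kernels, w, b):
    """Map the two input images to a scalar distance d(lambda', lambda_0)."""
    x = np.stack([x1, x2])                       # 2-channel input: X1 and X2
    feats = []
    for k in kernels:                            # each k has shape (2, 3, 3)
        f = conv2d_valid(x[0], k[0]) + conv2d_valid(x[1], k[1])
        feats.append(np.maximum(f, 0.0).mean())  # ReLU + global average pool
    return float(np.dot(w, feats) + b)           # linear layer -> scalar d

# Random placeholder weights and 32x32 placeholder "reconstructions".
kernels = rng.normal(size=(4, 2, 3, 3))
w, b = rng.normal(size=4), 0.0
x1 = rng.normal(size=(32, 32))                   # reconstruction at lambda'
x2 = rng.normal(size=(32, 32))                   # data-consistent recon (lambda=0)
d = tiny_cnn(x1, x2, kernels, w, b)
```

The key design point is that the network regresses a scalar distance rather than $$$\lambda_0$$$ itself, so the same trained weights apply regardless of the absolute scale of $$$\lambda$$$ in a new dataset.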

It is important to note that instead of directly outputting $$$\lambda_0$$$, which can differ significantly across datasets, we design the CNN to output a distance measure $$$d(\lambda, \lambda_0)$$$ that is insensitive to scale differences, simplifying the learning problem. Specifically, we chose $$$d(\lambda, \lambda_0)=\log(\lambda/\lambda_0)$$$. To illustrate this, consider an example where $$$\lambda$$$ has a large dynamic range (e.g., [$$$10^{-5}$$$,$$$10^{-1}$$$]) and $$$\lambda'$$$ is two orders of magnitude different from $$$\lambda_0$$$; the log-ratio measure shrinks the difference to a reasonable numerical range (e.g., [-2, 2] with $$$\lambda_0=10^{-3}$$$). We believe this makes the learning more stable and equally sensitive to any initial $$$\lambda'$$$.

T_{1}-weighted images from five different subjects were acquired on a 3T MR scanner (SIEMENS Prisma) using a 3D-FLASH sequence with the same parameters (matrix size=256$$$\times$$$256, spatial resolution=0.9mm$$$\times$$$0.9mm, slice thickness=2mm, FA=90°, TR=20ms, TE=4.45ms). Reconstructions of each slice with different $$$\lambda'$$$ were used as different training samples.

Two CNNs were trained for two different L_{1}-regularized reconstruction methods, i.e., basic compressed sensing^{11} (CS) and L1-SPIRiT^{12}, respectively. For the CS reconstruction, the optimal $$$\lambda$$$'s were selected based on the high-frequency error norm (HFEN) and the data were retrospectively undersampled by $$$\times$$$2 using a 1D variable-density pattern, while for L1-SPIRiT the optimal $$$\lambda$$$'s were selected using the mean squared error (MSE) with a $$$\times$$$4 variable-density Poisson-disk undersampling pattern; this demonstrates the network's capability to learn the nonlinear relationship for different metrics.
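HFEN is not defined in this abstract; a common definition in the compressed sensing literature compares Laplacian-of-Gaussian (LoG) filtered versions of the reconstruction and a reference image, normalized by the norm of the filtered reference. The kernel size (15$$$\times$$$15) and $$$\sigma=1.5$$$ below follow that convention but are assumptions here, not values stated in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_kernel(size=15, sigma=1.5):
    """Rotationally symmetric Laplacian-of-Gaussian filter."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()      # zero-sum kernel: flat regions give zero response

def filt(img, k):
    """Naive valid-mode 2-D convolution."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def hfen(recon, ref, k=log_kernel()):
    """High-frequency error norm of a reconstruction relative to a reference."""
    return np.linalg.norm(filt(recon, k) - filt(ref, k)) / np.linalg.norm(filt(ref, k))

# A perfect reconstruction scores 0; error in high-frequency content raises it.
ref = rng.normal(size=(48, 48))
recon = ref + 0.1 * rng.normal(size=(48, 48))
err = hfen(recon, ref)
```

Because the LoG filter emphasizes edges and fine structure, selecting $$$\lambda$$$ by HFEN penalizes the loss of high-frequency detail that over-regularization causes, whereas MSE weighs all spatial frequencies equally.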

1. Craven P, Wahba G. Smoothing noisy data with spline functions. Numer Math 1979;31:377–403.
2. Carew JD, Wahba G, Xie X, Nordheim EV, Meyerand ME. Optimal spline smoothing of fMRI time series by generalized cross-validation. Neuroimage 2003;18:950–961.
3. Sourbron S, Luypaert R, Schuerbeek PV, Dujardin M, Stadnik T. Choice of the regularization parameter for perfusion quantification with MRI. Phys Med Biol 2004;49:3307–3324.
4. Stein C. Estimation of the mean of a multivariate normal distribution. Ann Statist 1981;9:1135–1151.
5. Ramani S, Liu Z, Rosen J, Nielsen J-F, Fessler JA. Regularization parameter selection for nonlinear iterative image restoration and MRI reconstruction using GCV and SURE-based methods. IEEE Trans Image Process 2012;21:3659–3672.
6. Weller DS, Ramani S, Nielsen J-F, Fessler JA. Monte Carlo SURE-based parameter selection for parallel magnetic resonance imaging reconstruction. Magn Reson Med 2014;71:1760–1770.
7. Ramani S, Weller DS, Nielsen J-F, Fessler JA. Non-Cartesian MRI reconstruction with automatic regularization via Monte Carlo SURE. IEEE Trans Med Imaging 2013;32:1411–1422.
8. Karl WC. Regularization in image restoration and reconstruction. In: Bovik A, ed. Handbook of Image and Video Processing. New York: Elsevier; 2005:183–202.
9. Hansen PC, O'Leary DP. The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J Sci Comput 1993;14:1487–1503.
10. Vogel CR. Non-convergence of the L-curve regularization parameter selection method. Inverse Problems 1996;12:535–547.
11. Lustig M, Donoho D, Pauly JM. Sparse MRI: the application of compressed sensing for rapid MR imaging. Magn Reson Med 2007;58:1182–1195.
12. Lustig M, Pauly JM. SPIRiT: iterative self-consistent parallel imaging reconstruction from arbitrary k-space. Magn Reson Med 2010;64:457–471.

Figure 1. An illustration of the smoothness of g(λ). In this case, the HFEN metric was calculated for different values of λ and different datasets. As can be seen, g(λ) is smooth along the parameter dimension, and its shape is similar across datasets, implying a simple topology in a certain feature space.

Figure 2. An illustration of the proposed deep learning method for selecting the optimal regularization parameter. X_{1} is a regularized reconstruction with an arbitrary initial parameter λ' and X_{2} denotes the data consistent reconstruction (λ=0). The output is a distance measure between λ' and the "oracle" parameter λ_{0}. The CNN is trained to learn this relationship so that the desired λ_{0} for a future dataset can be predicted.

Figure 3. Learning regularization parameters for the L1-SPIRiT reconstruction at a reduction factor of 4 with MSE as the quality metric. The black squares indicate all the possible values of initial λ', the red circles denote those predicted by our method, and the green spots represent the "oracle" λ_{0}. Reconstructions using different λ's are shown on the right.

Figure 4. Learning regularization parameters for the CS reconstruction at a reduction factor of 2 with HFEN as the quality metric. The black squares indicate all the possible values of initial λ', the red circles denote those predicted by our method, and the green spots represent the "oracle" λ_{0}. Reconstructions using different λ's are shown on the right.