Combining Multi-Site/Study MRI Data: A Novel Linked-ICA Denoising Method for Removing Scanner and Site Variability from Multi-Modal MRI Data
Huanjie Li1,2, Staci Gruber1, Stephen M Smith3, Scott E Lukas1, Marisa Silveri1, Kevin P Hill4, William D. S Killgore5, and Lisa D Nickerson1

1Imaging Center, Harvard Medical School, McLean Hospital, Belmont, MA, United States, 2Dalian University of Technology, Dalian, China, 3Oxford University, Oxford, United Kingdom, 4Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, United States, 5University of Arizona, Tucson, AZ, United States


Large multi-site studies that pool magnetic resonance imaging (MRI) data across research sites present exceptional opportunities to advance neuroscience and enhance reproducibility of neuroimaging research. However, inconsistent MRI data collection platforms and scanning sequences both introduce systematic variability that can confound the true effect of interest and make the interpretation of results obtained from combined data difficult. Unfortunately, methods to address this problem are scant. In this study, we propose a novel denoising approach for multi-site, multi-modal MRI data that implements a data-driven linked independent component analysis to efficiently identify scanner/site-related effects for removal.


Large multi-site studies that pool magnetic resonance imaging (MRI) data across research sites present exceptional opportunities to advance neuroscience and enhance reproducibility of neuroimaging research [1,2]. Currently, there are more than a dozen ongoing large-scale multi-site neuroimaging studies (e.g., NIH-funded studies and UK Biobank). The strength of these large-scale studies lies in combining multi-site data to create large datasets that overcome limitations of small neuroimaging studies. However, both scanner and site variability are confounds that hinder pooling data collected across different sites or across different scanner software on the same hardware, even when all acquisition protocols are harmonized [3,4]. These confounds degrade statistical analyses, leading to incorrect or spurious findings. Unfortunately, methods to address this problem are scant. We propose a novel denoising approach for multi-site, multi-modal MRI data that implements a data-driven linked independent component analysis (LICA) [5,6] to efficiently identify scanner/site-related confounds for removal. Removing these confounds results in denoised data that can be combined across studies to improve modality-specific statistical processing.


Data: Data from 133 subjects (62 chronic heavy marijuana smokers and 71 healthy controls (HC)) from 6 different studies were used. All data were collected on the same Siemens 3T Trio scanner, but with 3 different scanner software versions (SSWV): VA23A, VA25A and VB17A. VA23A and VA25A were used before a major hardware and software upgrade of the Trio (TIM upgrade), while VB17A was used after the upgrade (post-TIM). Acquisition sequences also differed across the studies; thus, the main confounds for combining data were SSWV and STUDY variability.

Data processing: Modality-specific preprocessing pipelines were used to produce outcome images for each participant: modulated grey matter (GM) images generated by FSL-VBM; vertex-wise cortical thickness (CT) and pial surface area (PSA) maps estimated by FreeSurfer; fractional anisotropy (FA), mean diffusivity (MD) and tensor mode (MO) images calculated using FSL FDT; and brain activation maps estimated by FSL FEAT analysis of functional MRI (fMRI) data collected during the Multi-Source Interference Task. For each modality, a “subject-series” was created by normalizing all images to MNI152 space and then concatenating them across all participants into a single data file.
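As a rough illustration of the concatenation step, the sketch below stacks already-normalized per-participant images into one subject-series array and flattens it to a subjects-by-voxels matrix. It assumes the images are already loaded as NumPy arrays on a common grid; the function names `build_subject_series` and `to_matrix` and the toy arrays are hypothetical and are not taken from the pipelines used in this study.

```python
import numpy as np


def build_subject_series(images):
    """Stack per-participant 3D images along a new last axis (the subject axis)."""
    shapes = {img.shape for img in images}
    if len(shapes) != 1:
        # All images must already share one grid (e.g., after spatial normalization).
        raise ValueError(f"images have inconsistent shapes: {shapes}")
    return np.stack(images, axis=-1)


def to_matrix(series, mask):
    """Return a subjects x voxels matrix restricted to voxels inside a boolean mask."""
    # series[mask] has shape (n_voxels_in_mask, n_subjects); transpose it.
    return series[mask].T


# Toy stand-ins for three participants' normalized images (hypothetical data).
images = [np.full((2, 2, 2), float(i)) for i in range(3)]
series = build_subject_series(images)
mask = np.ones((2, 2, 2), dtype=bool)
matrix = to_matrix(series, mask)
print(series.shape, matrix.shape)
```

Keeping one row per participant makes the later regression steps straightforward, since each confound variable is then a single column with one value per row.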

Denoising: Subject-series for all 7 modalities were analyzed simultaneously using LICA to derive 15 multi-modal spatial components. Subject-loadings (SL) for each component were assessed for relationships with SSWV, STUDY and participant variables using linear regression; components whose SL were related only to SSWV and STUDY were identified for denoising. Two approaches for LICA-denoising were tested. LICA-R1 applies a single multivariate regression (MVR) of the SL for all noise components against the subject-series for each modality to remove the noise effects. LICA-R2 uses a two-stage MVR: the LICA spatial maps of the noise components are first regressed against each subject-series to obtain subject-specific regression weights, and these weights are then regressed against the subject-series to remove the noise effects.

We compared the performance of LICA-R1/R2 with two other approaches for addressing scanner confounds when combining MRI data across studies/sites: a higher-level GLM with a site/study covariate (SSC-GLM) included in the group-level model, and modality-specific ICA denoising based on FSL MELODIC [7].

While all data were used to conduct the LICA to identify noise components, we constructed test data for each modality by splitting the data from HC into two “groups”, defined based on the SSWV and STUDY variables. Thus, any observed differences between the two groups can be attributed to differences introduced by SSWV or STUDY. Group differences in each modality were assessed before and after denoising using two-group t-tests with non-parametric permutation testing in FSL’s Randomise (5000 permutations), with significance set at p < 0.05, corrected for family-wise error.
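The two regression schemes, LICA-R1 and LICA-R2, can be sketched as follows. This is a minimal illustration on synthetic data, not the implementation used in this study: the function names (`regress_out`, `lica_r1`, `lica_r2`), the use of ordinary least squares, and the choice to demean the regressors so that each voxel's group mean is preserved are all assumptions made for the example.

```python
import numpy as np


def regress_out(data, confounds):
    """Remove from data (subjects x voxels) the part explained by confounds (subjects x k)."""
    # Assumption: regressors are demeaned so each voxel's mean across subjects is kept.
    centered = confounds - confounds.mean(axis=0)
    betas, *_ = np.linalg.lstsq(centered, data, rcond=None)
    return data - centered @ betas


def lica_r1(data, noise_loadings):
    """Single-stage sketch: regress the noise components' subject loadings out of the data."""
    return regress_out(data, noise_loadings)


def lica_r2(data, noise_maps):
    """Two-stage sketch: spatial regression for subject weights, then regress them out."""
    # Stage 1: regress the noise spatial maps (voxels x k) against every subject's
    # image at once; the result holds one weight per component and subject (k x subjects).
    weights, *_ = np.linalg.lstsq(noise_maps, data.T, rcond=None)
    # Stage 2: regress the subject-specific weights against the subject series.
    return regress_out(data, weights.T)


# Synthetic example (hypothetical data): a constant signal plus two noise components.
rng = np.random.default_rng(0)
n_subjects, n_voxels, n_noise = 20, 50, 2
loadings = rng.normal(size=(n_subjects, n_noise))
maps = rng.normal(size=(n_voxels, n_noise))
data = 5.0 + loadings @ maps.T

clean_r1 = lica_r1(data, loadings)
clean_r2 = lica_r2(data, maps)
# Residual between-subject variability after each scheme.
print(clean_r1.std(axis=0).max(), clean_r2.std(axis=0).max())
```

In this noise-only toy case both schemes remove essentially all between-subject variability. With real data the two can differ, because the second scheme re-estimates the subject weights separately for each modality's subject-series rather than using the shared LICA loadings.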


Three noise components identified from the 15 LICA components were used for LICA-R1/R2 denoising. The first revealed global effects in FA and MD and region-specific effects in GM, fMRI, CT and PSA (Fig. 1). The second revealed region-specific effects in FA, MD, GM, CT and PSA, while the third revealed effects in GM. Comparison of LICA-R1/R2 with SSC-GLM and modality-specific ICA, based on denoising performance on GM, fMRI and CT data (Figs. 2, 3 and 4), shows that SSC-GLM and ICA-based denoising were only modestly effective at removing confounds. LICA-R1 outperformed all other methods at removing scanner effects, removing them completely for each modality.

Discussion and Conclusion

A new denoising method is proposed for removing site/scanner effects from multi-site/study MRI data. The proposed method (LICA-R1) outperformed the existing strategies we tested and has great potential to help large-scale multi-site studies produce combined data free from study/site confounds.


This project was funded by NIDA R01 DA037265.


1. Van Horn JD, Toga AW. Human Neuroimaging as a “Big Data” Science. Brain Imaging Behav. 2014; 8: 323–331.

2. Varoquaux G. Cross-validation failure: Small sample sizes lead to large error bars. NeuroImage. 2017; PMID: 28655633.

3. Jovicich J, Marizzoni M, Sala-Llonch R, et al. Brain morphometry reproducibility in multi-center 3T MRI studies: A comparison of cross-sectional and longitudinal segmentations. NeuroImage. 2013; 83: 472-484.

4. Venkatraman VK, Gonzalez CE, Landman B, et al. Region of interest correction factors improve reliability of diffusion imaging measures within and across scanners and field strengths. NeuroImage. 2015; 119: 406-416.

5. Groves AR, Beckmann CF, Smith SM, et al. Linked independent component analysis for multimodal data fusion. NeuroImage. 2011; 54: 2198-2217.

6. Groves AR, Smith SM, Fjell AM, et al. Benefits of multi-modal fusion analysis on a large-scale dataset: life-span patterns of inter-subject variability in cortical morphometry and white matter microstructure. NeuroImage. 2012; 63: 365-380.

7. Chen J, Liu J, Calhoun VD, et al. Exploration of scanning effects in multi-site structural MRI studies. J Neurosci Methods. 2014; 230: 37-50.


Fig. 1. One multi-modal noise component, which corresponds to global effects in FA and MD and region-specific effects in GM, fMRI, CT, and PSA. Subject loadings for this component were strongly associated with both the SSWV (p = 1.91e-08) and STUDY (p = 1.88e-04) variables. All subjects (both marijuana smokers and HC) were included in the LICA used to derive 15 spatial components. Three components were identified as noise components because they were related only to the SSWV or STUDY variables. The spatial maps were thresholded at z = 2.3. Red-yellow colors indicate increases associated with changes in the SSWV and STUDY variables.

Fig. 2. Group-level analysis of GM maps before and after data denoising. Data were constructed from two HC groups based on SSWV: G1 contains 26 subjects (pre-TIM) and G2 contains 42 subjects (post-TIM). The first row shows the results of the group comparison without any data denoising or regression (i.e., original GM data). The second row shows the group comparison obtained with SSWV-STUDY regression. The third row shows the group comparison after modality-specific ICA denoising of GM data. The bottom two rows show the results from the group comparison using data denoised with the LICA-R1/R2 methods.

Fig. 3. Group-level analysis of task fMRI data before and after data denoising. HC fMRI data were collected in only 2 studies (using different acquisition parameters); thus, two HC groups were constructed based on the STUDY variable, with G1 = 16 subjects and G2 = 30 subjects. The data analysis strategy and the results of each denoising method are consistent with those for the GM data, and the LICA-R1 denoising method again achieved the best performance for task activation maps.

Fig. 4. Group-level analysis of CT data before and after denoising. Two groups of HC data were constructed based on SSWV: G1 contains 29 subjects with data acquired before the TIM upgrade, and G2 contains 42 subjects with data acquired after the TIM upgrade. Results are consistent with the GM and fMRI results: denoising via GLM regression removes some noise, but does not clean the data completely. The modality-specific ICA and LICA-R1 denoising methods achieve the best performance at removing noise from the CT data.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)