BACKGROUND: Since its inception over twenty years ago, functional magnetic resonance imaging (fMRI) has been used in numerous studies probing the neural underpinnings of human cognition. However, the between-session variance of many tasks used in fMRI remains understudied. Such information is especially important in the context of clinical applications. A test-retest dataset was acquired to validate fMRI tasks used in pre-surgical planning. In particular, five task-related fMRI time series (finger, foot and lip movement; overt verb generation; covert verb generation; overt word repetition; and landmark tasks) were used to investigate which protocols gave reliable single-subject results. Ten healthy participants in their fifties were scanned twice, 2-3 days apart, using an identical protocol. In addition to the fMRI sessions, high angular resolution diffusion tensor MRI (DTI) and high-resolution 3D T1-weighted volume scans were acquired. FINDINGS: Reliability analyses of the fMRI data showed that the motor and language tasks were reliable at the subject level, whereas the landmark task was not, despite all paradigms showing the expected activations at the group level. In addition, differences in reliability were found to be mostly related to the tasks themselves, while task-by-motion interaction was the major confounding factor. CONCLUSIONS: Together, this dataset provides a unique opportunity to investigate the reliability of different fMRI tasks, as well as the methods and algorithms used to analyze, de-noise and combine fMRI, DTI and structural T1-weighted volume data.