datman.utils module
A collection of utilities for generally munging imaging data.
- class datman.utils.cd(path)[source]
Bases:
object
A context manager for changing directory. Since best practices dictate returning to the original directory, saves the original directory and returns to it after the block has exited.
May raise OSError if the given path doesn’t exist (or the current directory is deleted before switching back)
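A minimal usage sketch (the directory below is just an example); the working directory is restored automatically when the block exits:
>>> import os
>>> from datman.utils import cd
>>> with cd('/tmp'):
...     print(os.getcwd())   # inside the block: '/tmp'
>>> os.getcwd()              # after the block: back in the original directory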
- datman.utils.check_dependency_configured(program_name, shell_cmd=None, env_vars=None)[source]
- <program_name> Name to add to the exception message if the program is not correctly configured.
- <shell_cmd> A command line command that will be put into ‘which’ to check whether the shell can find it.
- <env_vars> A list of shell variables that are expected to be set. Doesn’t verify the value of these vars, only that they are all set.
Raises EnvironmentError if the command is not findable or if any environment variable isn’t configured.
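A hedged example of the documented call; the program name, shell command, and environment variable below are illustrative, not required by datman:
>>> from datman.utils import check_dependency_configured
>>> # Raises EnvironmentError if 'fsl' is not on the PATH or FSLDIR is unset
>>> check_dependency_configured('FSL', shell_cmd='fsl', env_vars=['FSLDIR'])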
- datman.utils.define_folder(path)[source]
Returns the given path, creating the folder if it does not already exist. If the folder cannot be created due to missing permissions, exits gracefully.
- datman.utils.filter_niftis(candidates)[source]
Takes a list and returns all items that contain the extensions ‘.nii’ or ‘.nii.gz’.
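A short illustration of the documented behaviour (only the ‘.nii’ and ‘.nii.gz’ entries are kept); the filenames are made up:
>>> from datman.utils import filter_niftis
>>> filter_niftis(['scan.nii', 'scan.nii.gz', 'scan.json', 'notes.txt'])  # keeps the first two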
- datman.utils.get_all_headers_in_folder(path, recurse=False)[source]
Get DICOM headers for all files in the given path.
Returns a dictionary mapping path->headers for all files (headers == None for files that are not dicoms).
- datman.utils.get_archive_headers(path, stop_after_first=False)[source]
Get dicom headers from a scan archive.
Path can be a path to a tarball or zip of dicom folders, or a folder. It is assumed that this archive contains the dicoms from a single exam, organized into folders for each series.
The entire archive is scanned and dicom headers from a single file in each folder are returned as a dictionary that maps path->headers.
If stop_after_first == True, only a single set of dicom headers is returned for the entire archive, which is useful if you only care about the exam details.
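A sketch of reading headers from a hypothetical exam archive; the path is illustrative:
>>> from datman.utils import get_archive_headers
>>> headers = get_archive_headers('/archive/exam01.zip')      # one entry per series folder
>>> exam_info = get_archive_headers('/archive/exam01.zip', stop_after_first=True)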
- datman.utils.get_extension(path)[source]
Get the filename extension on this path.
This is a slightly more sophisticated version of os.path.splitext in that this will correctly return the extension for ‘.tar.gz’ files, for example. :D
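For example, assuming the behaviour described above:
>>> from datman.utils import get_extension
>>> get_extension('archive.tar.gz')   # returns '.tar.gz', where os.path.splitext gives '.gz'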
- datman.utils.get_files_with_tag(parentdir, tag, fuzzy=False)[source]
Returns a list of files that have the specified tag.
Filenames must conform to the datman naming convention (see scanid.parse_filename) in order to be considered.
If fuzzy == True, then filenames are matched if the given tag is found within the filename’s tag.
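A hedged example; the directory, tag, and filenames involved are hypothetical:
>>> from datman.utils import get_files_with_tag
>>> # With fuzzy=True, a file whose tag is 'T1w' would also match the tag 'T1'
>>> get_files_with_tag('/data/STUDY01/nii/STUDY01_CMH_0001_01', 'T1', fuzzy=True)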
- datman.utils.get_folder_headers(path, stop_after_first=False)[source]
Generate a dictionary of subfolders and dicom headers.
- datman.utils.get_loaded_modules()[source]
Returns a space-separated list of loaded modules.
These are modules loaded by the environment-modules system. This function just looks in the LOADEDMODULES environment variable for the list.
- datman.utils.get_subject_metadata(config=None, study=None, allow_partial=False)[source]
Returns all QC’d session IDs mapped to any blacklisted scans they have
This will collect and organize all checklist and blacklist data for a study. Sessions that do not have a completed checklist entry will have their blacklist entries omitted from the output unless the ‘allow_partial’ flag is used. This is done so that partially QC’d subjects do not accidentally get processed by downstream pipelines.
Either a study name or a datman config object must be supplied to find the checklist and blacklist contents.
- Parameters
config (datman.config.config, optional) – A datman config object with the study set to the study of interest.
study (str, optional) – A datman study name.
allow_partial (bool, optional) – Whether to include blacklist entries if the subject has not been fully QC’d (i.e. if they don’t have a completed checklist entry yet). Defaults to False.
- Returns
A dictionary with any QC’d subject ID mapped to a list of blacklisted scan names that have been mangled to drop the series description and the file extension.
- Return type
dict
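A minimal sketch, assuming a study nickname of ‘STUDY01’ (hypothetical):
>>> from datman.utils import get_subject_metadata
>>> metadata = get_subject_metadata(study='STUDY01')
>>> for subject, blacklisted_scans in metadata.items():
...     print(subject, blacklisted_scans)   # QC'd session ID -> list of blacklisted scan names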
- datman.utils.get_tarfile_headers(path, stop_after_first=False)[source]
Get headers for dicom files within a tarball
- datman.utils.get_zipfile_headers(path, stop_after_first=False)[source]
Get headers for a dicom file within a zipfile
- datman.utils.makedirs(path)[source]
Make the directory (including parent directories) if they don’t exist
- datman.utils.nifti_basename(fpath)[source]
Return the basename without extension (either .nii.gz or .nii).
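For example (the filename is illustrative):
>>> from datman.utils import nifti_basename
>>> nifti_basename('/data/nii/sub-01_T1w.nii.gz')   # returns 'sub-01_T1w'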
- datman.utils.read_blacklist(study=None, scan=None, subject=None, config=None, path=None, bids_ses=None, use_bids=False)[source]
This function is used to look up blacklisted scans. If the dashboard is found it ONLY checks the dashboard database. Otherwise it expects a datman style ‘blacklist’ file on the filesystem.
- This function can accept:
A study name (nickname, not study tag)
A datman scan name (may include the full path and extension)
A BIDS scan name (set the use_bids option to ‘True’)
A subject ID
A datman config object, initialized to the study being worked with
- A full path directly to a blacklist file. If given, this will circumvent any dashboard database checks and ignore any datman config files.
- Returns
A dictionary of scan names mapped to the comment provided when they were blacklisted (Note: If reading from the filesystem, commas contained in comments will be removed)
OR a dictionary of the same format containing only entries for a single subject if a specific subject ID was given
OR the comment for a specific scan if a scan is given
OR ‘None’ if a scan is given but not found in the blacklist
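A hedged sketch of the two most common lookups; the study and scan names are hypothetical:
>>> from datman.utils import read_blacklist
>>> read_blacklist(study='STUDY01')   # all blacklisted scans for a study, mapped to comments
>>> # For one scan: returns the blacklist comment, or None if the scan is not blacklisted
>>> read_blacklist(scan='STUDY01_CMH_0001_01_01_T1_02_SagT1')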
- datman.utils.read_checklist(study=None, subject=None, config=None, path=None, bids_id=None, bids_ses=None, use_bids=False)[source]
This function is used to look-up QC checklist entries. If the dashboard is found it will ONLY check the dashboard database, otherwise it expects a datman style ‘checklist’ file on the filesystem.
- This function can accept either:
A study name (nickname, not the study tag) or a subject ID (which may include a session number, and may be a BIDS ID instead of a datman ID)
A datman config object, initialized to the study being worked with
A full path directly to a checklist file (Will circumvent the dashboard database check and ignore any datman config files)
Set use_bids=True to return an entire study’s checklist organized by BIDS name instead of datman name. This option only works with dashboard integration.
- Returns
A dictionary of subject IDs mapped to their comment / name of the person who signed off on their data
OR the comment for a specific subject if a subject ID is given
OR ‘None’ if a specific subject ID is given and they’re not found in the list
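A hedged sketch; the study and subject names are hypothetical:
>>> from datman.utils import read_checklist
>>> read_checklist(study='STUDY01')                 # all signed-off subjects for a study
>>> read_checklist(subject='STUDY01_CMH_0001_01')   # the entry for one subject, or None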
- datman.utils.run(cmd, dryrun=False, specialquote=True, verbose=True)[source]
Runs the command in the default shell, returning STDOUT and a return code. The return code uses the Python convention of 0 for success, non-zero for failure.
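A minimal sketch; the command is arbitrary, and the returned values are as described above (STDOUT and a return code):
>>> from datman.utils import run
>>> result = run('echo hello', verbose=False)   # dryrun=True would skip execution entirely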
- datman.utils.run_dummy_q(list_of_names)[source]
This holds the script until all of the queued items are done.
- datman.utils.split_path(path)[source]
Splits a path into all the component parts, returns a list
>>> split_path('a/b/c/d.txt')
['a', 'b', 'c', 'd.txt']
- datman.utils.splitext(path)[source]
Removes the file extension from a path, including specially-defined extensions that fool os.path.splitext.
- datman.utils.submit_job(cmd, job_name, log_dir, system='other', cpu_cores=1, walltime='2:00:00', dryrun=False, partition=None, argslist='', workdir='/tmp')[source]
Submits a job or list of jobs to the queue, depending on the system.
- Parameters
cmd – Command or list of commands to submit.
job_name – The name for the job.
log_dir – Path to where the job logs should go.
system – Current system running (similar to DM_SYSTEM) [default=other]
cpu_cores – Number of cores to allocate for the job [default=1]
walltime – Real clock time for the job [default=2:00:00]
dryrun – Set to true if you want the job to not submit [default=False]
partition – Slurm partition. If none specified, the default queue will be used.
argslist – String of additional slurm arguments (e.g. --mem X --verbose ...) [default=None]
workdir – Location for slurm to use as the work dir [default=’/tmp’]
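A hedged sketch using a made-up command, job name, and log directory; dryrun=True avoids actually submitting anything:
>>> from datman.utils import submit_job
>>> submit_job('echo hello', 'test_job', '/tmp/logs',
...            cpu_cores=1, walltime='0:10:00', dryrun=True)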
- datman.utils.update_checklist(entries, study=None, config=None, path=None)[source]
Add or update QC checklist entries.
This will preferentially update the dashboard database (ignoring any ‘checklist.csv’ files) unless the dashboard is not installed or a specific path is given to a file.
- Parameters
entries (dict) – A dictionary of subject IDs mapped to the comment to add / update. Use an empty string to indicate sign off.
study (str, optional) – The name of the study the subject(s) belong to.
config (datman.config.config, optional) – The configuration for the study the subject(s) belong to.
path (str) – The full path to the checklist.csv file to update. Passing the full path will cause the dashboard database to be by-passed even if the QC dashboard is installed.
- Raises
MetadataException – When update fails for any subject in ‘entries’.
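A hedged example with hypothetical subject IDs; an empty string marks a subject as signed off:
>>> from datman.utils import update_checklist
>>> update_checklist({'STUDY01_CMH_0001_01': '', 'STUDY01_CMH_0002_01': 'motion artifacts'},
...                  study='STUDY01')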
- datman.utils.validate_subject_id(subject_id, config)[source]
Ensures subject ID correctness based on configuration settings.
- This checks that a given ID:
Matches a supported naming convention
Matches a study tag that’s defined in the configuration file for the current study
Matches a site that is defined for the study tag
- Parameters
subject_id (str) – A subject ID to check.
config (datman.config.config) – A datman config instance that has been initialized to the study the subject ID should belong to.
- Raises
ParseException – When an ID is given that does not match any supported convention or that contains incorrect fields for the current study.
- Returns
A parsed datman identifier matching subject_id
- Return type
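A sketch, assuming a datman config object can be constructed for the study of interest (see datman.config); the study name and subject ID are hypothetical:
>>> import datman.config
>>> from datman.utils import validate_subject_id
>>> cfg = datman.config.config(study='STUDY01')
>>> ident = validate_subject_id('STUDY01_CMH_0001_01', cfg)   # raises ParseException if invalid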