Usage Notes#

Command-Line Arguments#

Creates a cohort by grabbing specific subjects from openneuro datasets.

usage: cohort_creator [-h] [-v] {browse,update,install,get,copy,all} ...

Positional Arguments#

command

Possible choices: browse, update, install, get, copy, all

Choose a subcommand

Named Arguments#

-v, --version

show program’s version number and exit
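
The command layout described on this page (one top-level program, a -v/--version flag, and six subcommands that each accept --verbosity) can be sketched with Python's argparse. This is a minimal illustration reconstructed from the usage strings in this reference, not the actual cohort_creator source; the build_parser name and the placeholder version string are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser mirroring the documented subcommand layout."""
    parser = argparse.ArgumentParser(
        prog="cohort_creator",
        description="Creates a cohort by grabbing specific subjects from openneuro datasets.",
    )
    # Placeholder version string (assumption): the real value comes from the package.
    parser.add_argument("-v", "--version", action="version", version="%(prog)s (sketch)")

    subparsers = parser.add_subparsers(
        dest="command", required=True, help="Choose a subcommand"
    )
    for name in ("browse", "update", "install", "get", "copy", "all"):
        sub = subparsers.add_parser(name)
        # Every documented subcommand accepts --verbosity with default 2.
        sub.add_argument(
            "--verbosity", type=int, choices=[0, 1, 2, 3], default=2,
            help="Verbosity level.",
        )
    return parser


args = build_parser().parse_args(["browse", "--verbosity", "3"])
print(args.command, args.verbosity)  # prints: browse 3
```

Only the shared --verbosity option is modeled here; the remaining options of each subcommand are listed in the sections that follow.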

Sub-commands#

browse#

Launch a Dash app in the browser to browse, visualize, and filter the listing of known datasets. It also creates a dataset-results.tsv file containing the filtered list of datasets.

cohort_creator browse [-h] [--verbosity {0,1,2,3}] [--debug]
Named Arguments#
--verbosity

Possible choices: 0, 1, 2, 3

Verbosity level.

Default: 2

--debug

Runs the Dash app in debug mode.

Default: False

update#

Update listing of known BIDS datasets.

cohort_creator update [-h] [--debug] [--verbosity {0,1,2,3}]
Named Arguments#
--debug

Only runs the update for a small subset of datasets.

Default: False

--verbosity

Possible choices: 0, 1, 2, 3

Verbosity level.

Default: 2

install#

Install several openneuro datasets.

cohort_creator install [-h] -d DATASET_LISTING [DATASET_LISTING ...]
                       [-p PARTICIPANT_LISTING] [-o OUTPUT_DIR]
                       [--dataset_types {raw,mriqc,fmriprep} [{raw,mriqc,fmriprep} ...]]
                       [--verbosity {0,1,2,3}]
                       [--generate_participant_listing]
Named Arguments#
-d, --dataset_listing

Path to a TSV file listing the datasets to get, or a space-separated list of dataset IDs to install (for example, ds000001 ds000002).

-p, --participant_listing

Path to a TSV file listing the participants to get. Optional. If not provided, all participants will be downloaded.

-o, --output_dir

Full path to the directory where the output files will be stored.

--dataset_types

Possible choices: raw, mriqc, fmriprep

Dataset types to install and get data from.

Default: [‘raw’]

--verbosity

Possible choices: 0, 1, 2, 3

Verbosity level.

Default: 2

--generate_participant_listing

Generate a participant_listing.tsv in the output_dir.

Default: False
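
The -d and -p options above both accept a TSV file. As a rough sketch of how such listings could be written by hand, the snippet below creates two small files with Python's csv module. The column names DatasetID and SubjectID, the file names, and the subject labels are assumptions made for illustration; check a dataset-results.tsv produced by cohort_creator browse for the layout the tool actually expects.

```python
import csv
import tempfile
from pathlib import Path


def write_tsv(path: Path, header: list[str], rows: list[list[str]]) -> None:
    """Write a header and rows to a tab-separated file."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


# Write into a temporary directory so the sketch has no side effects.
inputs = Path(tempfile.mkdtemp()) / "inputs"
inputs.mkdir()

# Hypothetical column names: "DatasetID" and "SubjectID" are assumptions.
write_tsv(inputs / "datasets.tsv", ["DatasetID"], [["ds000001"], ["ds000002"]])
write_tsv(
    inputs / "participants.tsv",
    ["DatasetID", "SubjectID"],
    [["ds000001", "sub-01"], ["ds000001", "sub-02"], ["ds000002", "sub-01"]],
)

print((inputs / "participants.tsv").read_text())
```

The resulting paths could then be passed to --dataset_listing and --participant_listing.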

get#

Get specified data for a cohort of subjects.

cohort_creator get [-h] -d DATASET_LISTING [DATASET_LISTING ...]
                   [-p PARTICIPANT_LISTING] [-o OUTPUT_DIR]
                   [--dataset_types {raw,mriqc,fmriprep} [{raw,mriqc,fmriprep} ...]]
                   [--verbosity {0,1,2,3}]
                   [--datatypes {anat,func,fmap} [{anat,func,fmap} ...]]
                   [--space SPACE] [--task TASK]
                   [--bids_filter_file BIDS_FILTER_FILE] [--jobs JOBS]
Named Arguments#
-d, --dataset_listing

Path to a TSV file listing the datasets to get, or a space-separated list of dataset IDs to install (for example, ds000001 ds000002).

-p, --participant_listing

Path to a TSV file listing the participants to get. Optional. If not provided, all participants will be downloaded.

-o, --output_dir

Full path to the directory where the output files will be stored.

--dataset_types

Possible choices: raw, mriqc, fmriprep

Dataset types to install and get data from.

Default: [‘raw’]

--verbosity

Possible choices: 0, 1, 2, 3

Verbosity level.

Default: 2

--datatypes

Possible choices: anat, func, fmap

Datatype to get.

Default: [‘anat’]

--space

Space of the input data. Only applies when the requested dataset_types include fmriprep.

Default: “MNI152NLin2009cAsym”

--task

Task of the input data. Only applies to datatypes that have a task entity.

Default: “*”

--bids_filter_file

Path to a JSON file describing custom BIDS input filters. For further details, please check out the FAQ.

--jobs

Number of parallel jobs passed to datalad to speed up getting files.

Default: 6

copy#

Copy cohort of subjects into separate directory.

cohort_creator copy [-h] -d DATASET_LISTING [DATASET_LISTING ...]
                    [-p PARTICIPANT_LISTING] [-o OUTPUT_DIR]
                    [--dataset_types {raw,mriqc,fmriprep} [{raw,mriqc,fmriprep} ...]]
                    [--verbosity {0,1,2,3}]
                    [--datatypes {anat,func,fmap} [{anat,func,fmap} ...]]
                    [--space SPACE] [--task TASK]
                    [--bids_filter_file BIDS_FILTER_FILE] [--skip_group_mriqc]
Named Arguments#
-d, --dataset_listing

Path to a TSV file listing the datasets to get, or a space-separated list of dataset IDs to install (for example, ds000001 ds000002).

-p, --participant_listing

Path to a TSV file listing the participants to get. Optional. If not provided, all participants will be downloaded.

-o, --output_dir

Full path to the directory where the output files will be stored.

--dataset_types

Possible choices: raw, mriqc, fmriprep

Dataset types to install and get data from.

Default: [‘raw’]

--verbosity

Possible choices: 0, 1, 2, 3

Verbosity level.

Default: 2

--datatypes

Possible choices: anat, func, fmap

Datatype to get.

Default: [‘anat’]

--space

Space of the input data. Only applies when the requested dataset_types include fmriprep.

Default: “MNI152NLin2009cAsym”

--task

Task of the input data. Only applies to datatypes that have a task entity.

Default: “*”

--bids_filter_file

Path to a JSON file describing custom BIDS input filters. For further details, please check out the FAQ.

--skip_group_mriqc

Skips rerunning mriqc on the subset of participants.

Default: False

all#

Install, get, and copy cohort of subjects.

cohort_creator all [-h] -d DATASET_LISTING [DATASET_LISTING ...]
                   [-p PARTICIPANT_LISTING] [-o OUTPUT_DIR]
                   [--dataset_types {raw,mriqc,fmriprep} [{raw,mriqc,fmriprep} ...]]
                   [--verbosity {0,1,2,3}]
                   [--datatypes {anat,func,fmap} [{anat,func,fmap} ...]]
                   [--space SPACE] [--task TASK]
                   [--bids_filter_file BIDS_FILTER_FILE] [--jobs JOBS]
                   [--skip_group_mriqc]
Named Arguments#
-d, --dataset_listing

Path to a TSV file listing the datasets to get, or a space-separated list of dataset IDs to install (for example, ds000001 ds000002).

-p, --participant_listing

Path to a TSV file listing the participants to get. Optional. If not provided, all participants will be downloaded.

-o, --output_dir

Full path to the directory where the output files will be stored.

--dataset_types

Possible choices: raw, mriqc, fmriprep

Dataset types to install and get data from.

Default: [‘raw’]

--verbosity

Possible choices: 0, 1, 2, 3

Verbosity level.

Default: 2

--datatypes

Possible choices: anat, func, fmap

Datatype to get.

Default: [‘anat’]

--space

Space of the input data. Only applies when the requested dataset_types include fmriprep.

Default: “MNI152NLin2009cAsym”

--task

Task of the input data. Only applies to datatypes that have a task entity.

Default: “*”

--bids_filter_file

Path to a JSON file describing custom BIDS input filters. For further details, please check out the FAQ.

--jobs

Number of parallel jobs passed to datalad to speed up getting files.

Default: 6

--skip_group_mriqc

Skips rerunning mriqc on the subset of participants.

Default: False

For a more readable version of this help section, see the online doc.

You can use the cohort_creator browse command to create a dataset-results.tsv to use for the next steps.

install#

cohort_creator install \
      --dataset_listing inputs/dataset-results.tsv \
      --participant_listing inputs/participant-results.tsv \
      --output_dir outputs \
      --dataset_types raw mriqc fmriprep \
      --verbosity 3

If no --participant_listing is provided, a participants.tsv file will be generated in output_dir/code that contains all participants for all datasets in dataset_listing.

The dataset listing can also be passed directly as a list of dataset IDs:

cohort_creator install \
      --dataset_listing ds000001 ds000002 \
      --output_dir outputs \
      --dataset_types raw mriqc fmriprep \
      --verbosity 3

get#

cohort_creator get \
      --dataset_listing inputs/dataset-results.tsv \
      --participant_listing inputs/participant-results.tsv \
      --output_dir outputs \
      --dataset_types raw mriqc fmriprep \
      --datatypes anat func \
      --space T1w MNI152NLin2009cAsym \
      --jobs 6 \
      --verbosity 3

copy#

cohort_creator copy \
      --dataset_listing inputs/dataset-results.tsv \
      --participant_listing inputs/participant-results.tsv \
      --output_dir outputs \
      --dataset_types raw mriqc fmriprep \
      --datatypes anat func \
      --space T1w MNI152NLin2009cAsym \
      --verbosity 3

all#

cohort_creator all \
      --dataset_listing inputs/dataset-results.tsv \
      --participant_listing inputs/participant-results.tsv \
      --output_dir outputs \
      --dataset_types raw mriqc fmriprep \
      --datatypes anat func \
      --space T1w MNI152NLin2009cAsym \
      --verbosity 3

Python API#

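from cohort_creator.data.utils import filter_data
from cohort_creator.data.utils import known_datasets_df
from cohort_creator.data.utils import save_dataset_listing
from cohort_creator.data.utils import wrangle_data

# Filter criteria for the task and the datatypes.
filter_config = {"task": "back", "datatypes": ["func"]}

# Load the listing of known datasets and prepare it for filtering.
df = wrangle_data(known_datasets_df())

# Keep only the datasets that match the filter criteria.
df = filter_data(df, config=filter_config)

# Save the filtered dataset listing.
save_dataset_listing(df)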