Working with BIDS folders#
Author: Nicolas Legrand nicolas.legrand@cfin.au.dk
%%capture
import sys
if 'google.colab' in sys.modules:
    !pip install systole
Starting in version 0.2.3, Systole provides tools to interact with large datasets of physiological recordings. These tools work with folders structured according to the BIDS standard, which is the format we recommend using if you are following this tutorial.
Following the BIDS specification, physiological recordings, sometimes associated with behavioural tasks or neural recordings, are stored with a filename ending in *_physio.tsv.gz and are always accompanied by a sidecar *_physio.json file containing metadata such as the recording modality or the sampling frequency. Accessing both the time series and its accompanying metadata helps Systole automate the preprocessing by finding the correct parameters for peak detection and reports.
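To make the pairing between the time series and its sidecar concrete, here is a minimal standard-library sketch (not Systole's own loader). It assumes the usual BIDS convention that the compressed TSV has no header row and that the sidecar lists the column names under `Columns` next to `SamplingFrequency`. The helper name `read_physio` and the tiny demo recording are made up for illustration.

```python
import gzip
import json
import tempfile
from pathlib import Path


def read_physio(tsv_path):
    """Read a *_physio.tsv.gz file together with its *_physio.json sidecar."""
    tsv_path = Path(tsv_path)
    # The sidecar shares the file name, with .json instead of .tsv.gz
    json_path = tsv_path.with_name(tsv_path.name.replace(".tsv.gz", ".json"))
    metadata = json.loads(json_path.read_text())

    # Assumption: no header row in the TSV, column names come from the sidecar
    columns = metadata["Columns"]
    signals = {name: [] for name in columns}
    with gzip.open(tsv_path, "rt") as f:
        for line in f:
            values = line.rstrip("\n").split("\t")
            for name, value in zip(columns, values):
                signals[name].append(float(value))
    return signals, metadata


# Build a tiny, made-up recording to demonstrate the round trip
folder = Path(tempfile.mkdtemp())
stem = "sub-0001_ses_session1_task-mytask_physio"
(folder / f"{stem}.json").write_text(
    json.dumps({"SamplingFrequency": 1000, "StartTime": 0, "Columns": ["ecg"]})
)
with gzip.open(folder / f"{stem}.tsv.gz", "wt") as f:
    f.write("0.1\n0.5\n0.2\n")

signals, metadata = read_physio(folder / f"{stem}.tsv.gz")
print(metadata["SamplingFrequency"], signals["ecg"])
```

Keeping the sampling frequency next to the samples is what allows a downstream tool to choose sensible peak detection parameters without asking the user.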
A valid BIDS folder should be structured like the following:
└─ BIDS/
   ├─ sub-0001/
   │  └─ ses-session1/
   │     └─ beh/
   │        ├─ sub-0001_ses_session1_task-mytask_physio.tsv.gz
   │        └─ sub-0001_ses_session1_task-mytask_physio.json
   │
   ├─ sub-0002/
   ├─ sub-0003/
   └─ ...
Here, we have physiological recordings associated with a behavioural task for n participants in the folder.
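As a rough illustration of how such a layout can be traversed, the sketch below lists every recording that has both its data file and its sidecar. It uses only `pathlib`; the function name `find_physio_recordings` and the demo folder are hypothetical and not part of Systole.

```python
import tempfile
from pathlib import Path


def find_physio_recordings(bids_folder, pattern="", modality="beh"):
    """Return (tsv, json) path pairs for each physio recording found."""
    pairs = []
    glob = f"sub-*/ses-*/{modality}/*_physio.tsv.gz"
    for tsv in sorted(Path(bids_folder).glob(glob)):
        # Keep only the files whose name contains the requested pattern
        if pattern not in tsv.name:
            continue
        sidecar = tsv.with_name(tsv.name.replace(".tsv.gz", ".json"))
        # A recording without its sidecar cannot be interpreted, skip it
        if sidecar.exists():
            pairs.append((tsv, sidecar))
    return pairs


# Demo folder: sub-0002 is deliberately missing its JSON sidecar
root = Path(tempfile.mkdtemp()) / "BIDS"
for sub, with_sidecar in [("sub-0001", True), ("sub-0002", False)]:
    beh = root / sub / "ses-session1" / "beh"
    beh.mkdir(parents=True)
    stem = f"{sub}_ses_session1_task-mytask_physio"
    (beh / f"{stem}.tsv.gz").touch()
    if with_sidecar:
        (beh / f"{stem}.json").touch()

recordings = find_physio_recordings(root, pattern="task-mytask")
print([tsv.name for tsv, _ in recordings])
```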
Tip
We recommend using tools like the BIDS validator to ensure that your folder complies with the BIDS specification before trying to preprocess your data or to use the editor.
Preprocessing#
The first step is to preprocess the raw data and store the signal and the detected peaks in a new derivative folder. During this step, we can also decide to create an HTML report for each participant, so we can visualize the signal quality and the peak detection.
Preprocessing the physiological data from one participant#
The systole.reports sub-module contains tools to interact directly with BIDS-formatted folders, and to preprocess and save individual reports in a BIDS-consistent way. These functionalities are built on top of the systole.reports.subject_level_report function. This function takes a signal as input and saves as output the preprocessed signal with peak detection (_physio.tsv.gz with its _physio.json sidecar), an .html report adapted to the kind of signal that was provided, and a features.tsv file containing heart rate or respiratory rate variability features.
For example, running the following code:
from systole import import_dataset1
from systole.reports import subject_level_report

# Load an example ECG recording as a NumPy array
ecg = import_dataset1(modalities=["ECG"]).ecg.to_numpy()

subject_level_report(
    participant_id="participant_test",
    pattern="task_test",
    result_folder="./",
    session="session_test",
    ecg=ecg,
    ecg_sfreq=1000,
)
will save these four new files in the result folder:
The .html file is a standalone document that can be visualized in the browser.
The features.tsv file contains heart rate and/or respiration rate variability metrics.
The _physio.tsv.gz and the _physio.json files contain the preprocessed signal with a new peaks column for peak detection.
Preprocessing the entire BIDS folder#
The previous function call can be automated for each participant and each file of a given BIDS folder, extracting the physiological features automatically using the information provided in the json metadata. This can be done using the systole.reports.wrapper function, or directly from the command line. For example, the following command:
systole --bids_folder="/path/to/BIDS/folder/" \
    --patterns="task-mytask" \
    --modality="beh" \
    --n_jobs=10 \
    --overwrite=True \
    --html_reports=False
will preprocess the data for all participants with a physiological recording in the session ses-session1 (default), for the behavioural modality (beh) and the task mytask. We set n_jobs=10, meaning that we will run 10 processes in parallel, and overwrite=True to overwrite previous data with the same ID in the derivative folder. Note that we also set html_reports to False: as these files can be quite large, it is often preferable to create them only for the participants we want to review, or to use the tools described in Manual edition of peaks vector and bad segments labelling. The possible arguments are:
Argument | Description |
---|---|
--bids_folder (-i) | Path to the BIDS folder containing the physiological recordings. |
--participant_id (-p) | The id of the participant that should be preprocessed. If this argument is not provided, all the participants will be preprocessed. |
--patterns (-t) | Only the files that contain the pattern string will be preprocessed. If the number of matching files is not exactly 1, the files are not processed. |
--html_reports (-r) | Create subject-level HTML reports if True. |
--result_folder (-o) | Path to the result folder. If not provided, the default will be ./derivatives/systole/. |
--n_jobs (-n) | The number of jobs to run concurrently. |
--modality (-d) | The modality of the recording (e.g. beh). |
--overwrite (-w) | If True, overwrite previous data with the same ID in the derivative folder. |
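When the same command has to be launched for several tasks or folders, it can help to assemble it programmatically. The sketch below only builds the argument list from the flags used in the command shown above and prints it. The helper name `build_systole_command` and its default values are hypothetical, and nothing is executed unless you pass the list to `subprocess.run` yourself.

```python
import shlex


def build_systole_command(bids_folder, patterns, modality="beh", n_jobs=1,
                          overwrite=False, html_reports=False):
    """Assemble the argument list for the `systole` command line tool."""
    return [
        "systole",
        f"--bids_folder={bids_folder}",
        f"--patterns={patterns}",
        f"--modality={modality}",
        f"--n_jobs={n_jobs}",
        f"--overwrite={overwrite}",
        f"--html_reports={html_reports}",
    ]


command = build_systole_command(
    "/path/to/BIDS/folder/", "task-mytask", n_jobs=10, overwrite=True
)
# Print a shell-safe version of the command for inspection
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```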
Note
When setting overwrite=True, only the preprocessed derivatives will be overwritten, but not the edited files located in BIDS/derivatives/systole/corrected/*. This means that it is possible to re-run the preprocessing even after working on the manual artefact edition (see below).
Once the preprocessing is completed, and if you did not ask for an external result folder, the structure of the BIDS repository should now include a new systole folder in the derivatives:
└─ BIDS/
   ├─ derivatives/
   │  └─ systole/
   │     └─ sub-0001/
   │        └─ ses-session1/
   │           └─ beh/
   │              ├─ sub-0001_ses_session1_task-mytask_features.tsv
   │              ├─ sub-0001_ses_session1_task-mytask_report.html
   │              ├─ sub-0001_ses_session1_task-mytask_physio.tsv.gz
   │              └─ sub-0001_ses_session1_task-mytask_physio.json
   ├─ sub-0001/
   │  └─ ses-session1/
   │     └─ beh/
   │        ├─ sub-0001_ses_session1_task-mytask_physio.tsv.gz
   │        └─ sub-0001_ses_session1_task-mytask_physio.json
   │
   ├─ sub-0002/
   ├─ sub-0003/
   └─ ...
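Because the derivatives mirror the layout of the raw folder, a quick way to check that the preprocessing covered every recording is to look for the mirrored file. The following sketch assumes the default result folder (derivatives/systole inside the BIDS folder). The function name `missing_derivatives` and the demo folder are illustrative only.

```python
import tempfile
from pathlib import Path


def missing_derivatives(bids_folder, result_folder=None):
    """List raw recordings that have no preprocessed counterpart."""
    bids_folder = Path(bids_folder)
    if result_folder is None:
        # Assumed default location of the Systole derivatives
        result_folder = bids_folder / "derivatives" / "systole"
    missing = []
    # The "sub-*" prefix keeps the derivatives folder out of the search
    for raw in sorted(bids_folder.glob("sub-*/ses-*/*/*_physio.tsv.gz")):
        relative = raw.relative_to(bids_folder)
        if not (Path(result_folder) / relative).exists():
            missing.append(relative.as_posix())
    return missing


# Demo: two raw recordings, only sub-0001 has been preprocessed
root = Path(tempfile.mkdtemp()) / "BIDS"
for sub in ["sub-0001", "sub-0002"]:
    beh = root / sub / "ses-session1" / "beh"
    beh.mkdir(parents=True)
    (beh / f"{sub}_ses_session1_task-mytask_physio.tsv.gz").touch()
done = root / "derivatives" / "systole" / "sub-0001" / "ses-session1" / "beh"
done.mkdir(parents=True)
(done / "sub-0001_ses_session1_task-mytask_physio.tsv.gz").touch()

missing = missing_derivatives(root)
print(missing)
```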
Manual edition of peaks vector and bad segments labelling#
While we hope that the peak detection function used by Systole is sufficiently robust to extract peak vectors without errors for most use cases, you might still encounter noisy or invalid recordings that you will want to manually inspect and sometimes edit.
The systole.interact sub-module provides two classes (systole.interact.Editor and systole.interact.Viewer) built on top of Matplotlib widgets that can help with manual edition and interactive visualization of BIDS folders directly in the notebook.
Using the Editor to inspect raw signal#
The systole.interact.Editor class can be used on its own (outside of a BIDS-structured folder) to edit peak detection from a raw ECG, PPG or respiratory signal.
from systole import import_dataset1
from systole.interact import Viewer, Editor
from IPython.display import display

%matplotlib ipympl

# Load a raw ECG time series
ecg = import_dataset1(modalities=['ECG'], disable=True).ecg.to_numpy()

editor = Editor(
    signal=ecg,
    sfreq=1000,
    corrected_json="./corrected.json",
    figsize=(15, 5),
    signal_type="ECG"
)
display(editor.commands_box)
Note
Note that we are using the package ipympl, and activating it using the magic cell %matplotlib ipympl, so we can render Matplotlib interactive widgets in the Notebook. If you are working in another IDE, you can also render the same windows using a different backend like PyQt.
This window will automatically apply peak detection according to the signal_type parameter, and plot the raw signal together with the instantaneous heart / respiration rate to check for artefacts. The class embeds a commands_box that can be used for edition.
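The instantaneous rate displayed under the signal is what makes artefacts stand out: a missed or extra peak produces an implausible jump. As a simplified sketch (not the code used by the Editor), the rate in beats per minute can be derived from consecutive peak indexes and the sampling frequency. The function name and the example peak indexes are made up.

```python
def instantaneous_heart_rate(peaks_idx, sfreq):
    """Convert peak sample indexes into beat-to-beat heart rate (bpm)."""
    rates = []
    for previous, current in zip(peaks_idx[:-1], peaks_idx[1:]):
        # Interval in samples -> beats per minute
        rates.append(60.0 * sfreq / (current - previous))
    return rates


# Made-up peaks at 1000 Hz: the last interval is only 250 ms long
peaks_idx = [0, 1000, 1800, 2800, 3050]
rates = instantaneous_heart_rate(peaks_idx, sfreq=1000)
print(rates)  # the final value (240 bpm) hints at a spurious extra peak
```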
When using the Correction mode:
Use the left mouse button to select a segment in which all the peaks should be removed.
Use the right mouse button to select a segment in which a peak will be added at the local maximum.
When using the Rejection mode:
Use the right mouse button to select a segment that should be marked as bad.
By deselecting the check box, you can mark the entire signal as invalid.
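To clarify what the two correction actions do to the peaks vector, here is a small list-based sketch. It is an illustration of the idea under simplifying assumptions (plain Python lists, half-open segments), not the Editor's implementation. The names `remove_peaks` and `add_peak` are hypothetical.

```python
def remove_peaks(peaks_idx, start, end):
    """Drop every peak falling inside the selected segment [start, end)."""
    return [p for p in peaks_idx if not (start <= p < end)]


def add_peak(signal, peaks_idx, start, end):
    """Insert a peak at the local maximum of the selected segment."""
    segment = signal[start:end]
    if not segment:
        return list(peaks_idx)
    # Index of the largest sample inside the segment, in signal coordinates
    local_max = start + max(range(len(segment)), key=segment.__getitem__)
    return sorted(set(peaks_idx) | {local_max})


# Toy signal with three bumps; the detector missed the one at index 6
signal = [0, 1, 5, 1, 0, 2, 9, 2, 0, 1, 4, 1]
peaks_idx = [2, 10]

with_added = add_peak(signal, peaks_idx, start=4, end=9)
with_removed = remove_peaks(with_added, start=9, end=12)
print(with_added, with_removed)
```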
Once the signal has been edited, you can save the modifications using the Save modification button, or call the save method of the class directly.
editor.save()
This function will create a JSON file (at the path specified in the corrected_json parameter) with all the information about bad segment labelling, peak deletion and peak insertion. The JSON file contains the following entries for each modality (ECG, PPG and respiration):
valid: whether the recording is valid or should be discarded (True unless otherwise stated).
corrected_peaks: the peak indexes after correction.
bad_segments: a list of start and end indexes of bad segments.
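The entries listed above are enough to derive a clean peaks vector for later analyses. The sketch below shows one way to do so. The exact layout of the saved file is an assumption here (a top-level key per modality, and bad segments stored as objects with start and end fields), so check a file written by editor.save() before relying on it.

```python
import json

# Hypothetical file content; the real layout written by Systole may differ
corrected = json.loads("""
{
  "ECG": {
    "valid": true,
    "corrected_peaks": [120, 950, 1780, 2600, 3420],
    "bad_segments": [{"start": 1500, "end": 2000}]
  }
}
""")


def clean_peaks(entry):
    """Return the corrected peaks that lie outside every bad segment."""
    if not entry["valid"]:
        # The whole recording was rejected
        return []
    return [
        peak
        for peak in entry["corrected_peaks"]
        if not any(
            seg["start"] <= peak <= seg["end"] for seg in entry["bad_segments"]
        )
    ]


ecg_peaks = clean_peaks(corrected["ECG"])
print(ecg_peaks)
```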
Importing signals after manual edition#
After manual peak correction and segment labelling, a new corrected subfolder will be appended to the systole derivatives:
└─ BIDS/
   ├─ derivatives/
   │  └─ systole/
   │     ├─ corrected/
   │     │  └─ sub-0001/
   │     │     └─ ses-session1/
   │     │        └─ beh/
   │     │           └─ sub-0001_ses_session1_task-mytask_physio.json
   │     └─ sub-0001/
   │        └─ ses-session1/
   │           └─ beh/
   │              ├─ sub-0001_ses_session1_task-mytask_features.tsv
   │              ├─ sub-0001_ses_session1_task-mytask_report.html
   │              ├─ sub-0001_ses_session1_task-mytask_physio.tsv.gz
   │              └─ sub-0001_ses_session1_task-mytask_physio.json
   ├─ sub-0001/
   │  └─ ses-session1/
   │     └─ beh/
   │        ├─ sub-0001_ses_session1_task-mytask_physio.tsv.gz
   │        └─ sub-0001_ses_session1_task-mytask_physio.json
   │
   ├─ sub-0002/
   ├─ sub-0003/
   └─ ...
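Following the layout above, the location of the corrected JSON file can be derived from the path of the raw recording. This is a minimal `pathlib` sketch under the assumption that the default derivatives folder is used. The helper name `corrected_json_path` is hypothetical.

```python
from pathlib import Path


def corrected_json_path(raw_tsv, bids_folder):
    """Map a raw *_physio.tsv.gz path to its corrected JSON counterpart."""
    raw_tsv, bids_folder = Path(raw_tsv), Path(bids_folder)
    # Path of the recording relative to the BIDS root (sub-*/ses-*/beh/...)
    relative = raw_tsv.relative_to(bids_folder)
    name = relative.name.replace("_physio.tsv.gz", "_physio.json")
    return (
        bids_folder / "derivatives" / "systole" / "corrected"
        / relative.parent / name
    )


raw = Path("BIDS/sub-0001/ses-session1/beh/"
           "sub-0001_ses_session1_task-mytask_physio.tsv.gz")
corrected = corrected_json_path(raw, "BIDS")
print(corrected.as_posix())
```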