Stickleback.Rd
Define a Stickleback model, used for automated detection of behavioral events in bio-logging data.
Stickleback(tsc, win_size, tol, nth = 1, n_folds = 4, seed = NULL)
tsc
[py:sktime.base.BaseEstimator]
A time series classifier created with either compose_tsc or create_tsc.
win_size
[integer(1)]
Sliding window size in number of observations. E.g., for 10 Hz data and a
5 s sliding window, win_size should be 50.
tol
[numeric(1)]
Prediction tolerance, in seconds. See sb_assess for details.
nth
[integer(1)]
Sliding window step size. For example, when nth = 1, the time series
classifier (tsc) will make predictions on every window. When nth = 2, tsc
predictions are only generated for every other window. Higher nth values
reduce the time to fit a Stickleback model and generate predictions, at the
potential cost of reduced prediction accuracy.
n_folds
[integer(1)]
Number of folds for internal cross validation. n_folds must be at least 2.
Larger n_folds values increase model fitting time, but may improve
out-of-sample accuracy.
seed
[integer(1)]
Random number seed for model reproducibility.
CURRENTLY NOT WORKING (see issue #6).
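The relationship between sampling rate, win_size, and nth can be illustrated with a little arithmetic. The following is a minimal base R sketch, not part of rstickleback: window_size() is a hypothetical helper, and the assumption that one window starts at every observation (with nth keeping every nth start) is for illustration only.

```r
# Hypothetical helper (not part of rstickleback): convert a window
# duration in seconds to a window size in observations.
window_size <- function(hz, seconds) {
  as.integer(round(hz * seconds))
}

win_size <- window_size(hz = 10, seconds = 5)  # 50 observations

# Assumption: one window starts at every observation, and nth keeps
# every nth window start. Example for a 1000-observation series:
n_obs <- 1000L
nth <- 10L
window_starts <- seq_len(n_obs - win_size + 1L)
kept_starts <- window_starts[seq(1L, length(window_starts), by = nth)]

length(window_starts)  # 951 windows when nth = 1
length(kept_starts)    # 96 windows when nth = 10
```

Under these assumptions, nth = 10 cuts the number of windows the classifier must score by roughly a factor of ten.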
There are two challenges facing automated behavioral event detection in bio-logging data. First, bio-logging data are time series, and most classification algorithms perform poorly on time series. Second, bio-logging data resolution greatly exceeds the frequency of many biological rates, creating an imbalanced class problem. For example, bio-logging data collected from baleen whales are often standardized at 10 Hz (864,000 samples per day), but feeding rates are approximately 200-500 events per day. Therefore, the "behavioral event" class is thousands of times smaller than the "non-event" class.
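The scale of the imbalance can be checked with a quick calculation. This sketch assumes, for simplicity, that each feeding event occupies a single 10 Hz sample. Real events span several samples, so the result is only an order-of-magnitude guide.

```r
hz <- 10
samples_per_day <- hz * 60 * 60 * 24        # 864000 samples per day
events_per_day <- c(low = 200, high = 500)  # feeding rates from the text

# Samples per event (simplifying assumption: one sample per event)
imbalance <- samples_per_day / events_per_day
imbalance  # low = 4320, high = 1728
```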
Stickleback addresses these challenges in a two-stage process. First, it uses
classification algorithms specifically designed for time series data by
interfacing with the sktime Python package. Second, it under-samples the
majority class ("non-events") when training the classifier, then optimizes
event prediction using internal cross-validation. See
vignette("rstickleback") for more details.
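The under-sampling idea can be sketched in base R as follows. This is an illustration only: the 1:1 event to non-event ratio, the toy labels, and all variable names are assumptions, and Stickleback's internal sampling scheme may differ.

```r
set.seed(1)

# Toy labels: 5 events among 1000 windows (assumed for illustration)
labels <- c(rep("event", 5), rep("non-event", 995))

event_idx <- which(labels == "event")
nonevent_idx <- which(labels == "non-event")

# Keep every event and an equal number of randomly chosen non-events
kept_nonevents <- sample(nonevent_idx, size = length(event_idx))
train_idx <- sort(c(event_idx, kept_nonevents))

table(labels[train_idx])  # 5 events, 5 non-events
```

Training on the balanced subset keeps the classifier from simply predicting "non-event" everywhere, which would otherwise be the easiest way to score well on imbalanced data.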
local_clf
[py:sktime.base.BaseEstimator]
A time series classifier, inheriting from sktime's BaseEstimator.
win_size
[integer(1)]
Sliding window size.
tol
[numeric(1)]
Prediction tolerance, in seconds.
nth
[integer(1)]
Sliding window step size.
n_folds
[integer(1)]
Number of folds for global cross validation step.
seed
[integer(1)]
Random number seed.
.stickleback
[py:Stickleback]
Python Stickleback object.
# Load sample data
c(lunge_sensors, lunge_events) %<-% load_lunges()
# Define a time series classifier
tsc <- compose_tsc(module = "interval_based",
                   algorithm = "SupervisedTimeSeriesForest",
                   params = list(n_estimators = 2L, random_state = 4321L),
                   columns = columns(lunge_sensors))
# Define a Stickleback model
sb <- Stickleback(tsc,
                  win_size = 50,
                  tol = 5,
                  nth = 10,
                  n_folds = 4,
                  seed = 1234)