R/sperrorest_resampling.R
represampling_factor_bootstrap.Rd
represampling_factor_bootstrap resamples partitions defined by a factor variable. This can be used for non-overlapping block bootstraps and similar.
represampling_factor_bootstrap(
data,
fac,
repetition = 1,
nboot = -1,
seed1 = NULL,
oob = FALSE
)
data: data.frame containing at least the columns specified by coords.
fac: defines a grouping or partitioning of the samples in data. Three types are possible: (1) the name of a variable in data (coerced to factor if not already a factor variable); (2) a factor variable (or a vector that can be coerced to factor); (3) a list of factor variables (or vectors that can be coerced to factor). Such a list must be of length length(repetition), and if it is named, the names must be equal to as.character(repetition). It will typically be generated by a partition_* function with return_factor = TRUE (see Examples below).
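The three accepted forms of fac can be illustrated in base R with a toy data frame. This is only a sketch: the data frame d, its column block, and the objects fac_name, fac_vec and fac_list are made up for illustration, and the resampling function itself is not called.

```r
# Toy data (hypothetical): 12 samples in 4 blocks
d <- data.frame(x = 1:12, block = rep(c("a", "b", "c", "d"), each = 3))
repetition <- 1:3

# (1) the name of a variable in the data
fac_name <- "block"

# (2) a factor (or a vector coercible to factor), one value per row
fac_vec <- factor(d$block)

# (3) a list with one factor per repetition; if the list is named,
#     the names must equal as.character(repetition)
fac_list <- lapply(repetition, function(i) factor(d$block))
names(fac_list) <- as.character(repetition)
```

In practice the list form would hold a different partitioning per repetition; here every element is the same factor purely to keep the sketch short.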
repetition: numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.
nboot: number of bootstrap replications used for generating the bootstrap training sample (nboot[1]) and the test sample (nboot[2]); nboot[2] is ignored (with a warning) if oob = TRUE. A value of -1 will be substituted with the number of levels of the factor variable, corresponding to an n out of n bootstrap at the grouping level defined by fac.
seed1: seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.
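The seeding rule can be sketched in base R as follows, assuming that each repetition simply calls set.seed(seed1 + i) before sampling. The helper draw_groups and the toy group labels are hypothetical and not part of the package.

```r
# Hypothetical helper: for each repetition index i, seed the generator
# with seed1 + i, then draw group labels with replacement
draw_groups <- function(groups, repetition, seed1 = NULL) {
  out <- lapply(repetition, function(i) {
    if (!is.null(seed1)) set.seed(seed1 + i)
    sample(groups, size = length(groups), replace = TRUE)
  })
  names(out) <- as.character(repetition)
  out
}

groups <- c("a", "b", "c", "d")
run1 <- draw_groups(groups, repetition = 1:3, seed1 = 100)
run2 <- draw_groups(groups, repetition = 1:3, seed1 = 100)
run3 <- draw_groups(groups, repetition = 101:103, seed1 = 100)

identical(run1, run2)  # TRUE: same seed1 and same indices give the same draws
names(run3)            # "101" "102" "103": a separate set of repetitions
```

Because the seed depends on the repetition index rather than on its position in the vector, repetition 2 is drawn identically whether it is requested as part of 1:3 or on its own.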
oob: if TRUE, the test sample will be the out-of-bag sample; if FALSE (default), the test sample is an independently drawn bootstrap sample of size nboot[2].
nboot refers to the number of groups (as defined by the factors) to be drawn with replacement from the set of groups. I.e., if fac is a factor variable, nboot would normally not be greater than nlevels(fac), nlevels(fac) being the default as per nboot = -1.
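To make the group-level resampling concrete, here is a minimal base R sketch of a bootstrap over factor levels with either an out-of-bag or an independently drawn test set. It is an illustration under stated assumptions, not the package's implementation: the helper factor_bootstrap_sketch and the toy data are hypothetical, returning plain row indices is a simplification, and no warning is issued when nboot[2] goes unused.

```r
factor_bootstrap_sketch <- function(fac, nboot = -1, oob = FALSE) {
  fac <- factor(fac)
  lev <- levels(fac)
  nboot <- rep(nboot, length.out = 2)
  # -1 is replaced by the number of groups (n out of n bootstrap)
  nboot[nboot < 0] <- nlevels(fac)

  # collect the row indices of every drawn group (groups may repeat)
  rows_of <- function(drawn) {
    as.integer(unlist(lapply(drawn, function(g) which(fac == g))))
  }

  train_groups <- sample(lev, size = nboot[1], replace = TRUE)
  if (oob) {
    # out-of-bag: groups never drawn for training; nboot[2] is not used
    test_groups <- setdiff(lev, train_groups)
  } else {
    # independently drawn bootstrap sample of nboot[2] groups
    test_groups <- sample(lev, size = nboot[2], replace = TRUE)
  }
  list(train = rows_of(train_groups), test = rows_of(test_groups))
}

set.seed(1)
fac <- rep(c("a", "b", "c", "d"), each = 3)
res <- factor_bootstrap_sketch(fac, oob = TRUE)
length(res$train)                        # 12: four drawn groups of three rows each
length(intersect(res$train, res$test))   # 0: OOB rows never overlap training rows
```

Sampling whole groups rather than single rows is what keeps the blocks intact: a row enters the training sample only together with all other rows of its group.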
data(ecuador)
# a dummy example for demonstration, performing bootstrap
# at the level of an arbitrary factor variable:
parti <- represampling_factor_bootstrap(ecuador,
  factor(floor(ecuador$dem / 100)),
  oob = TRUE
)
# plot(parti, ecuador)
# using the factor bootstrap for a non-overlapping block bootstrap
# (see also represampling_tile_bootstrap):
fac <- partition_tiles(ecuador,
  return_factor = TRUE, repetition = c(1:3),
  dsplit = 500, min_n = 200, rotation = "random",
  offset = "random"
)
parti <- represampling_factor_bootstrap(ecuador, fac,
  oob = TRUE,
  repetition = c(1:3)
)
# plot(parti, ecuador)