Source: R/sperrorest_misc.R (as.resampling.Rd)
Create/coerce and print resampling objects, e.g., partitionings or bootstrap samples derived from a data set.
as.resampling(object, ...)
# Default S3 method
as.resampling(object, ...)
# S3 method for class 'factor'
as.resampling(object, ...)
# S3 method for class 'list'
as.resampling(object, ...)
validate.resampling(object)
is.resampling(x, ...)
# S3 method for class 'resampling'
print(x, ...)
Value: the as.resampling methods return an object of class resampling.
A resampling object is a list of lists defining a set of training and test samples. In the case of k-fold cross-validation partitioning, for example, the corresponding resampling object would be of length k, i.e. contain k lists. Each of these k lists defines a training set of size n(k-1)/k (where n is the overall sample size) and a test set of size n/k. The resampling object does, however, not contain the data itself, but only indices between 1 and n identifying the selection (see Examples).
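The index structure described above can be mimicked in a few lines of base R. This is a minimal sketch of what such an object looks like for 2-fold partitioning of 10 observations; the fold assignment and variable names are illustrative, not sperrorest's internal implementation:

```r
# Hand-built list of train/test index sets for k = 2 folds and n = 10
# observations, mirroring the structure of a 'resampling' object:
n <- 10
k <- 2
fold <- rep(seq_len(k), length.out = n)  # fold membership of each observation
res <- lapply(seq_len(k), function(i) {
  list(
    train = which(fold != i),  # n(k-1)/k = 5 training indices
    test  = which(fold == i)   # n/k = 5 test indices
  )
})
str(res[[1]])
```

Note that each element holds only integer indices into the data set, and within each fold the train and test indices are disjoint.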
Another example is bootstrap resampling: represampling_bootstrap with argument oob = TRUE generates rep resampling objects with the indices of a bootstrap sample in the train component and the indices of the out-of-bag sample in the test component (see Examples below).
as.resampling.factor: For each factor level of the input variable, as.resampling.factor determines the indices of samples in this level (= test samples) and outside this level (= training samples). Empty levels of object are dropped without warning.
as.resampling.list checks if the list in object has a valid resampling object structure (with components train and test etc.) and assigns the class attribute 'resampling' if successful.
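The kind of structural check just described can be sketched in base R. The helper below is hypothetical (it is not sperrorest's actual validator, whose internals are not shown here); it only illustrates the documented requirement that each list element carry train and test components before the 'resampling' class is assigned:

```r
# Hypothetical validator mirroring the documented structure check:
# each element must be a list with 'train' and 'test' index components.
validate_resampling_list <- function(object) {
  ok <- is.list(object) && length(object) > 0 &&
    all(vapply(object, function(x) {
      is.list(x) && all(c("train", "test") %in% names(x))
    }, logical(1)))
  if (!ok) stop("not a valid resampling object structure")
  class(object) <- "resampling"
  object
}

res <- validate_resampling_list(list(list(train = 1:8, test = 9:12)))
class(res)  # "resampling"
```

An input lacking a test component (for example, list(list(train = 1:8))) would trigger the error branch instead of being classed.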
# Muenchow et al. (2012), see ?ecuador
# Partitioning by elevation classes in 200 m steps:
parti <- factor(as.character(floor(ecuador$dem / 200)))
smp <- as.resampling(parti)
summary(smp)
#> n.train n.test
#> 10 600 151
#> 11 585 166
#> 12 660 91
#> 13 641 110
#> 14 727 24
#> 15 747 4
#> 8 730 21
#> 9 567 184
# Compare:
summary(parti)
#> 10 11 12 13 14 15 8 9
#> 151 166 91 110 24 4 21 184
# k-fold (non-spatial) cross-validation partitioning:
parti <- partition_cv(ecuador)
parti <- parti[[1]] # the first (and only) resampling object in parti
# data corresponding to the test sample of the first fold:
str(ecuador[parti[[1]]$test, ])
#> 'data.frame': 75 obs. of 13 variables:
#> $ x : num 715382 714892 714202 714862 714032 ...
#> $ y : num 9560142 9559312 9557412 9560982 9558502 ...
#> $ dem : num 2021 2380 2544 1863 2403 ...
#> $ slope : num 42 32.8 39.9 21 31.4 ...
#> $ hcurv : num 0.00958 -0.00266 -0.02104 -0.00295 0.01924 ...
#> $ vcurv : num 0.02642 0.02896 -0.02046 0.00035 0.02606 ...
#> $ carea : num 671 276 1024 2698 565 ...
#> $ cslope : num 41.6 20.9 38.8 23.5 26.3 ...
#> $ distroad : num 300 300 300 214 300 ...
#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#> $ distdeforest : num 300 300 300 20.2 300 ...
#> $ distslidespast: num 21 31 100 68 8 5 65 0 25 100 ...
#> $ log.carea : num 2.83 2.44 3.01 3.43 2.75 ...
# the corresponding training sample - larger:
str(ecuador[parti[[1]]$train, ])
#> 'data.frame': 676 obs. of 13 variables:
#> $ x : num 712882 715232 715392 715042 712802 ...
#> $ y : num 9560002 9559582 9560172 9559312 9559952 ...
#> $ dem : num 1912 2199 1989 2320 1838 ...
#> $ slope : num 25.6 23.2 40.5 42.9 52.1 ...
#> $ hcurv : num -0.00681 -0.00501 -0.01919 -0.01106 0.00183 ...
#> $ vcurv : num -0.00029 -0.00649 -0.04051 -0.04634 -0.09203 ...
#> $ carea : num 5577 1399 351155 501 634 ...
#> $ cslope : num 34.4 30.7 32.8 33.9 30.3 ...
#> $ distroad : num 300 300 300 300 300 ...
#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#> $ distdeforest : num 15 300 300 300 9.15 ...
#> $ distslidespast: num 9 21 40 100 2 100 100 41 5 20 ...
#> $ log.carea : num 3.75 3.15 5.55 2.7 2.8 ...
# Bootstrap training sets, out-of-bag test sets:
parti <- represampling_bootstrap(ecuador, oob = TRUE)
parti <- parti[[1]] # the first (and only) resampling object in parti
# out-of-bag test sample: approx. one-third of nrow(ecuador):
str(ecuador[parti[[1]]$test, ])
#> 'data.frame': 279 obs. of 13 variables:
#> $ x : num 712882 715232 715382 715272 714842 ...
#> $ y : num 9560002 9559582 9560142 9557702 9558892 ...
#> $ dem : num 1912 2199 2021 2813 2483 ...
#> $ slope : num 25.6 23.2 42 31 68.8 ...
#> $ hcurv : num -0.00681 -0.00501 0.00958 -0.00123 -0.04921 ...
#> $ vcurv : num -0.00029 -0.00649 0.02642 0.00393 -0.12438 ...
#> $ carea : num 5577 1399 671 2081 754 ...
#> $ cslope : num 34.4 30.7 41.6 37.6 53.7 ...
#> $ distroad : num 300 300 300 300 300 30 300 300 300 300 ...
#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#> $ distdeforest : num 15 300 300 300 300 ...
#> $ distslidespast: num 9 21 21 100 100 20 100 2 100 100 ...
#> $ log.carea : num 3.75 3.15 2.83 3.32 2.88 ...
# bootstrap training sample: same size as nrow(ecuador):
str(ecuador[parti[[1]]$train, ])
#> 'data.frame': 751 obs. of 13 variables:
#> $ x : num 713472 712762 713952 715832 713642 ...
#> $ y : num 9558462 9560962 9561282 9558112 9560602 ...
#> $ dem : num 2298 2022 1801 2824 1945 ...
#> $ slope : num 29.2 44.9 33.8 23.7 34.5 ...
#> $ hcurv : num 0.00156 -0.01259 0.00466 0.00944 0.00165 ...
#> $ vcurv : num -0.00116 -0.00281 -0.01776 0.00836 0.00875 ...
#> $ carea : num 1818 1178 1022 251 1165 ...
#> $ cslope : num 31.74 34.2 4.14 22.86 31.92 ...
#> $ distroad : num 300 300 42.9 300 300 ...
#> $ slides : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 1 1 1 2 2 1 2 ...
#> $ distdeforest : num 300 129 0 300 0 ...
#> $ distslidespast: num 76 63 13 100 100 100 100 100 100 30 ...
#> $ log.carea : num 3.26 3.07 3.01 2.4 3.07 ...