Create/coerce and print resampling objects, e.g., partitionings or bootstrap samples derived from a data set.

as.resampling(object, ...)

# Default S3 method
as.resampling(object, ...)

# S3 method for class 'factor'
as.resampling(object, ...)

# S3 method for class 'list'
as.resampling(object, ...)

validate.resampling(object)

is.resampling(x, ...)

# S3 method for class 'resampling'
print(x, ...)

Arguments

object

depending on the function/method, a list or a vector of type factor defining a partitioning of the data set.

...

currently not used.

x

object of class resampling.

Value

as.resampling methods: An object of class resampling.

Details

A resampling object is a list of lists defining a set of training and test samples.

In the case of k-fold cross-validation partitioning, for example, the corresponding resampling object has length k, i.e. it contains k lists. Each of these k lists defines a training set of size approximately n(k-1)/k (where n is the overall sample size) and a test set of size approximately n/k; the sizes are exact only when n is divisible by k. The resampling object does not, however, contain the data itself, only indices between 1 and n identifying the selection (see Examples).
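
The structure described above can be sketched in base R. This is only an illustration of the list-of-lists layout, not the package's own implementation; the values of n and k and the variable names fold and resamp are assumptions chosen for the example.

```r
# Base-R sketch of a k-fold resampling structure (illustrative only).
set.seed(1)                      # reproducible fold assignment
n <- 20                          # assumed sample size
k <- 5                           # assumed number of folds

# Assign each of the n rows to one of k folds at random.
fold <- sample(rep(seq_len(k), length.out = n))

# One list per fold: test = rows in the fold, train = all other rows.
resamp <- lapply(seq_len(k), function(i) {
  list(train = which(fold != i), test = which(fold == i))
})
names(resamp) <- as.character(seq_len(k))

length(resamp)                                # 5, i.e. k
sapply(resamp, function(f) length(f$train))   # 16 per fold, n * (k - 1) / k
sapply(resamp, function(f) length(f$test))    # 4 per fold, n / k
```

Only row indices are stored, so the same object can be applied to any data frame with n rows, e.g. d[resamp[[1]]$test, ] for a hypothetical data frame d.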

Another example is bootstrap resampling: represampling_bootstrap with argument oob = TRUE generates represampling objects whose train component holds the indices of a bootstrap sample and whose test component holds the indices of the out-of-bag sample (see Examples).
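
As a rough base-R sketch of the out-of-bag idea (an assumption-laden illustration, not the code used by represampling_bootstrap), the training indices are drawn with replacement and the test indices are the rows that were never drawn. The sample size n and the object name boot are made up for this example.

```r
# Base-R sketch of a bootstrap sample with an out-of-bag test set.
set.seed(42)
n <- 1000                                      # assumed sample size

train <- sample(n, size = n, replace = TRUE)   # bootstrap: draw n rows with replacement
test  <- setdiff(seq_len(n), train)            # out-of-bag: rows never drawn

# Same train/test layout as one element of a resampling object.
boot <- list("1" = list(train = train, test = test))

length(boot[[1]]$train)       # equals n
length(boot[[1]]$test) / n    # typically close to exp(-1), i.e. about 0.37
```

The training component contains repeated indices, which is why it has n entries while the out-of-bag component covers only the remaining rows.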

as.resampling.factor: For each factor level of the input variable, as.resampling.factor determines the indices of samples in this level (= test samples) and outside this level (= training samples). Empty levels of object are dropped without warning.
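
The level-wise logic can be mimicked in base R as follows. This is a minimal sketch of the behaviour described above, under the assumption that the result is named by factor level; the toy factor f and the name by_level are hypothetical.

```r
# Base-R sketch: one train/test split per factor level.
f <- factor(c("a", "b", "a", "c", "b", "a"),
            levels = c("a", "b", "c", "d"))   # "d" is an empty level

f <- droplevels(f)   # mirror the documented behaviour: empty levels are dropped

by_level <- lapply(levels(f), function(lev) {
  list(train = which(f != lev),   # samples outside this level
       test  = which(f == lev))   # samples in this level
})
names(by_level) <- levels(f)

names(by_level)    # "a" "b" "c"
by_level$b$test    # 2 5
by_level$b$train   # 1 3 4 6
```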

as.resampling.list checks if the list in object has a valid resampling object structure (with components train and test etc.) and assigns the class attribute 'resampling' if successful.
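
A structural check of this kind might look like the sketch below. The helper looks_like_resampling is hypothetical and covers only the train and test components; the checks actually performed by the list method and by validate.resampling may differ.

```r
# Hypothetical helper (not part of the package): TRUE if every element of
# obj is a list with numeric 'train' and 'test' components.
looks_like_resampling <- function(obj) {
  is.list(obj) && length(obj) > 0 &&
    all(vapply(obj, function(el) {
      is.list(el) &&
        all(c("train", "test") %in% names(el)) &&
        is.numeric(el$train) && is.numeric(el$test)
    }, logical(1)))
}

good <- list(list(train = 1:3, test = 4:5),
             list(train = 3:5, test = 1:2))
bad  <- list(list(train = 1:3))          # 'test' component is missing

looks_like_resampling(good)   # TRUE
looks_like_resampling(bad)    # FALSE

# Assign the class attribute only after the structure has been checked.
if (looks_like_resampling(good)) class(good) <- "resampling"
```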

Examples

data(ecuador) # Muenchow et al. (2012), see ?ecuador

# Partitioning by elevation classes in 200 m steps:
parti <- factor(as.character(floor(ecuador$dem / 200)))
smp <- as.resampling(parti)
summary(smp)
#>    n.train n.test
#> 10     600    151
#> 11     585    166
#> 12     660     91
#> 13     641    110
#> 14     727     24
#> 15     747      4
#> 8      730     21
#> 9      567    184
# Compare:
summary(parti)
#>  10  11  12  13  14  15   8   9 
#> 151 166  91 110  24   4  21 184 

# k-fold (non-spatial) cross-validation partitioning:
parti <- partition_cv(ecuador)
parti <- parti[[1]] # the first (and only) resampling object in parti
# data corresponding to the test sample of the first fold:
str(ecuador[parti[[1]]$test, ])
#> 'data.frame':	75 obs. of  13 variables:
#>  $ x             : num  715382 714892 714202 714862 714032 ...
#>  $ y             : num  9560142 9559312 9557412 9560982 9558502 ...
#>  $ dem           : num  2021 2380 2544 1863 2403 ...
#>  $ slope         : num  42 32.8 39.9 21 31.4 ...
#>  $ hcurv         : num  0.00958 -0.00266 -0.02104 -0.00295 0.01924 ...
#>  $ vcurv         : num  0.02642 0.02896 -0.02046 0.00035 0.02606 ...
#>  $ carea         : num  671 276 1024 2698 565 ...
#>  $ cslope        : num  41.6 20.9 38.8 23.5 26.3 ...
#>  $ distroad      : num  300 300 300 214 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  300 300 300 20.2 300 ...
#>  $ distslidespast: num  21 31 100 68 8 5 65 0 25 100 ...
#>  $ log.carea     : num  2.83 2.44 3.01 3.43 2.75 ...
# the corresponding training sample - larger:
str(ecuador[parti[[1]]$train, ])
#> 'data.frame':	676 obs. of  13 variables:
#>  $ x             : num  712882 715232 715392 715042 712802 ...
#>  $ y             : num  9560002 9559582 9560172 9559312 9559952 ...
#>  $ dem           : num  1912 2199 1989 2320 1838 ...
#>  $ slope         : num  25.6 23.2 40.5 42.9 52.1 ...
#>  $ hcurv         : num  -0.00681 -0.00501 -0.01919 -0.01106 0.00183 ...
#>  $ vcurv         : num  -0.00029 -0.00649 -0.04051 -0.04634 -0.09203 ...
#>  $ carea         : num  5577 1399 351155 501 634 ...
#>  $ cslope        : num  34.4 30.7 32.8 33.9 30.3 ...
#>  $ distroad      : num  300 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  15 300 300 300 9.15 ...
#>  $ distslidespast: num  9 21 40 100 2 100 100 41 5 20 ...
#>  $ log.carea     : num  3.75 3.15 5.55 2.7 2.8 ...

# Bootstrap training sets, out-of-bag test sets:
parti <- represampling_bootstrap(ecuador, oob = TRUE)
parti <- parti[[1]] # the first (and only) resampling object in parti
# out-of-bag test sample: roughly 37% (about 1/e) of nrow(ecuador):
str(ecuador[parti[[1]]$test, ])
#> 'data.frame':	279 obs. of  13 variables:
#>  $ x             : num  712882 715232 715382 715272 714842 ...
#>  $ y             : num  9560002 9559582 9560142 9557702 9558892 ...
#>  $ dem           : num  1912 2199 2021 2813 2483 ...
#>  $ slope         : num  25.6 23.2 42 31 68.8 ...
#>  $ hcurv         : num  -0.00681 -0.00501 0.00958 -0.00123 -0.04921 ...
#>  $ vcurv         : num  -0.00029 -0.00649 0.02642 0.00393 -0.12438 ...
#>  $ carea         : num  5577 1399 671 2081 754 ...
#>  $ cslope        : num  34.4 30.7 41.6 37.6 53.7 ...
#>  $ distroad      : num  300 300 300 300 300 30 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  15 300 300 300 300 ...
#>  $ distslidespast: num  9 21 21 100 100 20 100 2 100 100 ...
#>  $ log.carea     : num  3.75 3.15 2.83 3.32 2.88 ...
# bootstrap training sample: same size as nrow(ecuador):
str(ecuador[parti[[1]]$train, ])
#> 'data.frame':	751 obs. of  13 variables:
#>  $ x             : num  713472 712762 713952 715832 713642 ...
#>  $ y             : num  9558462 9560962 9561282 9558112 9560602 ...
#>  $ dem           : num  2298 2022 1801 2824 1945 ...
#>  $ slope         : num  29.2 44.9 33.8 23.7 34.5 ...
#>  $ hcurv         : num  0.00156 -0.01259 0.00466 0.00944 0.00165 ...
#>  $ vcurv         : num  -0.00116 -0.00281 -0.01776 0.00836 0.00875 ...
#>  $ carea         : num  1818 1178 1022 251 1165 ...
#>  $ cslope        : num  31.74 34.2 4.14 22.86 31.92 ...
#>  $ distroad      : num  300 300 42.9 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 1 1 1 2 2 1 2 ...
#>  $ distdeforest  : num  300 129 0 300 0 ...
#>  $ distslidespast: num  76 63 13 100 100 100 100 100 100 30 ...
#>  $ log.carea     : num  3.26 3.07 3.01 2.4 3.07 ...