Create/coerce and print resampling objects, e.g., partitionings or bootstrap samples derived from a data set.

as.resampling(object, ...)

# S3 method for default
as.resampling(object, ...)

# S3 method for factor
as.resampling(object, ...)

# S3 method for list
as.resampling(object, ...)

validate.resampling(object)

is.resampling(x, ...)

# S3 method for resampling
print(x, ...)

Arguments

object

depending on the function/method, a list or a factor vector defining a partitioning of the data set.

...

currently not used.

x

object of class resampling.

Value

as.resampling methods: An object of class resampling.

Details

A resampling object is a list of lists defining a set of training and test samples.

In the case of k-fold cross-validation partitioning, for example, the corresponding resampling object would be of length k, i.e. contain k lists. Each of these k lists defines a training set of size n(k-1)/k (where n is the overall sample size) and a test set of size n/k. The resampling object does not, however, contain the data itself, but only indices between 1 and n identifying the selection (see Examples).
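A brief sketch of this structure, using partition_cv() and the ecuador data from the Examples below (the nfold argument and the 5-fold setup are chosen here for illustration only):

parti <- partition_cv(ecuador, nfold = 5)[[1]]  # one resampling object with 5 folds
length(parti)          # 5: one train/test pair per fold
names(parti[[1]])      # "train" "test"
head(parti[[1]]$test)  # row indices between 1 and nrow(ecuador), not the data itself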

Another example is bootstrap resampling: represampling_bootstrap with argument oob = TRUE generates represampling objects whose train components contain the indices of a bootstrap sample and whose test components contain the indices of the corresponding out-of-bag sample (see Examples below).
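As a quick sketch (it reuses represampling_bootstrap() and the ecuador data from the Examples below), the resulting sample sizes can be checked directly:

boot <- represampling_bootstrap(ecuador, oob = TRUE)[[1]]  # first resampling object
length(boot[[1]]$train)  # equals nrow(ecuador): bootstrap sample drawn with replacement
length(boot[[1]]$test)   # roughly nrow(ecuador) / 3: out-of-bag observations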

as.resampling.factor: For each factor level of the input variable, as.resampling.factor determines the indices of samples in this level (= test samples) and outside this level (= training samples). Empty levels of object are dropped without warning.
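A minimal, hypothetical illustration of this behaviour (the toy factor f and the index values in the comments are assumptions, not package output):

f <- factor(c("a", "a", "b", "b", "b"))
smp <- as.resampling(f)
smp[[1]]$test   # observations in the first level ("a"): indices 1, 2
smp[[1]]$train  # observations outside that level: indices 3, 4, 5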

as.resampling.list: checks if the list in object has a valid resampling object structure (with components train and test etc.) and assigns the class attribute 'resampling' if successful.
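A minimal sketch, assuming the list method accepts a plain list whose elements each contain train and test index vectors (the hand-built split below is hypothetical):

res <- list(
  fold1 = list(train = 1:7, test = 8:10),
  fold2 = list(train = c(1:4, 8:10), test = 5:7)
)
res <- as.resampling(res)  # validates the structure and sets class 'resampling'
is.resampling(res)         # TRUE if the structure was valid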

Examples

# Muenchow et al. (2012), see ?ecuador
# Partitioning by elevation classes in 200 m steps:
parti <- factor(as.character(floor(ecuador$dem / 200)))
smp <- as.resampling(parti)
summary(smp)
#>    n.train n.test
#> 10     600    151
#> 11     585    166
#> 12     660     91
#> 13     641    110
#> 14     727     24
#> 15     747      4
#> 8      730     21
#> 9      567    184
# Compare:
summary(parti)
#>  10  11  12  13  14  15   8   9 
#> 151 166  91 110  24   4  21 184 
# k-fold (non-spatial) cross-validation partitioning:
parti <- partition_cv(ecuador)
parti <- parti[[1]] # the first (and only) resampling object in parti
# data corresponding to the test sample of the first fold:
str(ecuador[parti[[1]]$test, ])
#> 'data.frame':    75 obs. of  13 variables:
#>  $ x             : num  715382 714892 714202 714862 714032 ...
#>  $ y             : num  9560142 9559312 9557412 9560982 9558502 ...
#>  $ dem           : num  2021 2380 2544 1863 2403 ...
#>  $ slope         : num  42 32.8 39.9 21 31.4 ...
#>  $ hcurv         : num  0.00958 -0.00266 -0.02104 -0.00295 0.01924 ...
#>  $ vcurv         : num  0.02642 0.02896 -0.02046 0.00035 0.02606 ...
#>  $ carea         : num  671 276 1024 2698 565 ...
#>  $ cslope        : num  41.6 20.9 38.8 23.5 26.3 ...
#>  $ distroad      : num  300 300 300 214 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  300 300 300 20.2 300 ...
#>  $ distslidespast: num  21 31 100 68 8 5 65 0 25 100 ...
#>  $ log.carea     : num  2.83 2.44 3.01 3.43 2.75 ...
# the corresponding training sample - larger:
str(ecuador[parti[[1]]$train, ])
#> 'data.frame':    676 obs. of  13 variables:
#>  $ x             : num  712882 715232 715392 715042 712802 ...
#>  $ y             : num  9560002 9559582 9560172 9559312 9559952 ...
#>  $ dem           : num  1912 2199 1989 2320 1838 ...
#>  $ slope         : num  25.6 23.2 40.5 42.9 52.1 ...
#>  $ hcurv         : num  -0.00681 -0.00501 -0.01919 -0.01106 0.00183 ...
#>  $ vcurv         : num  -0.00029 -0.00649 -0.04051 -0.04634 -0.09203 ...
#>  $ carea         : num  5577 1399 351155 501 634 ...
#>  $ cslope        : num  34.4 30.7 32.8 33.9 30.3 ...
#>  $ distroad      : num  300 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  15 300 300 300 9.15 ...
#>  $ distslidespast: num  9 21 40 100 2 100 100 41 5 20 ...
#>  $ log.carea     : num  3.75 3.15 5.55 2.7 2.8 ...
# Bootstrap training sets, out-of-bag test sets:
parti <- represampling_bootstrap(ecuador, oob = TRUE)
parti <- parti[[1]] # the first (and only) resampling object in parti
# out-of-bag test sample: approx. one-third of nrow(ecuador):
str(ecuador[parti[[1]]$test, ])
#> 'data.frame':    279 obs. of  13 variables:
#>  $ x             : num  712882 715232 715382 715272 714842 ...
#>  $ y             : num  9560002 9559582 9560142 9557702 9558892 ...
#>  $ dem           : num  1912 2199 2021 2813 2483 ...
#>  $ slope         : num  25.6 23.2 42 31 68.8 ...
#>  $ hcurv         : num  -0.00681 -0.00501 0.00958 -0.00123 -0.04921 ...
#>  $ vcurv         : num  -0.00029 -0.00649 0.02642 0.00393 -0.12438 ...
#>  $ carea         : num  5577 1399 671 2081 754 ...
#>  $ cslope        : num  34.4 30.7 41.6 37.6 53.7 ...
#>  $ distroad      : num  300 300 300 300 300 30 300 300 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ distdeforest  : num  15 300 300 300 300 ...
#>  $ distslidespast: num  9 21 21 100 100 20 100 2 100 100 ...
#>  $ log.carea     : num  3.75 3.15 2.83 3.32 2.88 ...
# bootstrap training sample: same size as nrow(ecuador):
str(ecuador[parti[[1]]$train, ])
#> 'data.frame':    751 obs. of  13 variables:
#>  $ x             : num  713472 712762 713952 715832 713642 ...
#>  $ y             : num  9558462 9560962 9561282 9558112 9560602 ...
#>  $ dem           : num  2298 2022 1801 2824 1945 ...
#>  $ slope         : num  29.2 44.9 33.8 23.7 34.5 ...
#>  $ hcurv         : num  0.00156 -0.01259 0.00466 0.00944 0.00165 ...
#>  $ vcurv         : num  -0.00116 -0.00281 -0.01776 0.00836 0.00875 ...
#>  $ carea         : num  1818 1178 1022 251 1165 ...
#>  $ cslope        : num  31.74 34.2 4.14 22.86 31.92 ...
#>  $ distroad      : num  300 300 42.9 300 300 ...
#>  $ slides        : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 1 1 1 2 2 1 2 ...
#>  $ distdeforest  : num  300 129 0 300 0 ...
#>  $ distslidespast: num  76 63 13 100 100 100 100 100 100 30 ...
#>  $ log.carea     : num  3.26 3.07 3.01 2.4 3.07 ...