Add distance information to resampling objects
resampling or represampling object.
Additional arguments to dataset_distance and add.distance.resampling, respectively.
data.frame containing at least the columns specified by coords
(ignored by partition_cv)
Use future.apply::future_lapply() for parallelized execution if mode = "future", and lapply for sequential execution otherwise (mode = "sequential").
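The mode switch described above can be illustrated with a minimal sketch. The helper name apply_by_mode is hypothetical and not part of the package; it only shows how a "future" / "sequential" choice could select between future.apply::future_lapply() and lapply, and it assumes the future.apply package is installed when mode = "future" is requested.

```r
# Hypothetical helper (not the package's own code): choose the apply
# function according to `mode`.
apply_by_mode <- function(x, f, mode = c("sequential", "future")) {
  mode <- match.arg(mode)
  if (mode == "future") {
    # Assumption: future.apply is available; fail early if it is not
    if (!requireNamespace("future.apply", quietly = TRUE)) {
      stop("mode = \"future\" requires the 'future.apply' package")
    }
    future.apply::future_lapply(x, f)
  } else {
    lapply(x, f)
  }
}

# Sequential execution needs nothing beyond base R
squares <- apply_by_mode(1:3, function(i) i^2, mode = "sequential")
```

With mode = "future", the work is distributed according to whatever plan has been set via future::plan(); without a plan, execution is typically still sequential.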
A resampling or represampling object containing an additional $distance component in each resampling object. The distance component is a single numeric value indicating, for each train / test pair, the (by default, mean) nearest-neighbour distance between the two sets.
Nearest-neighbour distances are calculated for each sample in the test set. These nrow(???$test) nearest-neighbour distances are then averaged. Aggregation methods other than mean can be chosen using the fun argument, which will be passed on to dataset_distance.
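As a rough illustration of the calculation described above, the following base R sketch computes, for each test sample, the distance to its nearest training sample and then aggregates these values. The helper nn_distance and its arguments are hypothetical, Euclidean distance on columns x and y is assumed, and this is not the implementation used by dataset_distance.

```r
# Hypothetical helper illustrating the description; not the package's code.
# Assumes both data frames have numeric columns `x` and `y`.
nn_distance <- function(test, train, fun = mean) {
  nn <- vapply(seq_len(nrow(test)), function(i) {
    # Euclidean distance from test sample i to every training sample
    d <- sqrt((train$x - test$x[i])^2 + (train$y - test$y[i])^2)
    min(d)  # keep only the nearest neighbour
  }, numeric(1))
  fun(nn)   # aggregate the nrow(test) nearest-neighbour distances
}

train <- data.frame(x = c(0, 10), y = c(0, 0))
test  <- data.frame(x = c(3, 10), y = c(4, 2))

res_mean <- nn_distance(test, train)             # nearest distances 5 and 2, mean 3.5
res_max  <- nn_distance(test, train, fun = max)  # largest nearest distance, 5
```

Swapping fun for max or median changes only the aggregation step; the per-sample nearest-neighbour search stays the same.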
library(sperrorest)
# Muenchow et al. (2012), see ?ecuador
data(ecuador)
nsp.parti <- partition_cv(ecuador)
sp.parti <- partition_kmeans(ecuador)
nsp.parti <- add.distance(nsp.parti, data = ecuador)
sp.parti <- add.distance(sp.parti, data = ecuador)
# non-spatial partitioning: very small test-training distance
nsp.parti[[1]][[1]]$distance
#> [1] 48.65723
# spatial partitioning: more substantial distance, depending on number of
# folds etc.
sp.parti[[1]][[1]]$distance
#> [1] 405.307