R/sperrorest_resampling.R
partition_tiles.Rd
partition_tiles divides the study area into a specified number
of rectangular tiles. Optionally, small partitions can be merged with
adjacent tiles to achieve a minimum number or percentage of samples in each
tile.
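To make the basic idea concrete, here is a minimal base-R sketch of cutting a study area into nsplit[1] x nsplit[2] equal rectangles over the bounding box of the coordinates. It assumes no rotation, no offset and no clean-up of small tiles. The helper assign_tiles() is hypothetical and not part of sperrorest, and its X:Y labels only imitate the fold names printed by summary() in the examples.

```r
# assign_tiles() is a hypothetical helper, not a sperrorest function.
# It assigns each sample to one of nsplit[1] x nsplit[2] equally sized
# rectangles spanning the bounding box of the coordinates.
assign_tiles <- function(x, y, nsplit) {
  nsplit <- rep_len(nsplit, 2)  # recycle a length-1 nsplit
  # equally spaced break points covering the full range in each direction
  xbreaks <- seq(min(x), max(x), length.out = nsplit[1] + 1)
  ybreaks <- seq(min(y), max(y), length.out = nsplit[2] + 1)
  # include.lowest = TRUE keeps the minimum coordinate in the first tile
  xi <- cut(x, xbreaks, labels = FALSE, include.lowest = TRUE)
  yi <- cut(y, ybreaks, labels = FALSE, include.lowest = TRUE)
  factor(paste0("X", xi, ":Y", yi))
}

set.seed(1)
pts <- data.frame(x = runif(200, 0, 4000), y = runif(200, 0, 3000))
tiles <- assign_tiles(pts$x, pts$y, nsplit = c(4, 3))
table(tiles)  # number of samples per tile
```

Every sample receives exactly one tile id; tiles that happen to contain no samples simply do not appear as factor levels.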
data.frame containing at least the columns specified by
coords
vector of length 2 naming the variables in data that
contain the x and y coordinates of sample locations.
optional vector of length 2: distance between splits (i.e., the tile side
length, in coordinate units) in the (possibly rotated) x direction (dsplit[1])
and y direction (dsplit[2]) used to define tiles. If dsplit is of length 1,
its value is recycled. Either dsplit or nsplit must be specified.
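With dsplit, the tile size rather than the tile count is fixed, so the number of tiles follows from the extent of the data. A minimal sketch, assuming the grid starts at the minimum coordinate and is neither rotated nor shifted; grid_breaks() is a hypothetical helper, not part of sperrorest.

```r
# grid_breaks() is hypothetical: break points spaced dsplit apart, starting
# at the minimum coordinate and extending until the maximum is covered.
grid_breaks <- function(x, y, dsplit) {
  dsplit <- rep_len(dsplit, 2)  # a length-1 dsplit is recycled
  one_dir <- function(v, d) {
    n_tiles <- max(1, ceiling((max(v) - min(v)) / d))
    min(v) + d * (0:n_tiles)
  }
  list(x = one_dir(x, dsplit[1]), y = one_dir(y, dsplit[2]))
}

b <- grid_breaks(x = c(0, 4500), y = c(0, 3000), dsplit = 2000)
b$x  # 0 2000 4000 6000
b$y  # 0 2000 4000
```

Here a 4500 x 3000 extent with dsplit = 2000 yields 3 x 2 tiles, with the last column and row only partly covered by data.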
optional vector of length 2: number of splits in (possibly
rotated) x direction (nsplit[1]) and y direction (nsplit[2]) used to
define tiles. If nsplit is of length 1, its value is recycled.
indicates whether and how the rectangular grid should be
rotated; random rotation is restricted to angles between -45 and +45 degrees.
if rotation='user', the angles (in degrees) by which the
rectangular grid is to be rotated in each repetition: either a vector of
the same length as repetition, or a single number that will be replicated
length(repetition) times.
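Rotating the grid by an angle is equivalent to rotating the coordinates by the opposite angle and then tiling them with an axis-aligned grid. The sketch below assumes rotation about the centroid of the points (the pivot sperrorest actually uses is not stated here) and shows how per-repetition angles could be drawn for rotation = 'random' or recycled for rotation = 'user'. The helper rotate_coords() is hypothetical.

```r
# rotate_coords() is hypothetical: it rotates points by `angle` degrees
# counter-clockwise about their centroid (an assumed pivot).
rotate_coords <- function(x, y, angle) {
  a <- angle * pi / 180
  cx <- mean(x)
  cy <- mean(y)
  dx <- x - cx
  dy <- y - cy
  data.frame(x = cx + dx * cos(a) - dy * sin(a),
             y = cy + dx * sin(a) + dy * cos(a))
}

repetition <- 1:5
# rotation = "random": one angle per repetition, uniform in [-45, 45]
set.seed(1)
random_angles <- runif(length(repetition), min = -45, max = 45)
# rotation = "user" with a single number: replicated length(repetition) times
user_angles <- rep_len(30, length(repetition))

# Rotating the points by -angle and tiling them with an axis-aligned grid
# is equivalent to rotating the grid by +angle.
r <- rotate_coords(x = c(0, 1), y = c(0, 0), angle = 90)
```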
indicates whether and how the rectangular grid should be shifted by an offset.
if offset='user', a list (or vector) of two components
specifying a shift of the rectangular grid in the (possibly rotated) x and y
direction. The offset values are relative: a value of 0.5
results in a shift of half a tile towards the left, or upward. If this is
a list, its first (second) component refers to the rotated x (y) direction,
and both components must have the same length as repetition (or length 1). If
it is a vector of length 2 (or the list components have length 1), the two values
are interpreted as relative shifts in the (rotated) x and y direction,
respectively, and are recycled as needed
(length(repetition) times each).
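The recycling rules for user offsets can be expressed as a small normalization step. This is a sketch of the rules as described above, not sperrorest's internal code; normalize_offset() is a hypothetical helper that returns a list of two components, each of length length(repetition).

```r
# normalize_offset() is hypothetical: it expands a user offset into a list
# of two components (rotated x, rotated y), each of length length(repetition).
normalize_offset <- function(user_offset, repetition) {
  n <- length(repetition)
  if (!is.list(user_offset)) {
    # a plain vector must hold exactly one x shift and one y shift
    stopifnot(length(user_offset) == 2)
    user_offset <- as.list(user_offset)
  }
  lapply(user_offset[1:2], function(v) {
    stopifnot(length(v) %in% c(1, n))  # length 1 or one value per repetition
    rep_len(v, n)
  })
}

off_vec  <- normalize_offset(c(0.5, 0.25), repetition = 1:3)
off_list <- normalize_offset(list(c(0, 0.1, 0.2), 0.5), repetition = 1:3)
str(off_vec)   # x shifts all 0.5, y shifts all 0.25
str(off_list)  # x shifts 0, 0.1, 0.2; y shifts all 0.5
```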
logical (default TRUE): if TRUE, 'small' tiles (as defined by the
min_frac and min_n arguments and get_small_tiles) are merged with the
smallest adjacent tile. If FALSE, small tiles are 'eliminated', i.e.,
set to NA.
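The two reassign modes can be illustrated with a small sketch. It assumes the small tiles and the adjacency of tiles are already known (in sperrorest these come from get_small_tiles and tile_neighbors, whose outputs are not reproduced here). clean_tiles() and the hand-written neighbors list are hypothetical.

```r
# clean_tiles() is a hypothetical sketch of the clean-up step, not sperrorest code.
# tile:      character vector of tile ids, one per sample
# small:     ids of tiles flagged as too small
# neighbors: named list giving the ids of the tiles adjacent to each tile
clean_tiles <- function(tile, small, neighbors, reassign = TRUE) {
  tile <- as.character(tile)
  for (s in small) {
    idx <- which(tile == s)
    if (!reassign) {
      tile[idx] <- NA  # 'eliminate' the small tile
      next
    }
    nb <- intersect(neighbors[[s]], tile)  # adjacent tiles that still hold samples
    if (length(nb) == 0) next
    sizes <- sapply(nb, function(t) sum(tile == t, na.rm = TRUE))
    tile[idx] <- nb[which.min(sizes)]  # merge into the smallest neighbor
  }
  factor(tile)
}

tile <- c(rep("A", 5), rep("B", 30), rep("C", 20), rep("D", 40))
neighbors <- list(A = c("B", "C"), B = c("A", "D"),
                  C = c("A", "D"), D = c("B", "C"))
merged  <- clean_tiles(tile, small = "A", neighbors)                    # A joins C
dropped <- clean_tiles(tile, small = "A", neighbors, reassign = FALSE)  # A becomes NA
table(merged)
```

Tile A is merged into C rather than B because C is the smaller of its two neighbors, mirroring the A4/B4 case in the examples.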
numeric >=0, <1: minimum relative size of a partition as a
fraction of the sample size; argument passed to get_small_tiles. Will be
ignored if NULL.
integer >=0: minimum number of samples per partition; argument
passed to get_small_tiles. Will be ignored if NULL.
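The two thresholds act as criteria for flagging a tile as 'small'. A minimal sketch, assuming a tile is flagged when it violates either threshold (the exact rule inside get_small_tiles may differ); find_small_tiles() is a hypothetical helper.

```r
# find_small_tiles() is hypothetical, not the get_small_tiles implementation.
# A tile is flagged if it has fewer than min_n samples or holds less than
# min_frac of all samples; a NULL threshold is ignored.
find_small_tiles <- function(tile, min_frac = NULL, min_n = NULL) {
  n <- table(tile)
  counts <- as.vector(n)
  small <- rep(FALSE, length(counts))
  if (!is.null(min_n)) small <- small | (counts < min_n)
  if (!is.null(min_frac)) small <- small | (counts / sum(counts) < min_frac)
  names(n)[small]
}

tile <- c(rep("A", 5), rep("B", 30), rep("C", 20), rep("D", 45))
find_small_tiles(tile, min_n = 10)       # "A"
find_small_tiles(tile, min_frac = 0.25)  # "A" "C"
```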
argument to be passed to tile_neighbors
if FALSE (default), return a represampling object;
if TRUE (used internally by other sperrorest functions), return a
list containing factor vectors (see Value)
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use repetition = c(1:100) to obtain (the
'first') 100 repetitions, and repetition = c(101:200) to obtain a
different set of 100 repetitions.
seed1+i is the random seed that will be used by set.seed in
repetition i (i in repetition) to initialize the random number
generator before sampling from the data set.
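Because each repetition i is seeded with seed1 + i, the random choices for a given repetition index do not depend on which other repetitions are requested. The sketch below checks this with a hypothetical draw_angles() helper that draws one random rotation angle per repetition; it mimics only the seeding rule, not sperrorest's sampling code.

```r
# draw_angles() is hypothetical: repetition i calls set.seed(seed1 + i)
# before drawing, so its result depends only on i and seed1.
draw_angles <- function(repetition, seed1) {
  sapply(repetition, function(i) {
    set.seed(seed1 + i)
    runif(1, min = -45, max = 45)
  })
}

a_all  <- draw_angles(1:200, seed1 = 321)
a_part <- draw_angles(101:200, seed1 = 321)
identical(a_all[101:200], a_part)  # TRUE: repetitions 101 to 200 agree
```

This is why repetition = c(101:200) gives a set of repetitions that is different from c(1:100), yet identical to the second half of c(1:200).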
A represampling object. Contains length(repetition) resampling
objects as repetitions. The exact number of folds / test-set tiles within
each resampling object depends on the spatial configuration of the data
set and on any clean-up of small tiles (see min_frac, min_n).
Default parameter settings may change in future releases. This
function, especially its rotation and shifting steps and the algorithm
for cleaning up small tiles, is still somewhat experimental. Use with caution.
For non-zero offsets (offset != 'none'), the number of tiles may actually
be greater than nsplit[1]*nsplit[2] because partial tiles at the edges of the
shifted grid extend into the study region. reassign=TRUE with suitable
thresholds is therefore recommended for non-zero (including random) offsets.
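A one-dimensional sketch shows why an offset can add tiles: four 1000-unit tiles exactly cover [0, 4000], but shifting the grid half a tile to the left leaves half tiles at both edges. This uses base R's cut() on made-up, evenly spaced coordinates and only illustrates the geometry; it is not sperrorest's implementation.

```r
# Made-up, evenly spaced coordinates along one axis
x <- seq(0, 4000, by = 10)
width <- 1000   # tile length; 4 tiles would cover [0, 4000] exactly
offset <- 0.5   # relative shift: half a tile towards the left
# Shift the break points left by offset * width and extend them past the data
breaks <- seq(-offset * width, 4000 + width, by = width)
tiles <- droplevels(cut(x, breaks, include.lowest = TRUE))
nlevels(tiles)  # 5 occupied tiles instead of 4
table(tiles)    # 51, 100, 100, 100, 50: the edge tiles are about half size
```

The two half-size edge tiles are exactly the kind of small partitions that reassign=TRUE merges into their neighbors.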
library(sperrorest)
data(ecuador)
set.seed(42)
parti <- partition_tiles(ecuador, nsplit = c(4, 3), reassign = FALSE)
# plot(parti, ecuador)
# tile A4 has only 55 samples
# same partitioning, but now merge tiles with fewer than 100 samples into
# adjacent tiles:
parti2 <- partition_tiles(ecuador,
nsplit = c(4, 3), reassign = TRUE,
min_n = 100
)
# plot(parti2, ecuador)
summary(parti2)
#> $`1`
#> n.train n.test
#> X1:Y3 600 151
#> X2:Y2 626 125
#> X3:Y1 584 167
#> X3:Y2 574 177
#> X3:Y3 620 131
#>
# tile B4 (in 'parti') was smaller than A3; therefore A4 was merged with B4,
# not with A3
# now with random rotation and offset, and tiles of 2000 m length:
parti3 <- partition_tiles(ecuador,
dsplit = 2000, offset = "random",
rotation = "random", reassign = TRUE, min_n = 100
)
# plot(parti3, ecuador)
summary(parti3)
#> $`1`
#> n.train n.test
#> X2:Y2 452 299
#> X3:Y1 562 189
#> X3:Y2 488 263
#>