partition_tiles divides the study area into a specified number of rectangular tiles. Optionally, small partitions can be merged with adjacent tiles to achieve a minimum number or percentage of samples in each tile.
data: a data.frame containing at least the columns specified by coords.
coords: vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.
dsplit: optional vector of length 2: equidistance of splits in the (possibly rotated) x direction (dsplit[1]) and y direction (dsplit[2]) used to define tiles. If dsplit is of length 1, its value is recycled. Either dsplit or nsplit must be specified.
nsplit: optional vector of length 2: number of splits in the (possibly rotated) x direction (nsplit[1]) and y direction (nsplit[2]) used to define tiles. If nsplit is of length 1, its value is recycled.
rotation: indicates whether and how the rectangular grid should be rotated; random rotation is only between -45 and +45 degrees.
user_rotation: if rotation = 'user', angles (in degrees) by which the rectangular grid is to be rotated in each repetition. Either a vector of the same length as repetition, or a single number that will be replicated length(repetition) times.
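Rotating the grid can be pictured as rotating the sample coordinates and then applying an axis-aligned grid. In this sketch, rotate_coords_sketch and rotation_angles_sketch are hypothetical helpers; the option value 'none' is an assumption (only 'random' and 'user' appear above for rotation), and drawing random angles uniformly from [-45, 45] is one plausible reading of "random rotation is only between -45 and +45 degrees".

```r
# Hypothetical sketch: rotating the points by an angle is equivalent to
# rotating the tile grid by the opposite angle about the origin.
rotate_coords_sketch <- function(x, y, angle_deg) {
  a <- angle_deg * pi / 180
  cbind(x = x * cos(a) - y * sin(a),
        y = x * sin(a) + y * cos(a))
}

# Hypothetical helper: one rotation angle (in degrees) per repetition.
rotation_angles_sketch <- function(rotation = c("none", "random", "user"),
                                   repetition = 1, user_rotation = NULL) {
  rotation <- match.arg(rotation)
  n <- length(repetition)
  switch(rotation,
    none = rep(0, n),
    # Assumption: uniform draw restricted to [-45, 45] degrees.
    random = runif(n, min = -45, max = 45),
    user = {
      # A single angle is replicated length(repetition) times.
      if (length(user_rotation) == 1) user_rotation <- rep(user_rotation, n)
      if (length(user_rotation) != n) {
        stop("user_rotation must have length 1 or length(repetition)")
      }
      user_rotation
    }
  )
}

rotation_angles_sketch("user", repetition = 1:3, user_rotation = 30)
rotate_coords_sketch(1, 0, 90)
```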
offset: indicates whether and how the rectangular grid should be shifted by an offset (e.g. 'none', 'random' or 'user').
user_offset: if offset = 'user', a list (or vector) of two components specifying a shift of the rectangular grid in the (possibly rotated) x and y directions. The offset values are relative: a value of 0.5 results in a one-half tile shift towards the left, or upward. If this is a list, its first (second) component refers to the rotated x (y) direction, and both components must have the same length as repetition (or length 1). If it is a vector of length 2 (or the list components have length 1), the two values are interpreted as relative shifts in the (rotated) x and y directions, respectively, and are therefore recycled as needed (length(repetition) times each).
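The recycling rules for user_offset can be expressed as a small helper that expands the input into one x shift and one y shift per repetition. expand_offsets_sketch is hypothetical and covers only this bookkeeping, not the actual shifting of the grid.

```r
# Hypothetical sketch of the documented recycling rules for user_offset:
# returns a list of two numeric vectors (x shifts, y shifts), each of length
# length(repetition). Values are relative shifts in tile units.
expand_offsets_sketch <- function(user_offset, repetition = 1) {
  n <- length(repetition)
  if (!is.list(user_offset)) {
    # A vector of length 2 is read as (x shift, y shift).
    if (length(user_offset) != 2) stop("a vector user_offset must have length 2")
    user_offset <- as.list(user_offset)
  }
  if (length(user_offset) != 2) stop("user_offset needs two components")
  lapply(user_offset, function(v) {
    # Length-1 components are recycled to one value per repetition.
    if (length(v) == 1) v <- rep(v, n)
    if (length(v) != n) {
      stop("offset components must have length 1 or length(repetition)")
    }
    v
  })
}

expand_offsets_sketch(c(0.25, 0.5), repetition = 1:3)
expand_offsets_sketch(list(c(0, 0.1), 0.3), repetition = 1:2)
```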
reassign: logical (default TRUE): if TRUE, 'small' tiles (as defined by the min_frac and min_n arguments and get_small_tiles) are merged with the smallest adjacent tile. If FALSE, small tiles are 'eliminated', i.e. set to NA.
min_frac: numeric >= 0, < 1: minimum relative size of a partition as a fraction of the sample; passed to get_small_tiles. Ignored if NULL.
min_n: integer >= 0: minimum number of samples per partition; passed to get_small_tiles. Ignored if NULL.
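A rough sketch of the small-tile check and of the reassign = FALSE behavior (setting small tiles to NA). small_tiles_sketch and eliminate_small_sketch are hypothetical stand-ins for what get_small_tiles is described as doing; treating a tile as small when it fails either threshold is an assumption. Merging with the smallest adjacent tile (reassign = TRUE) needs neighborhood information (tile_neighbors) and is not sketched here.

```r
# Hypothetical stand-in (not sperrorest's get_small_tiles): flag tiles with
# fewer than min_n samples or less than a fraction min_frac of all samples.
small_tiles_sketch <- function(tile, min_n = NULL, min_frac = NULL) {
  tab <- table(tile)  # NA tiles are not counted
  counts <- as.integer(tab)
  small <- rep(FALSE, length(counts))
  # NULL thresholds are ignored, as documented above.
  if (!is.null(min_n)) small <- small | counts < min_n
  if (!is.null(min_frac)) small <- small | counts / length(tile) < min_frac
  names(tab)[small]
}

# reassign = FALSE behavior: samples in small tiles are set to NA.
eliminate_small_sketch <- function(tile, small) {
  tile[tile %in% small] <- NA
  droplevels(tile)
}

tile <- factor(c(rep("A", 10), rep("B", 3), rep("C", 7)))
small <- small_tiles_sketch(tile, min_n = 5)  # "B"
table(eliminate_small_sketch(tile, small), useNA = "ifany")
```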
iterate: argument to be passed to tile_neighbors.
return_factor: if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value).
repetition: numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions but the indices of these repetitions. For example, use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.
seed1: seed1 + i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.
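Because each repetition index i gets its own seed, seed1 + i, a repetition is reproducible regardless of which other repetitions are requested. The sketch below demonstrates this convention with a hypothetical helper, draw_per_repetition, that draws one random value per repetition; it illustrates the documented seeding scheme and is not sperrorest code.

```r
# Hypothetical illustration of the documented seeding scheme:
# repetition i calls set.seed(seed1 + i) before its random draws.
draw_per_repetition <- function(repetition, seed1) {
  vapply(repetition, function(i) {
    set.seed(seed1 + i)
    runif(1, min = -45, max = 45)  # e.g. a random rotation angle
  }, numeric(1))
}

all_reps <- draw_per_repetition(1:5, seed1 = 123)
some_reps <- draw_per_repetition(3:4, seed1 = 123)
# Repetitions 3 and 4 get the same draws whether or not 1, 2 and 5 are run.
identical(all_reps[3:4], some_reps)  # TRUE
```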
A represampling object containing length(repetition) resampling objects, one per repetition. The exact number of folds (test-set tiles) within each resampling object depends on the spatial configuration of the data set and on possible cleaning steps (see min_frac, min_n).
Default parameter settings may change in future releases. This function, especially its rotation and shifting part and the algorithm for cleaning up small tiles, is still somewhat experimental. Use with caution.
For non-zero offsets (offset != 'none'), the number of tiles may actually be greater than nsplit[1] * nsplit[2], because fractional tiles extend into the study region. reassign = TRUE with suitable thresholds is therefore recommended for non-zero (including random) offsets.
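Why offsets can add tiles: shifting the grid leaves part of the data range uncovered at one edge, so an extra, fractional tile is needed to cover it. shifted_breaks_sketch is a hypothetical helper that shifts the breaks along one axis to the left by rel_offset tile widths and then extends them until the data range is covered. In this sketch, a half-tile shift in both directions would turn a 4 x 3 grid into up to 5 x 4 = 20 tiles.

```r
# Hypothetical sketch: tile breaks along one axis, shifted to the left by
# rel_offset tile widths and then extended to cover the whole data range.
shifted_breaks_sketch <- function(v, nsplit, rel_offset) {
  r <- range(v)
  width <- (r[2] - r[1]) / nsplit
  start <- r[1] - rel_offset * width
  n <- ceiling((r[2] - start) / width)
  start + width * (0:n)
}

x <- seq(0, 4000, length.out = 101)
b0 <- shifted_breaks_sketch(x, nsplit = 4, rel_offset = 0)
b1 <- shifted_breaks_sketch(x, nsplit = 4, rel_offset = 0.5)
length(b0) - 1  # 4 columns of tiles
length(b1) - 1  # 5 columns: the half-tile shift adds a fractional tile
```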
library(sperrorest)
data(ecuador)
set.seed(42)
parti <- partition_tiles(ecuador, nsplit = c(4, 3), reassign = FALSE)
# plot(parti,ecuador)
# tile A4 has only 55 samples
# same partitioning, but now merge tiles with less than 100 samples to
# adjacent tiles:
parti2 <- partition_tiles(ecuador,
  nsplit = c(4, 3), reassign = TRUE,
  min_n = 100
)
# plot(parti2,ecuador)
summary(parti2)
#> $`1`
#> n.train n.test
#> X1:Y3 600 151
#> X2:Y2 626 125
#> X3:Y1 584 167
#> X3:Y2 574 177
#> X3:Y3 620 131
#>
# tile B4 (in 'parti') was smaller than A3, therefore A4 was merged with B4,
# not with A3
# now with random rotation and offset, and tiles of 2000 m length:
parti3 <- partition_tiles(ecuador,
  dsplit = 2000, offset = "random",
  rotation = "random", reassign = TRUE, min_n = 100
)
# plot(parti3, ecuador)
summary(parti3)
#> $`1`
#> n.train n.test
#> X2:Y2 452 299
#> X3:Y1 562 189
#> X3:Y2 488 263
#>