partition_factor_cv creates a represampling object, i.e. a set of sample indices defining cross-validation test and training sets, where partitions are obtained by resampling at the level of groups of observations as defined by a given factor variable. This can be used, for example, to resample agricultural data that is grouped by fields, at the agricultural field level in order to preserve spatial autocorrelation within fields.

partition_factor_cv(
  data,
  coords = c("x", "y"),
  fac,
  nfold = 10,
  repetition = 1,
  seed1 = NULL,
  return_factor = FALSE
)

Arguments

data

data.frame containing at least the columns specified by coords

coords

vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

fac

either the name of a variable (column) in data, or a vector of type factor and length nrow(data) that defines groups or clusters of observations.

nfold

number of partitions (folds) in nfold-fold cross-validation partitioning

repetition

numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

seed1

seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

return_factor

if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value)

Value

A represampling object, see also partition_cv for details.

Note

In this partitioning approach, the number of factor levels in fac must be large enough for this factor-level resampling to make sense.