smash.optimize_control_info

smash.optimize_control_info(model, mapping='uniform', optimizer=None, optimize_options=None, cost_options=None)

Information on the optimization control vector of Model.

Parameters:
model : Model

Primary data structure of the hydrological model smash.

mapping : str, default 'uniform'

Type of mapping. Should be one of

  • 'uniform'

  • 'distributed'

  • 'multi-linear'

  • 'multi-polynomial'

Hint

See the Mapping section

optimizer : str or None, default None

Name of optimizer. Should be one of

  • 'sbs' ('uniform' mapping only)

  • 'lbfgsb' ('uniform', 'distributed', 'multi-linear' or 'multi-polynomial' mapping only)

Note

If not given, a default optimizer will be set depending on the optimization mapping:

  • mapping = 'uniform'; optimizer = 'sbs'

  • mapping = 'distributed', 'multi-linear', or 'multi-polynomial'; optimizer = 'lbfgsb'

Hint

See the Optimization Algorithm section
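The default rule and the mapping restrictions listed above amount to a small lookup table. The sketch below encodes them in plain Python to make the combinations explicit; `resolve_optimizer` and both tables are hypothetical illustrations transcribed from this page, not part of the smash API or its internals.

```python
from typing import Optional

# Hypothetical tables transcribed from the documented rules above (not smash internals).
DEFAULT_OPTIMIZER = {
    "uniform": "sbs",
    "distributed": "lbfgsb",
    "multi-linear": "lbfgsb",
    "multi-polynomial": "lbfgsb",
}
SUPPORTED_MAPPINGS = {
    "sbs": {"uniform"},
    "lbfgsb": {"uniform", "distributed", "multi-linear", "multi-polynomial"},
}


def resolve_optimizer(mapping: str, optimizer: Optional[str] = None) -> str:
    """Return the optimizer used for a mapping, falling back to the documented default."""
    if mapping not in DEFAULT_OPTIMIZER:
        raise ValueError(f"Unknown mapping: {mapping!r}")
    if optimizer is None:
        return DEFAULT_OPTIMIZER[mapping]
    if optimizer not in SUPPORTED_MAPPINGS:
        raise ValueError(f"Unknown optimizer: {optimizer!r}")
    if mapping not in SUPPORTED_MAPPINGS[optimizer]:
        raise ValueError(f"Optimizer {optimizer!r} does not support mapping {mapping!r}")
    return optimizer


print(resolve_optimizer("uniform"))  # sbs
print(resolve_optimizer("multi-polynomial"))  # lbfgsb
```

Asking for 'sbs' with a 'distributed' mapping raises a ValueError in this sketch, mirroring the "'uniform' mapping only" restriction.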

optimize_options : dict[str, Any] or None, default None

Dictionary containing optimization options for fine-tuning the optimization process. See default_optimize_options to retrieve the default optimize options based on the mapping and optimizer.

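parameters : str, list[str, …] or None, default None

Name of parameters to optimize. Should be one or a sequence of any key of:

  • Model.rr_parameters

  • Model.rr_initial_states

  • Model.serr_mu_parameters

  • Model.serr_sigma_parameters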

>>> optimize_options = {
    "parameters": "cp",
}
>>> optimize_options = {
    "parameters": ["cp", "ct", "kexc", "llr"],
}

Note

If not given, all parameters in Model.rr_parameters will be optimized.

bounds : dict[str, tuple[float, float]] or None, default None

Bounds on optimized parameters. A dictionary where the keys represent parameter names and the values are (min, max) pairs (i.e., a list or tuple) with min lower than max. The keys must be included in parameters.

>>> optimize_options = {
    "bounds": {
        "cp": (1, 2000),
        "ct": (1, 1000),
        "kexc": (-10, 5),
        "llr": (1, 1000)
    },
}

Note

If not given, default bounds will be applied to each parameter. See Model.get_rr_parameters_bounds and Model.get_rr_initial_states_bounds.
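The two rules on bounds (every key must appear in parameters, and min must be lower than max) are easy to check before calling the function. The sketch below is a hypothetical pre-flight check written for illustration; `as_name_list` and `check_bounds` are not smash functions, and smash performs its own validation.

```python
from typing import Dict, List, Sequence, Tuple, Union


def as_name_list(parameters: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single name or a sequence of names, as the parameters option does."""
    return [parameters] if isinstance(parameters, str) else list(parameters)


def check_bounds(
    parameters: Union[str, Sequence[str]],
    bounds: Dict[str, Tuple[float, float]],
) -> None:
    """Raise ValueError if a bounds entry breaks one of the documented rules."""
    names = as_name_list(parameters)
    for key, (lower, upper) in bounds.items():
        if key not in names:
            raise ValueError(f"Bounds given for {key!r}, which is not in parameters")
        if not lower < upper:
            raise ValueError(f"Lower bound of {key!r} must be lower than its upper bound")


# The bounds from the example above pass silently.
check_bounds(
    ["cp", "ct", "kexc", "llr"],
    {"cp": (1, 2000), "ct": (1, 1000), "kexc": (-10, 5), "llr": (1, 1000)},
)
```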

control_tfm : str or None, default None

Transformation method applied to the control vector. Only used with 'sbs' or 'lbfgsb' optimizer. Should be one of:

  • 'keep'

  • 'normalize'

  • 'sbs' ('sbs' optimizer only)

Note

If not given, a default control vector transformation will be set depending on the optimizer:

  • optimizer = 'sbs'; control_tfm = 'sbs'

  • optimizer = 'lbfgsb'; control_tfm = 'normalize'
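The default control_tfm and the restriction of 'sbs' to the 'sbs' optimizer can likewise be written as a lookup. This is a hypothetical helper transcribed from the description above, not the way smash resolves the option internally.

```python
from typing import Optional

# Hypothetical tables transcribed from the control_tfm description (not smash internals).
DEFAULT_CONTROL_TFM = {"sbs": "sbs", "lbfgsb": "normalize"}
ALLOWED_CONTROL_TFM = {
    "sbs": {"keep", "normalize", "sbs"},
    "lbfgsb": {"keep", "normalize"},  # 'sbs' is reserved for the 'sbs' optimizer
}


def resolve_control_tfm(optimizer: str, control_tfm: Optional[str] = None) -> str:
    """Return the control vector transformation, applying the documented default."""
    if optimizer not in DEFAULT_CONTROL_TFM:
        raise ValueError(f"control_tfm is only used with 'sbs' or 'lbfgsb', got {optimizer!r}")
    if control_tfm is None:
        return DEFAULT_CONTROL_TFM[optimizer]
    if control_tfm not in ALLOWED_CONTROL_TFM[optimizer]:
        raise ValueError(f"control_tfm {control_tfm!r} is not allowed with optimizer {optimizer!r}")
    return control_tfm


print(resolve_control_tfm("lbfgsb"))  # normalize
```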

descriptor : dict[str, list[str, …]] or None, default None

Descriptors linked to optimized parameters. A dictionary where the keys represent parameter names and the values are lists of descriptor names. The keys must be included in parameters.

>>> optimize_options = {
    "descriptor": {
        "cp": ["slope", "dd"],
        "ct": ["slope"],
        "kexc": ["slope", "dd"],
        "llr": ["dd"],
    },
}

Note

If not given, all descriptors will be used for each parameter. This option is only used when mapping is 'multi-linear' or 'multi-polynomial'. In the case of 'ann', all descriptors will be used.
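The fallback rule (parameters without an explicit entry are linked to every descriptor) can be sketched as follows. `resolve_descriptors` is hypothetical, and `all_descriptors` stands for the descriptor names available in the model setup; slope and dd are the ones used in the Examples section.

```python
from typing import Dict, List, Optional, Sequence


def resolve_descriptors(
    parameters: Sequence[str],
    all_descriptors: Sequence[str],
    descriptor: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Map each optimized parameter to the descriptors it is linked to."""
    descriptor = descriptor or {}
    unknown = set(descriptor) - set(parameters)
    if unknown:
        raise ValueError(f"descriptor keys not in parameters: {sorted(unknown)}")
    # Parameters without an explicit entry fall back to every available descriptor.
    return {name: list(descriptor.get(name, all_descriptors)) for name in parameters}


print(resolve_descriptors(["cp", "kexc"], ["slope", "dd"], {"kexc": ["dd"]}))
# {'cp': ['slope', 'dd'], 'kexc': ['dd']}
```

This matches the configuration of the multi-linear example in the Examples section, where cp is linked to both descriptors and kexc only to dd.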

termination_crit : dict[str, Any] or None, default None

Termination criteria. The elements are:

  • 'maxiter': The maximum number of iterations. Only used when optimizer is 'sbs' or 'lbfgsb'.

  • 'factr': An additional termination criterion based on cost values. Only used when optimizer is 'lbfgsb'.

  • 'pgtol': An additional termination criterion based on the projected gradient of the cost function. Only used when optimizer is 'lbfgsb'.

  • 'epochs': The number of training epochs for the neural network. Only used when mapping is 'ann'.

  • 'early_stopping': A positive number to stop training when the loss function does not decrease below the current optimal value for early_stopping consecutive epochs. When set to zero, early stopping is disabled, and the training continues for the full number of epochs. Only used when mapping is 'ann'.

>>> optimize_options = {
    "termination_crit": {
        "maxiter": 10,
        "factr": 1e6,
    },
}
>>> optimize_options = {
    "termination_crit": {
        "epochs": 200,
    },
}

Note

If not given, default values are applied to each element.

cost_options : dict[str, Any] or None, default None

Dictionary containing computation cost options for simulated and observed responses. The elements are:

jobs_cmpt : str or list[str, …], default 'nse'

Type of observation objective function(s) to be computed. Should be one or a sequence of any of

  • 'nse', 'nnse', 'kge', 'mae', 'mape', 'mse', 'rmse', 'lgrm' (classical evaluation metrics)

  • 'Crc', 'Crchf', 'Crclf', 'Crch2r', 'Cfp2', 'Cfp10', 'Cfp50', 'Cfp90' (continuous signatures-based error metrics)

  • 'Eff', 'Ebf', 'Erc', 'Erchf', 'Erclf', 'Erch2r', 'Elt', 'Epf' (flood event signatures-based error metrics)

>>> cost_options = {
    "jobs_cmpt": "nse",
}
>>> cost_options = {
    "jobs_cmpt": ["nse", "Epf"],
}

wjobs_cmpt : str or list[float, …], default 'mean'

The corresponding weighting of observation objective functions in case of multi-criteria (i.e., a sequence of objective functions to compute). There are two ways to specify it:

  • The alias 'mean'

  • A sequence of values whose size must be equal to the number of observation objective function(s) in jobs_cmpt

>>> cost_options = {
    "wjobs_cmpt": "mean",
}
>>> cost_options = {
    "wjobs_cmpt": [0.7, 0.3],
}
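To make the weighting concrete, the sketch below combines per-criterion costs into one observation cost. It assumes two things this page does not state: that 'mean' means equal weights 1/n, and that the 'nse' criterion enters the cost as 1 - NSE. All function names and the discharge values are hypothetical.

```python
from typing import List, Sequence, Union


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency of simulated versus observed discharge."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def resolve_weights(weights: Union[str, Sequence[float]], n: int) -> List[float]:
    """Turn the 'mean' alias or an explicit sequence into one weight per criterion."""
    if weights == "mean":
        return [1.0 / n] * n  # assumption: equal weights
    values = [float(w) for w in weights]
    if len(values) != n:
        raise ValueError(f"Expected {n} weights, got {len(values)}")
    return values


def combine(costs: Sequence[float], weights: Union[str, Sequence[float]] = "mean") -> float:
    """Weighted sum of per-criterion costs."""
    return sum(w * c for w, c in zip(resolve_weights(weights, len(costs)), costs))


obs = [1.0, 3.0, 2.0, 5.0]  # hypothetical observed discharge
sim = [1.5, 2.5, 2.0, 4.0]  # hypothetical simulated discharge
nse_cost = 1.0 - nse(obs, sim)  # assumption: 'nse' enters the cost as 1 - NSE
# 0.4 stands for the cost of a second, hypothetical criterion (e.g. 'Epf').
print(round(combine([nse_cost, 0.4], [0.7, 0.3]), 4))  # 0.24
```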

jobs_cmpt_tfm : str or list[str, …], default 'keep'

Type of transformation applied to discharge in observation objective function(s). Should be one or a sequence of any of

  • 'keep' : No transformation \(f:x \rightarrow x\)

  • 'sqrt' : Square root transformation \(f:x \rightarrow \sqrt{x}\)

  • 'inv' : Multiplicative inverse transformation \(f:x \rightarrow \frac{1}{x}\)

>>> cost_options = {
    "jobs_cmpt_tfm": "inv",
}
>>> cost_options = {
    "jobs_cmpt_tfm": ["keep", "inv"],
}

Note

If jobs_cmpt is multi-criteria and only one transformation is chosen in jobs_cmpt_tfm, that transformation is applied to each observation objective function.
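The three transformations and the broadcasting rule from the note can be sketched as below. `TRANSFORMS` and `pair_transforms` are hypothetical names; the sketch does not guard against zero discharge for 'inv', which a real implementation would have to handle.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple, Union

# Discharge transformations as defined above; no guard against zero flow for 'inv'.
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "keep": lambda x: x,
    "sqrt": math.sqrt,
    "inv": lambda x: 1.0 / x,
}


def pair_transforms(
    jobs_cmpt: Union[str, Sequence[str]],
    jobs_cmpt_tfm: Union[str, Sequence[str]],
) -> List[Tuple[str, str]]:
    """Pair each objective function with its transformation, broadcasting a single one."""
    jobs = [jobs_cmpt] if isinstance(jobs_cmpt, str) else list(jobs_cmpt)
    tfms = [jobs_cmpt_tfm] if isinstance(jobs_cmpt_tfm, str) else list(jobs_cmpt_tfm)
    if len(tfms) == 1:
        tfms = tfms * len(jobs)
    if len(tfms) != len(jobs):
        raise ValueError("jobs_cmpt_tfm must have one entry or one per objective function")
    unknown = set(tfms) - set(TRANSFORMS)
    if unknown:
        raise ValueError(f"Unknown transformation(s): {sorted(unknown)}")
    return list(zip(jobs, tfms))


print(pair_transforms(["nse", "Epf"], "inv"))  # [('nse', 'inv'), ('Epf', 'inv')]
print([TRANSFORMS["sqrt"](q) for q in [4.0, 9.0]])  # [2.0, 3.0]
```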

wjreg : float or str, default 0

The weighting of the regularization term. There are two ways to specify it:

  • A value greater than or equal to 0

  • An alias among 'fast' or 'lcurve', in which case wjreg is computed automatically by the corresponding method

>>> cost_options = {
    "wjreg": 1e-4,
}
>>> cost_options = {
    "wjreg": "lcurve",
}

jreg_cmpt : str or list[str, …], default 'prior'

Type(s) of regularization function(s) to be minimized when the regularization term is set (i.e., wjreg > 0). Should be one or a sequence of any of

  • 'prior'

  • 'smoothing'

  • 'hard-smoothing'

>>> cost_options = {
    "jreg_cmpt": "prior",
}
>>> cost_options = {
    "jreg_cmpt": ["prior", "smoothing"],
}

Hint

See the Regularization Function section

wjreg_cmpt : str or list[float, …], default 'mean'

The corresponding weighting of regularization functions in case of multi-regularization (i.e., a sequence of regularization functions to compute). There are two ways to specify it:

  • The alias 'mean'

  • A sequence of values whose size must be equal to the number of regularization function(s) in jreg_cmpt

>>> cost_options = {
    "wjreg_cmpt": "mean",
}
>>> cost_options = {
    "wjreg_cmpt": [1., 2.],
}
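A rough picture of how wjreg, jreg_cmpt and wjreg_cmpt fit together is a total cost of the form J = J_obs + wjreg * J_reg, where J_reg is a weighted combination of the regularization functions. That form, and reading 'mean' as equal weights, are assumptions for this sketch; the helpers are hypothetical and do not handle the 'fast' or 'lcurve' aliases.

```python
from typing import Sequence, Union


def weighted_sum(values: Sequence[float], weights: Union[str, Sequence[float]] = "mean") -> float:
    """Combine component values with the 'mean' alias (assumed equal weights) or explicit weights."""
    n = len(values)
    w = [1.0 / n] * n if weights == "mean" else [float(x) for x in weights]
    if len(w) != n:
        raise ValueError(f"Expected {n} weights, got {len(w)}")
    return sum(wi * vi for wi, vi in zip(w, values))


def total_cost(
    jobs: float,
    jreg_components: Sequence[float],
    wjreg: float = 0.0,
    wjreg_cmpt: Union[str, Sequence[float]] = "mean",
) -> float:
    """J = jobs + wjreg * jreg (assumed form); 'fast'/'lcurve' aliases are not handled."""
    if wjreg < 0:
        raise ValueError("wjreg must be greater than or equal to 0")
    if wjreg == 0:
        return jobs  # regularization functions are ignored
    return jobs + wjreg * weighted_sum(jreg_components, wjreg_cmpt)


# Hypothetical values for J_obs and for the 'prior' and 'smoothing' terms.
print(total_cost(0.3, [2.0, 4.0]))  # 0.3
```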

end_warmup : str, pandas.Timestamp or None, default None

The end of the warm-up period, which must be between the start time and the end time defined in Model.setup.

>>> cost_options = {
    "end_warmup": "1997-12-21",
}
>>> import pandas as pd
>>> cost_options = {
    "end_warmup": pd.Timestamp("19971221"),
}

Note

If not given, it is set to be equal to the Model.setup start time.
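One way to picture the warm-up period is as the number of leading time steps left out of the cost. The sketch below computes that index from ISO date strings. `first_cost_step`, the dates and the daily time step are hypothetical, pandas.Timestamp inputs are not handled, and the exact boundary convention smash uses is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def first_cost_step(
    start_time: str,
    end_time: str,
    dt_seconds: int,
    end_warmup: Optional[str] = None,
) -> int:
    """Index of the first time step entering the cost, assuming steps before end_warmup are skipped."""
    start = datetime.fromisoformat(start_time)
    end = datetime.fromisoformat(end_time)
    if end_warmup is None:
        return 0  # default: the warm-up ends at the setup start time
    warmup_end = datetime.fromisoformat(end_warmup)
    if not start <= warmup_end <= end:
        raise ValueError("end_warmup must lie between the start and end times")
    return int((warmup_end - start) / timedelta(seconds=dt_seconds))


print(first_cost_step("1997-01-01", "1998-01-01", 86_400, "1997-12-21"))  # 354
```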

gauge : str or list[str, …], default 'dws'

Type of gauge to be computed. There are two ways to specify it:

  • An alias among 'all' (all gauge codes) or 'dws' (most downstream gauge code(s))

  • A gauge code or any sequence of gauge codes. The gauge code(s) given must belong to the gauge codes defined in the Model.mesh

>>> cost_options = {
    "gauge": "dws",
}
>>> cost_options = {
    "gauge": "V3524010",
}
>>> cost_options = {
    "gauge": ["V3524010", "V3515010"],
}

wgauge : str or list[float, …], default 'mean'

Type of gauge weights. There are two ways to specify it:

  • An alias among 'mean', 'lquartile' (1st quartile or lower quartile), 'median', or 'uquartile' (3rd quartile or upper quartile)

  • A sequence of values whose size must be equal to the number of gauges optimized in gauge

>>> cost_options = {
    "wgauge": "mean",
}
>>> cost_options = {
    "wgauge": [0.6, 0.4],
}
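The gauge option accepts an alias, one code or a list of codes, and an explicit wgauge sequence must match the resulting number of gauges. The sketch below expands and validates these inputs. The helpers are hypothetical, `mesh_codes` stands in for the codes defined in Model.mesh, and the downstream list passed for 'dws' is illustrative, since the real one depends on the mesh. The quartile aliases of wgauge are not modelled.

```python
from typing import List, Sequence, Union


def resolve_gauges(
    gauge: Union[str, Sequence[str]],
    mesh_codes: Sequence[str],
    downstream_codes: Sequence[str],
) -> List[str]:
    """Expand the 'all'/'dws' aliases or validate explicit gauge codes."""
    if gauge == "all":
        codes = list(mesh_codes)
    elif gauge == "dws":
        codes = list(downstream_codes)
    elif isinstance(gauge, str):
        codes = [gauge]
    else:
        codes = list(gauge)
    unknown = [c for c in codes if c not in mesh_codes]
    if unknown:
        raise ValueError(f"Gauge code(s) not defined in the mesh: {unknown}")
    return codes


def check_gauge_weights(wgauge: Sequence[float], codes: Sequence[str]) -> None:
    """An explicit weight sequence needs exactly one weight per selected gauge."""
    if len(wgauge) != len(codes):
        raise ValueError(f"Expected {len(codes)} gauge weights, got {len(wgauge)}")


mesh_codes = ["V3524010", "V3515010"]  # hypothetical mesh content
codes = resolve_gauges(["V3524010", "V3515010"], mesh_codes, downstream_codes=["V3524010"])
check_gauge_weights([0.6, 0.4], codes)  # passes
```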

event_seg : dict[str, float], default {'peak_quant': 0.995, 'max_duration': 240}

A dictionary of event segmentation options used when computing flood event signatures for the cost (i.e., when jobs_cmpt includes flood event signatures).

>>> cost_options = {
    "event_seg": {
        "peak_quant": 0.998,
        "max_duration": 120,
    },
}

Hint

See the hydrograph_segmentation function and Hydrograph Segmentation section.
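As a rough illustration of the role of peak_quant, the sketch below keeps only the local discharge maxima that reach the peak_quant quantile of the series. This is a simplified stand-in, not the algorithm behind hydrograph_segmentation. Event windows, and therefore max_duration, are not modelled, and all names and values are hypothetical.

```python
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile (the same rule as NumPy's default method)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def candidate_peaks(q: Sequence[float], peak_quant: float = 0.995) -> List[int]:
    """Indices of local maxima whose discharge reaches the peak_quant quantile."""
    threshold = quantile(q, peak_quant)
    return [
        i
        for i in range(1, len(q) - 1)
        if q[i] >= threshold and q[i] >= q[i - 1] and q[i] >= q[i + 1]
    ]


discharge = [1.0, 2.0, 10.0, 3.0, 1.0, 1.0, 8.0, 2.0, 1.0, 1.0]
print(candidate_peaks(discharge, peak_quant=0.8))  # [2, 6]
```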

Returns:
control_info : dict[str, Any]

A dictionary containing the optimization control information of Model. The elements are:

  • n : int

    The size of the control vector.

  • nbk : numpy.ndarray

    An array of shape (4,) containing the number of elements of each kind (Model.rr_parameters, Model.rr_initial_states, Model.serr_mu_parameters, Model.serr_sigma_parameters) in the control vector (sum(nbk) = n).

  • x : numpy.ndarray

    An array of shape (n,) containing the initial values of the control vector (possibly transformed).

  • l : numpy.ndarray

    An array of shape (n,) containing the lower bounds of the control vector (possibly transformed).

  • u : numpy.ndarray

    An array of shape (n,) containing the upper bounds of the control vector (possibly transformed).

  • nbd : numpy.ndarray

    An array of shape (n,) containing the type of bounds of the control vector. The values are:

    • 0: unbounded

    • 1: only lower bound

    • 2: both lower and upper bounds

    • 3: only upper bound

  • name : numpy.ndarray

    An array of shape (n,) containing the names of the control vector elements. The naming convention is:

    • <key>0: Spatially uniform parameter or multi-linear/polynomial intercept, where <key> is the name of any rainfall-runoff parameter or initial state ('cp0', 'llr0', 'ht0', etc.).

    • <key><row>-<col>: Spatially distributed parameter, where <key> is the name of any rainfall-runoff parameter or initial state and <row>, <col> are the corresponding position in the spatial domain ('cp1-1', 'llr20-2', 'ht3-12', etc.). Indexing is one-based.

    • <key>-<desc>-<kind>: Multi-linear/polynomial descriptor-linked parameter, where <key> is the name of any rainfall-runoff parameter or initial state, <desc> is the corresponding descriptor and <kind> is the kind of parameter, coefficient or exponent ('cp-slope-a', 'llr-slope-b', 'ht-dd-a').

  • x_bkg : numpy.ndarray

    An array of shape (n,) containing the background values of the control vector.

  • l_bkg : numpy.ndarray

    An array of shape (n,) containing the background lower bounds of the control vector.

  • u_bkg : numpy.ndarray

    An array of shape (n,) containing the background upper bounds of the control vector.
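The naming convention above can be reproduced with a few string templates. In the sketch below, `control_names` is hypothetical. It assumes that every cell of a distributed grid is active and enumerated row by row, that 'a' denotes the coefficient (the only kind appearing in the multi-linear example), and that a multi-polynomial term adds an exponent 'b' right after its coefficient. Those orderings are guesses, not taken from smash.

```python
from typing import Dict, List, Optional, Sequence


def control_names(
    mapping: str,
    parameters: Sequence[str],
    nrow: int = 1,
    ncol: int = 1,
    descriptors: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Build control vector names following the documented naming convention."""
    names: List[str] = []
    for key in parameters:
        if mapping == "uniform":
            names.append(f"{key}0")
        elif mapping == "distributed":
            # One name per cell, one-based row/column (assumes every cell is active).
            names.extend(
                f"{key}{row}-{col}"
                for row in range(1, nrow + 1)
                for col in range(1, ncol + 1)
            )
        elif mapping in ("multi-linear", "multi-polynomial"):
            names.append(f"{key}0")  # intercept
            for desc in (descriptors or {}).get(key, []):
                names.append(f"{key}-{desc}-a")  # coefficient
                if mapping == "multi-polynomial":
                    names.append(f"{key}-{desc}-b")  # exponent (assumed ordering)
        else:
            raise ValueError(f"Unknown mapping: {mapping!r}")
    return names


print(control_names("multi-linear", ["cp", "kexc"], descriptors={"cp": ["slope", "dd"], "kexc": ["dd"]}))
# ['cp0', 'cp-slope-a', 'cp-dd-a', 'kexc0', 'kexc-dd-a']
```

Both the uniform and multi-linear outputs agree with the name arrays printed in the Examples section, and n is simply the length of the returned list.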

Examples

>>> import smash
>>> from smash.factory import load_dataset
>>> setup, mesh = load_dataset("cance")
>>> model = smash.Model(setup, mesh)

Default optimize control vector information

>>> control_info = smash.optimize_control_info(model)
>>> control_info
{
    'l': array([-13.815511 , -13.815511 ,  -4.6052704, -13.815511 ], dtype=float32),
    'l_bkg': array([ 1.e-06,  1.e-06, -5.e+01,  1.e-06], dtype=float32),
    'n': 4,
    'name': array(['cp0', 'ct0', 'kexc0', 'llr0'], dtype='<U5'),
    'nbd': array([2, 2, 2, 2], dtype=int32),
    'nbk': array([4, 0, 0, 0], dtype=int32),
    'u': array([6.9077554, 6.9077554, 4.6052704, 6.9077554], dtype=float32),
    'u_bkg': array([1000., 1000.,   50., 1000.], dtype=float32),
    'x': array([5.2983174, 6.214608 , 0.       , 1.609438 ], dtype=float32),
    'x_bkg': array([200., 500.,   0.,   5.], dtype=float32),
}

This gives a direct indication of what the optimizer takes as input for a given optimization configuration. Four rainfall-runoff parameters are uniformly optimized ('cp0', 'ct0', 'kexc0' and 'llr0'). Each parameter has both a lower and an upper bound (2 in nbd), and a transformation has been applied to the control vector (compare x with x_bkg).
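The printed arrays make it possible to check the transformation by hand. The values are consistent with a natural logarithm for the strictly positive parameters (cp, ct, llr) and an inverse hyperbolic sine for kexc, whose bounds are negative and positive. This rule is inferred from the numbers only, not taken from the smash source, and `sbs_like_transform` is a hypothetical name.

```python
import math


def sbs_like_transform(value: float, lower_bound: float) -> float:
    """Inferred from the printed output: log if the parameter is strictly positive, asinh otherwise."""
    return math.log(value) if lower_bound > 0 else math.asinh(value)


# (x_bkg, l_bkg, u_bkg) copied from the output above.
background = {
    "cp0": (200.0, 1e-06, 1000.0),
    "ct0": (500.0, 1e-06, 1000.0),
    "kexc0": (0.0, -50.0, 50.0),
    "llr0": (5.0, 1e-06, 1000.0),
}
for name, (x_bkg, l_bkg, u_bkg) in background.items():
    x, l, u = (sbs_like_transform(v, l_bkg) for v in (x_bkg, l_bkg, u_bkg))
    print(f"{name}: x={x:.6f} l={l:.6f} u={u:.6f}")
```

Up to float32 rounding, the printed triples match x, l and u in the output above.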

With a customized optimization configuration: here, a multi-linear mapping is chosen and only cp and kexc are optimized, with different descriptors (all descriptors for cp, only dd for kexc).

>>> control_info = smash.optimize_control_info(
        model,
        mapping="multi-linear",
        optimize_options={
            "parameters": ["cp", "kexc"],
            "descriptor": {"kexc": ["dd"]},
        },
    )
>>> control_info
{
    'l': array([-99., -99., -99., -99., -99.], dtype=float32),
    'l_bkg': array([-99., -99., -99., -99., -99.], dtype=float32),
    'n': 5,
    'name': array(['cp0', 'cp-slope-a', 'cp-dd-a', 'kexc0', 'kexc-dd-a'], dtype='<U10'),
    'nbd': array([0, 0, 0, 0, 0], dtype=int32),
    'nbk': array([5, 0, 0, 0], dtype=int32),
    'u': array([-99., -99., -99., -99., -99.], dtype=float32),
    'u_bkg': array([-99., -99., -99., -99., -99.], dtype=float32),
    'x': array([-1.3862944,  0.       ,  0.       ,  0.       ,  0.       ], dtype=float32),
    'x_bkg': array([-1.3862944,  0.       ,  0.       ,  0.       ,  0.       ], dtype=float32),
}

Five parameters are optimized: the intercepts ('cp0' and 'kexc0') and the coefficients ('cp-slope-a', 'cp-dd-a' and 'kexc-dd-a') of the regression between the descriptors (slope and dd) and the rainfall-runoff parameters (cp and kexc). None of these control values is bounded (0 in nbd).
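The initial intercepts can also be checked by hand. The printed values are consistent with the parameter being obtained as lower + (upper - lower) * sigmoid(intercept + sum of coefficient * descriptor), with the intercept initialized so that zero coefficients reproduce the background value: logit((200 - 1e-06) / (1000 - 1e-06)) is about -1.3862944 for cp, and logit(0.5) = 0 for kexc. This sigmoid scaling is an assumption inferred from the numbers, the bounds are taken from l_bkg and u_bkg of the first example, and all helpers are hypothetical.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def intercept_from_background(value: float, lower: float, upper: float) -> float:
    """Intercept that maps zero descriptor coefficients back to the background value."""
    return logit((value - lower) / (upper - lower))


def multi_linear_value(
    intercept: float,
    coefs: Sequence[float],
    descriptors: Sequence[float],
    lower: float,
    upper: float,
) -> float:
    """Bounded parameter value from a linear combination of descriptors (assumed sigmoid scaling)."""
    z = intercept + sum(a * d for a, d in zip(coefs, descriptors))
    return lower + (upper - lower) * sigmoid(z)


cp0 = intercept_from_background(200.0, 1e-06, 1000.0)  # about -1.3862944, as in x above
kexc0 = intercept_from_background(0.0, -50.0, 50.0)  # 0.0, as in x above
print(round(multi_linear_value(cp0, [0.0, 0.0], [0.3, 0.7], 1e-06, 1000.0), 6))  # 200.0
```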