.. _user_guide.classical_uses.lez_regionalization:

=====================
Lez - Regionalization
=====================

This guide on `smash` is carried out on a French catchment, **the Lez at Lattes**, and aims to perform an optimization of the hydrological
model considering a regional mapping of physical descriptors onto the model conceptual parameters. The parameters :math:`\boldsymbol{\theta}`
can be written as a mapping :math:`\phi` of descriptors :math:`\boldsymbol{\mathcal{D}}` (slope, drainage density, soil water storage, etc.)
and a control vector :math:`\boldsymbol{\rho}`:
:math:`\boldsymbol{\theta}(x)=\phi\left(\boldsymbol{\mathcal{D}}(x),\boldsymbol{\rho}\right)`.
See the :ref:`math_num_documentation.mapping` section for more details.
First, a shape is assumed for the mapping (here **multi-polynomial** or **neural network**). Then the control vector of the mapping needs to be
optimized: :math:`\boldsymbol{\hat{\rho}}=\underset{\mathrm{\boldsymbol{\rho}}}{\text{argmin}}\;J`, with :math:`J` the cost function.
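To fix ideas, a polynomial mapping for a single parameter :math:`\theta_k` could, for instance, take the following form, where the output is
rescaled between the lower and upper bounds :math:`l_k` and :math:`u_k` of the parameter by a function :math:`S` such as a sigmoid, and where
the coefficients :math:`\alpha_{k,d}` and exponents :math:`\beta_{k,d}` are the components of :math:`\boldsymbol{\rho}` to be optimized. This
form is given for illustration only; the exact formulation used in `smash` is detailed in the :ref:`math_num_documentation.mapping` section.

.. math::

    \theta_k(x) = l_k + \left(u_k - l_k\right)
    \, S\left(\alpha_{k,0} + \sum_{d=1}^{N_{\mathcal{D}}} \alpha_{k,d} \, \mathcal{D}_d(x)^{\beta_{k,d}}\right)

with :math:`N_{\mathcal{D}}` the number of descriptors.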
.. image:: ../../_static/lez.png
    :width: 400
    :align: center

Required data
-------------

.. hint::

    This is the same dataset as for the :ref:`user_guide.quickstart.lez_split_sample_test` study, so you may not need to re-download it.

First, you need to download all the required data.

.. button-link:: https://smash.recover.inrae.fr/dataset/Lez-dataset.tar
    :color: primary
    :shadow:
    :align: center

    **Download**

If the download was successful, a file named ``Lez-dataset.tar`` should be available. We can switch to the directory where this file has been
downloaded and extract it using the following command:

.. code-block:: shell

    tar xf Lez-dataset.tar

Now a folder called ``Lez-dataset`` should be accessible and contain the following files and folders:

- ``France_flwdir.tif``
    A GeoTiff file containing the flow direction data,
- ``gauge_attributes.csv``
    A csv file containing the gauge attributes (gauge coordinates, drained area and code),
- ``prcp``
    A directory containing precipitation data in GeoTiff format with the following directory structure: ``%Y/%m`` (``2012/08``),
- ``pet``
    A directory containing daily interannual potential evapotranspiration data in GeoTiff format,
- ``qobs``
    A directory containing the observed discharge data in csv format,
- ``descriptor``
    A directory containing physiographic descriptors in GeoTiff format.

Six physical descriptors are considered in this example:

.. image:: ../../_static/physio_descriptors.png
    :align: center

.. TODO: Add descriptor explanation

In the following, the current working directory is assumed to be the directory where ``Lez-dataset`` is located. Open a Python interface:

.. code-block:: shell

    python3

.. ipython:: python
    :suppress:

    import os
    os.system("python3 generate_dataset.py -d Lez")

Imports
-------

We will first import everything we need in this tutorial.

.. ipython:: python

    import smash
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

Model creation
--------------

Model setup creation
********************

.. ipython:: python

    setup = {
        "start_time": "2012-08-01",
        "end_time": "2013-07-31",
        "dt": 86_400, # daily time step
        "hydrological_module": "gr4",
        "routing_module": "lr",
        "read_prcp": True,
        "prcp_directory": "./Lez-dataset/prcp",
        "read_pet": True,
        "pet_directory": "./Lez-dataset/pet",
        "read_qobs": True,
        "qobs_directory": "./Lez-dataset/qobs",
        "read_descriptor": True,
        "descriptor_directory": "./Lez-dataset/descriptor",
        "descriptor_name": [
            "slope",
            "drainage_density",
            "karst",
            "woodland",
            "urban",
            "soil_water_storage",
        ],
    }

Model mesh creation
*******************

.. ipython:: python

    gauge_attributes = pd.read_csv("./Lez-dataset/gauge_attributes.csv")

    mesh = smash.factory.generate_mesh(
        flwdir_path="./Lez-dataset/France_flwdir.tif",
        x=list(gauge_attributes["x"]),
        y=list(gauge_attributes["y"]),
        area=list(gauge_attributes["area"] * 1e6), # Convert km² to m²
        code=list(gauge_attributes["code"]),
    )

Then, we can initialize the `smash.Model` object:

.. ipython:: python

    model = smash.Model(setup, mesh)

Model simulation
----------------

Multiple polynomial
*******************

To optimize the rainfall-runoff model using a multiple polynomial mapping of descriptors to conceptual model parameters, the value
``multi-polynomial`` simply needs to be passed to the ``mapping`` argument. We add another option to limit the number of iterations by
stopping the optimizer after ``50`` iterations.

.. To speed up documentation generation

.. ipython:: python
    :suppress:

    ncpu = min(5, max(1, os.cpu_count() - 1))
    model_mp = smash.optimize(
        model,
        mapping="multi-polynomial",
        optimize_options={
            "termination_crit": dict(maxiter=50),
        },
        common_options={"ncpu": ncpu},
    )

.. ipython:: python
    :verbatim:

    model_mp = smash.optimize(
        model,
        mapping="multi-polynomial",
        optimize_options={
            "termination_crit": dict(maxiter=50),
        },
    )

We have therefore optimized the set of rainfall-runoff parameters using a multiple polynomial regression constrained by physiographic
descriptors. Here, most of the options used are the default ones, i.e. a minimization of one minus the Nash-Sutcliffe efficiency on the most
downstream gauge of the domain. The resulting rainfall-runoff parameter maps can be viewed.

.. ipython:: python

    f, ax = plt.subplots(2, 2)
    map_cp = ax[0,0].imshow(model_mp.get_rr_parameters("cp"));
    f.colorbar(map_cp, ax=ax[0,0], label="cp (mm)");
    map_ct = ax[0,1].imshow(model_mp.get_rr_parameters("ct"));
    f.colorbar(map_ct, ax=ax[0,1], label="ct (mm)");
    map_kexc = ax[1,0].imshow(model_mp.get_rr_parameters("kexc"));
    f.colorbar(map_kexc, ax=ax[1,0], label="kexc (mm/d)");
    map_llr = ax[1,1].imshow(model_mp.get_rr_parameters("llr"));
    @savefig user_guide.classical_uses.lez_regionalization.mp_theta.png
    f.colorbar(map_llr, ax=ax[1,1], label="llr (min)");

As well as the performances at the upstream gauges:

.. ipython:: python

    metrics = ["nse", "kge"]
    upstream_perf = pd.DataFrame(index=model.mesh.code[1:], columns=metrics)
    for m in metrics:
        upstream_perf[m] = np.round(smash.metrics(model_mp, metric=m)[1:], 2)
    upstream_perf

.. note::

    The two upstream gauges are the last two gauges of the list. This is why we use ``[1:]`` in the lists, in order to take all the gauges
    except the first one, which is the downstream gauge on which the model has been calibrated.
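Beyond these aggregate scores, it can also be useful to compare the simulated and observed hydrographs at the downstream calibration gauge.
Below is a minimal sketch, assuming the simulated and observed discharges are exposed as ``model_mp.response.q`` and ``model.response_data.q``
respectively, both as arrays of shape ``(n_gauges, n_time_steps)`` with the downstream gauge in the first row (check the `smash` API reference
for the exact attribute names):

.. ipython:: python
    :verbatim:

    # Sketch: compare observed and simulated discharge at the downstream gauge.
    # The attribute names used below are assumptions, not confirmed by this guide.
    qobs = model.response_data.q[0, :]  # observed discharge
    qsim = model_mp.response.q[0, :]  # simulated discharge
    plt.plot(qobs, label="Observed");
    plt.plot(qsim, label="Simulated (multi-polynomial)");
    plt.xlabel("Time step (d)");
    plt.ylabel("Discharge (m³/s)");
    plt.legend();
    plt.title(model.mesh.code[0]);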
Artificial neural network
*************************

We can also optimize the rainfall-runoff model using a neural network (NN) based mapping of descriptors to conceptual model parameters. It is
possible to define your own network to implement this optimization, but here we will use the default neural network.
Similar to the multiple polynomial mapping, all you have to do is pass the value ``ann`` to the ``mapping`` argument. We also pass other
options specific to the use of a NN:

- ``optimize_options``

    - ``random_state``: a random seed used to initialize the neural network weights.
    - ``learning_rate``: the learning rate used for the weight updates during training.
    - ``termination_crit``: the number of training ``epochs`` for the neural network and a positive number to stop training when the loss
      function does not decrease below the current optimal value for ``early_stopping`` consecutive ``epochs``.

- ``return_options``

    - ``net``: return the optimized neural network.

.. To speed up documentation generation

.. ipython:: python
    :suppress:

    ncpu = min(5, max(1, os.cpu_count() - 1))
    model_ann, opt_ann = smash.optimize(
        model,
        mapping="ann",
        optimize_options={
            "random_state": 23,
            "learning_rate": 0.004,
            "termination_crit": dict(epochs=100, early_stopping=20),
        },
        return_options={"net": True},
        common_options={"ncpu": ncpu},
    )

.. ipython:: python
    :verbatim:

    model_ann, opt_ann = smash.optimize(
        model,
        mapping="ann",
        optimize_options={
            "random_state": 23,
            "learning_rate": 0.004,
            "termination_crit": dict(epochs=100, early_stopping=20),
        },
        return_options={"net": True},
    )

.. note::

    As we used the `smash.optimize` method (here an :ref:`ADAM algorithm ` by default when choosing a NN based mapping) and asked for optional
    return values, this function returns two values: the optimized model ``model_ann`` and the optional returns ``opt_ann``.

Since we have returned the optimized neural network, we can visualize what it contains.

.. ipython:: python

    opt_ann.net

The information displayed tells us that the default neural network is composed of two hidden dense layers, each followed by a ``ReLU``
activation function, and a final layer followed by a ``Sigmoid`` function. To scale the network output to the parameter bounds, a
``MinMaxScale`` function is applied. Other information is available in the `smash.factory.Net` object, including the value of the cost
function at each iteration.

.. ipython:: python

    plt.plot(opt_ann.net.history["loss_train"]);
    plt.xlabel("Epoch");
    plt.ylabel("$1-NSE$");
    plt.grid(alpha=.7, ls="--");
    @savefig user_guide.classical_uses.lez_regionalization.ann_J.png
    plt.title("Cost function descent");

Finally, we can visualize the parameters and the performances.

.. ipython:: python

    f, ax = plt.subplots(2, 2)
    map_cp = ax[0,0].imshow(model_ann.get_rr_parameters("cp"));
    f.colorbar(map_cp, ax=ax[0,0], label="cp (mm)");
    map_ct = ax[0,1].imshow(model_ann.get_rr_parameters("ct"));
    f.colorbar(map_ct, ax=ax[0,1], label="ct (mm)");
    map_kexc = ax[1,0].imshow(model_ann.get_rr_parameters("kexc"));
    f.colorbar(map_kexc, ax=ax[1,0], label="kexc (mm/d)");
    map_llr = ax[1,1].imshow(model_ann.get_rr_parameters("llr"));
    @savefig user_guide.classical_uses.lez_regionalization.ann_theta.png
    f.colorbar(map_llr, ax=ax[1,1], label="llr (min)");

.. ipython:: python

    metrics = ["nse", "kge"]
    upstream_perf = pd.DataFrame(index=model.mesh.code[1:], columns=metrics)
    for m in metrics:
        upstream_perf[m] = np.round(smash.metrics(model_ann, metric=m)[1:], 2)
    upstream_perf

.. ipython:: python
    :suppress:

    plt.close('all')
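To conclude, the two regionalization approaches can be compared side by side at every gauge. The short sketch below reuses the
`smash.metrics` function introduced above; the column names are only illustrative.

.. ipython:: python
    :verbatim:

    # Compare the NSE obtained with both mappings at all gauges
    # (first gauge: downstream calibration gauge, then the two upstream gauges).
    comparison = pd.DataFrame(index=model.mesh.code, columns=["multi-polynomial", "ann"])
    comparison["multi-polynomial"] = np.round(smash.metrics(model_mp, metric="nse"), 2)
    comparison["ann"] = np.round(smash.metrics(model_ann, metric="nse"), 2)
    comparison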