Playing around with parameters
Optimizer
STLSQ
"Sequentially thresholded least squares"
It takes an argument $\lambda$, which specifies a sparsity threshold: should a parameter $p_i$ be smaller than $\lambda$, it is removed (along with its corresponding term).
This means that the bigger the threshold, the fewer terms may remain in the final model; in other words, more terms get removed.
For 1D-Neuron-Multiple, the best values of $\lambda$ seemed to lie between $10^{-3}$ and $10^{-1}$; smaller values of $\lambda$ let insignificant noise terms into the model.
The default value is $\lambda = 10^{-1}$.
The basic example also sets $\lambda = 10^{-1}$, which agrees with this observation.
TODO: STLSQ with vector of thresholds
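A rough sketch of the scalar usage and the vector variant, assuming DataDrivenDiffEq.jl / DataDrivenSparse.jl's `STLSQ` (its docs state it accepts either a single threshold or a vector of candidates; the package layout differs between versions, so double-check the imports):

```julia
using DataDrivenDiffEq, DataDrivenSparse  # older versions export STLSQ from DataDrivenDiffEq itself

# Scalar threshold: coefficients with magnitude below λ are pruned.
opt = STLSQ(1e-1)

# Vector of thresholds: the solver searches over the candidates,
# here λ ∈ {10^-3, 10^-2.5, ..., 10^-1}.
opt_vec = STLSQ(exp10.(-3:0.5:-1))
```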
Metrics
$L_2$ norm error
The $L_2$ (or $\ell_2$) norm of the error reports the $\ell_2$ norm of the error in each dimension.
The smaller, the better, of course.
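A minimal illustration of what is meant (the arrays and their row/column layout are hypothetical):

```julia
using LinearAlgebra

# Hypothetical example: rows are dimensions, columns are time points.
X    = [1.0 2.0 3.0; 4.0 5.0 6.0]   # ground truth
Xhat = [1.1 1.9 3.2; 3.8 5.1 6.1]   # model prediction

# ℓ2 norm of the error in each dimension.
errors = [norm(Xhat[i, :] .- X[i, :]) for i in 1:size(X, 1)]
```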
TVDIFF
Most of the time, around 300 iterations seems to be more than enough (this is used in the sketch further below).
Regularization $\alpha$
The regularization parameter $\alpha$ tells us how strongly the derivative should be regularized (think of it as smoothing).
The bigger the $\alpha$, the less the result oscillates, though the fewer "features" of the true derivative it really exhibits.
This is most visible when there is a big spike in the derivative: when strongly regularized, tvdiff is unable "to catch up" and doesn't handle the spike well (it simply doesn't feature nearly as big a spike).
I'd recommend starting with a low $\alpha$ and slowly increasing it until we find the derivative smooth enough.
With a small $\alpha$, always check whether the result more or less corresponds to the data (it has a tendency to oscillate where the function is nearly constant).
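A minimal sketch of this workflow, assuming NoiseRobustDifferentiation.jl's `tvdiff` (the `dx` keyword is taken from its docs as I remember them; verify against the installed version):

```julia
using NoiseRobustDifferentiation, Random

Random.seed!(1)

# Noisy samples of f(x) = |x|, whose derivative jumps at 0.
n  = 200
x  = range(-1, 1; length = n)
dx = step(x)
f  = abs.(x) .+ 0.05 .* randn(n)

# ~300 iterations is usually plenty; sweep α to trade smoothness
# against fidelity to sharp features such as the jump.
for α in (1e-2, 1e-1, 1.0)
    df = tvdiff(f, 300, α; dx = dx)
    # compare df against the true derivative sign.(x) here
end
```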
Epsilon $\varepsilon$
Using tvdiff with $\varepsilon = 10^{-9}$, we obtain a strongly regularized result. Larger values of $\varepsilon$ improve conditioning and speed, while smaller values give more accurate results with sharper jumps.
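Continuing the sketch above (the `ε` keyword name is again an assumption based on the package docs):

```julia
# Larger ε: better conditioned and faster, smoother result.
df_fast = tvdiff(f, 300, 0.2; ε = 1e-6, dx = dx)

# Smaller ε: more accurate with sharper jumps, but slower.
df_sharp = tvdiff(f, 300, 0.2; ε = 1e-9, dx = dx)
```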
Scale and preconditioner
TO BE DONE
Performance
In general, the more the data (and thus the derivative) varies in scale, the worse the model performs.
Collocations
Data collocation is only used when the derivative is NOT supplied (and is surely better than a plain forward difference).
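A hypothetical sketch of that setup, assuming DataDrivenDiffEq.jl's collocation kernels (I believe a kernel can be passed to `ContinuousDataDrivenProblem` so that the derivative is estimated from the states; verify against the installed version's docs):

```julia
using DataDrivenDiffEq

# Hypothetical data: states X (rows = dimensions) sampled at times t,
# with no derivative measurements supplied.
t = collect(range(0.0, 10.0; length = 101))
X = permutedims(hcat(sin.(t), cos.(t)))

# Passing a collocation kernel should make the problem estimate DX
# from X via collocation rather than a plain finite difference.
prob = ContinuousDataDrivenProblem(X, t, GaussianKernel())
```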
TODO: Use collocation on an existing derivative?