/************************************************************************/
/*                                                                      */
/*    vspline - a set of generic tools for creation and evaluation      */
/*              of uniform b-splines                                    */
/*                                                                      */
/*            Copyright 2015 - 2017 by Kay F. Jahnke                    */
/*                                                                      */
/*    The git repository for this software is at                        */
/*                                                                      */
/*    https://bitbucket.org/kfj/vspline                                 */
/*                                                                      */
/*    Please direct questions, bug reports, and contributions to        */
/*                                                                      */
/*    kfjahnke+vspline@gmail.com                                        */
/*                                                                      */
/*    Permission is hereby granted, free of charge, to any person       */
/*    obtaining a copy of this software and associated documentation    */
/*    files (the "Software"), to deal in the Software without           */
/*    restriction, including without limitation the rights to use,      */
/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
/*    sell copies of the Software, and to permit persons to whom the    */
/*    Software is furnished to do so, subject to the following          */
/*    conditions:                                                       */
/*                                                                      */
/*    The above copyright notice and this permission notice shall be    */
/*    included in all copies or substantial portions of the             */
/*    Software.                                                         */
/*                                                                      */
/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
/*                                                                      */
/************************************************************************/

/*! \file prefilter.h

    \brief Code to create the coefficient array for a b-spline.
    
    Note: the bulk of the code was factored out to filter.h, while this text still
    outlines the complete filtering process.
    
    B-spline coefficients can be generated in two ways (that I know of): the first
    is by solving a set of equations which encode the constraints of the spline.
    A good example of how this is done can be found in libeinspline. I term it
    the 'linear algebra approach'. In this implementation, I have chosen what I
    call the 'DSP approach'. In a nutshell, the DSP approach looks at the b-spline's
    reconstruction as a convolution of the coefficients with a specific kernel. This
    kernel acts as a low-pass filter. To counteract the effect of this filter and
    obtain the input signal from the convolution of the coefficients, a high-pass
    filter with the inverse transfer function to the low-pass is used. This high-pass
    has infinite support, but can still be calculated precisely within the bounds of
    the arithmetic precision the CPU offers, due to the properties it has.
    
    I recommend [CIT2000] for a formal explanation. At the core of my prefiltering
    routines there is code from Philippe Thevenaz' accompanying code to this paper,
    with slight modifications translating it to C++ and making it generic.
    The greater part of this file deals with 'generifying' the process and with
    employing multithreading and the CPU's vector units to gain speed.
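
    The core recursion can be sketched for the simplest interesting case: a 1D signal,
    a cubic b-spline (one pole) and mirror boundary conditions. This is a minimal sketch
    modelled on Thevenaz' published reference code, not the generic code in filter.h.
    The names cubic_prefilter_mirror and cubic_value_at_knot are hypothetical helpers
    introduced for illustration only; 'tolerance' is assumed to lie strictly between 0 and 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of vspline: in-place prefilter of a 1D signal
// for a cubic b-spline, assuming the signal continues mirrored on its bounds.
inline void cubic_prefilter_mirror ( std::vector<double> & c ,
                                     double tolerance = 1e-12 )
{
  const std::size_t n = c.size() ;
  assert ( n >= 2 ) ;

  const double z = std::sqrt ( 3.0 ) - 2.0 ;             // single pole of the cubic b-spline
  const double gain = ( 1.0 - z ) * ( 1.0 - 1.0 / z ) ;  // evaluates to 6 for this pole

  for ( double & v : c )
    v *= gain ;

  // initial causal coefficient; the 'horizon' is the number of terms after
  // which further contributions fall below 'tolerance'
  const std::size_t horizon = static_cast<std::size_t>
    ( std::ceil ( std::log ( tolerance ) / std::log ( std::fabs ( z ) ) ) ) ;
  double zk = z ;
  if ( horizon < n )
  {
    // long signal: truncated sum over the first 'horizon' samples
    double sum = c[0] ;
    for ( std::size_t k = 1 ; k < horizon ; k++ )
    {
      sum += zk * c[k] ;
      zk *= z ;
    }
    c[0] = sum ;
  }
  else
  {
    // short signal: exact sum over the mirrored, periodic continuation
    const double iz = 1.0 / z ;
    double z2k = std::pow ( z , double ( n - 1 ) ) ;
    double sum = c[0] + z2k * c[n-1] ;
    z2k *= z2k * iz ;
    for ( std::size_t k = 1 ; k + 1 < n ; k++ )
    {
      sum += ( zk + z2k ) * c[k] ;
      zk *= z ;
      z2k *= iz ;
    }
    c[0] = sum / ( 1.0 - zk * zk ) ;
  }

  // causal (forward) recursion
  for ( std::size_t k = 1 ; k < n ; k++ )
    c[k] += z * c[k-1] ;

  // initial anticausal coefficient for mirror boundary conditions
  c[n-1] = ( z / ( z * z - 1.0 ) ) * ( z * c[n-2] + c[n-1] ) ;

  // anticausal (backward) recursion, running from n - 2 down to 0
  for ( std::size_t k = n - 1 ; k-- > 0 ; )
    c[k] = z * ( c[k+1] - c[k] ) ;
}

// Hypothetical helper: value of the cubic spline at knot point i, using the
// cubic b-spline weights 1/6, 4/6, 1/6 and mirrored coefficients on the bounds.
inline double cubic_value_at_knot ( const std::vector<double> & c , std::size_t i )
{
  const std::size_t n = c.size() ;
  const double left  = ( i == 0 )     ? c[1]   : c[i-1] ;
  const double right = ( i == n - 1 ) ? c[n-2] : c[i+1] ;
  return ( left + 4.0 * c[i] + right ) / 6.0 ;
}
```

    Evaluating the spline at the knot points with cubic_value_at_knot should reproduce
    the original data to within roughly the chosen tolerance, which is a convenient
    check that the prefilter really is the inverse of the reconstruction kernel.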
    
    This code makes heavy use of vigra, which provides handling of multidimensional
    arrays and efficient handling of aggregate types - to mention only two of its
    many qualities. The vectorization is done with Vc, which allowed me to code
    the horizontal vectorization I use in a generic fashion.
    
    In another version of this code I used vigra's BSplineBase class to obtain prefilter
    poles. This required passing the spline degree/order as a template parameter. Doing it
    like this makes it possible to make the poles static members of the solver, but at the
    cost of type proliferation. Here I chose not to follow this path and instead pass the
    spline order as a parameter to the spline's constructor, thus reducing the number of
    solver specializations and allowing automated testing with loops over the degree. This
    variant may be slightly slower. The prefilter poles I use are precalculated externally
    with gsl/blas and polished in long double precision to provide the most precise data
    possible. This avoids using vigra's polynomial root code, which failed for high degrees
    when I used it.

    In addition to the code following the 'implicit scheme' proposed by Thevenaz, I provide
    code to use an 'explicit scheme' to obtain the b-spline coefficients. The implicit scheme
    makes assumptions about the continuation of the signal outside of the window of data which
    is accessible: that the data continue mirrored, reflected, etc. - and it proceeds to
    capture these assumptions in formulae deriving suitable initial causal/anticausal coefficients
    from them. Usually this is done with a certain 'horizon' which takes into account the limited
    arithmetic precision of the calculations and abbreviates the initial coefficient calculation
    to a certain chosen degree of precision. The same effect can be achieved by simply embedding
    the knot point data into a frame containing extrapolated knot point data. If the frame is
    chosen so wide that margin effects don't 'disturb' the core data, we end up with an equally
    (im)precise result with an explicit scheme. The width of the frame now takes the role of the
    horizon used in the implicit scheme and has the same effect. While the explicit scheme needs
    more memory, it has several advantages:

    - there is no need to code specific routines for initial coefficient generation
    - nor any need to explicitly run such code
    - the iteration over the input becomes more straightforward
    - arbitrary unconventional extrapolation schemes can be used easily

    A disadvantage, apart from the higher memory consumption, is that one cannot give a
    'precise' solution, which the implicit scheme can do for the cases it can handle. But what
    is 'precise'? Certainly there is no precision beyond the arithmetic precision offered by
    the underlying system. So if the horizon is chosen wide enough, the resulting coefficients
    become 'just about' the same with all schemes. They are interchangeable.
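
    The explicit scheme can be sketched for a cubic b-spline in 1D as follows. This is
    a minimal sketch under stated assumptions, not the code used in this library: the
    names mirror_at and cubic_prefilter_explicit are hypothetical, the extrapolation is
    passed in as a callable to show that any scheme can be plugged in, and the recursion
    is started 'cold' at both ends of the frame, relying on the frame width to let the
    resulting error decay.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical helper: value of 'data' at an arbitrary (possibly out-of-range)
// index j, assuming the data continue mirrored on their bounds. Needs size >= 2.
inline double mirror_at ( const std::vector<double> & data , long j )
{
  const long n = static_cast<long> ( data.size() ) ;
  const long period = 2 * ( n - 1 ) ;
  long m = std::labs ( j ) % period ;
  if ( m >= n )
    m = period - m ;
  return data [ m ] ;
}

// Hypothetical helper: prefilter for a cubic b-spline using an explicit frame
// of 'frame' extrapolated samples on either side of the data.
template < typename extrapolator >
std::vector<double> cubic_prefilter_explicit ( const std::vector<double> & data ,
                                               long frame ,
                                               extrapolator extrapolate )
{
  const long n = static_cast<long> ( data.size() ) ;
  const long total = n + 2 * frame ;
  assert ( n >= 2 && frame >= 0 ) ;

  const double z = std::sqrt ( 3.0 ) - 2.0 ;             // single pole of the cubic b-spline
  const double gain = ( 1.0 - z ) * ( 1.0 - 1.0 / z ) ;

  // embed the data in the frame, applying the gain on the way
  std::vector<double> c ( total ) ;
  for ( long j = 0 ; j < total ; j++ )
    c[j] = gain * extrapolate ( data , j - frame ) ;

  // causal recursion, started cold: c[0] is used as it is
  for ( long k = 1 ; k < total ; k++ )
    c[k] += z * c[k-1] ;

  // anticausal recursion, started cold: the value beyond the end is taken as zero
  c [ total - 1 ] *= -z ;
  for ( long k = total - 2 ; k >= 0 ; k-- )
    c[k] = z * ( c[k+1] - c[k] ) ;

  // the start-up errors have decayed by about |z|^frame when reaching the core
  return std::vector<double> ( c.begin() + frame , c.begin() + frame + n ) ;
}
```

    A call like cubic_prefilter_explicit ( data , 40 , mirror_at ) should yield core
    coefficients which agree with the implicit scheme to well within double precision,
    since |z|^40 is far below machine epsilon for the cubic pole. Swapping mirror_at for
    another callable changes the extrapolation without touching the filter itself.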

    In an image-processing context, the extra memory needed would typically be a small
    single-digit percentage - not really a bother. In my trials, I found the runtime differences
    between the two approaches negligible and the simplification of the code so attractive that
    I was tempted to choose the explicit scheme over the implicit. Yet since the code for the
    implicit scheme is there already and some of it is even used in the explicit scheme, I keep
    both methods in the code base for now.
    
    Note that using the explicit scheme also makes it possible to, if necessary, widen the
    shape of the complete coefficient array (including the 'frame') so that it becomes
    vector-friendly. Currently, this is not done.

    [CIT2000] Philippe Thévenaz, Thierry Blu and Michael Unser: Interpolation Revisited.
    IEEE Transactions on Medical Imaging, vol. 19, no. 7, July 2000.
*/

// TODO instead of erecting a horizon-wide frame around the core coefficients for the explicit
// extrapolation, one might only widen the buffer and extrapolate inside the buffer, writing back
// to a smaller array of core coefficients, optionally with a brace. The only drawback is in
// handling extrapolation schemes which pick values for extrapolation which aren't collinear
// to the buffered data, like SPHERICAL BCs, which is currently the only one exhibiting such
// behaviour. One option would be to abolish SPHERICAL BCs and force users to use the MANUAL
// prefiltering strategy for spherical data, which would, in a way, be 'purer' anyway, since
// SPHERICAL BCs are not really what you'd expect in a general-purpose b-spline library, as
// they are quite specific to panoramic image processing.

#ifndef VSPLINE_PREFILTER_H
#define VSPLINE_PREFILTER_H

#include "common.h"
#include "filter.h"
#include "basis.h"

namespace vspline {

using namespace std ;
using namespace vigra ;

/// With large data sets, and with higher dimensionality, processing separately along each
/// axis consumes a lot of memory bandwidth. There are ways out of this dilemma by interleaving
/// the code. Disregarding the calculation of initial causal and anticausal coefficients, the code
/// to do this would perform the forward filtering step for all axes at the same time and then, later,
/// the backward filtering step for all axes at the same time. This is possible, since the order
/// of the filter steps is irrelevant, and the traversal of the data can be arranged so that
/// values needed for context of the filter are always present (the filters are recursive and only
/// 'look' one way). I have investigated these variants, but especially the need to calculate
/// initial causal/anticausal coefficients, and the additional complications arising from
/// vectorization, have kept me from choosing this path for the current body of code. With the
/// inclusion of the explicit scheme for prefiltering, dimension-interleaved prefiltering becomes
/// more feasible, and I anticipate revisiting it.
///
/// Here I am using a scheme where I make access to 1D subsets of the data very efficient
/// (by buffering lines/stripes of data) and rely on the fact that such simple, fast access plays
/// well with the compiler's optimizer and pipelining in the CPU. From the trials on my own system
/// I conclude that this approach does not perform significantly worse than interleaving schemes
/// and is much easier to formulate and understand. And with fast access to 1D subsets, higher order
/// splines become less of an issue; the extra arithmetic to prefilter for, say, quintic splines is
/// done very quickly, since no additional memory access is needed beyond a buffer's worth of data
/// already present in core memory.
///
/// 'solve' is just a thin wrapper around filter_nd in filter.h, injecting the actual number of poles
/// and the poles themselves.
///
/// Note how smoothing comes into play here: it's done simply by prepending an additional
/// pole to the filter cascade. If 'smoothing' is not 0.0, it is used as this pole and has to
/// be a positive value between 0 (no smoothing) and 1 (total blur), both exclusive.
/// While I'm not sure about the precise mathematics (yet), this does what is intended very
/// efficiently. Why smooth? If the signal is scaled down when remapping, we'd have aliasing
/// of higher frequencies into the output, producing artifacts. Pre-smoothing with an adequate
/// factor removes the higher frequencies (more or less), avoiding the problem.
///
/// Using this simple method, pre-smoothing is computationally cheap. The method is probably
/// not equivalent to convolving with a gaussian, though the effect is quite similar.
/// I think the method is called exponential smoothing.

// TODO: establish the proper maths for this smoothing method
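
/* A minimal sketch of what a single positive pole does, assuming the filter applies
   each pole with the Thevenaz-style recursion (gain, causal pass, anticausal pass).
   Under that assumption the impulse response on an unbounded signal works out to a
   two-sided exponential, (1-z)/(1+z) * z^|k|, which sums to one. The names apply_pole
   and exponential_kernel are hypothetical and not part of vspline; the recursion is
   started cold, so the impulse has to sit far away from both ends of the buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: apply one pole z, with 0 < |z| < 1, to a 1D signal in-place.
inline void apply_pole ( std::vector<double> & c , double z )
{
  const std::size_t n = c.size() ;
  assert ( n >= 2 && z != 0.0 && std::fabs ( z ) < 1.0 ) ;

  // the gain is undefined for z == 0, so 'no smoothing' needs separate treatment
  const double gain = ( 1.0 - z ) * ( 1.0 - 1.0 / z ) ;
  for ( double & v : c )
    v *= gain ;

  // causal recursion, started cold
  for ( std::size_t k = 1 ; k < n ; k++ )
    c[k] += z * c[k-1] ;

  // anticausal recursion, started cold
  c[n-1] *= -z ;
  for ( std::size_t k = n - 1 ; k-- > 0 ; )
    c[k] = z * ( c[k+1] - c[k] ) ;
}

// Hypothetical helper: closed form of the expected impulse response at offset k
inline double exponential_kernel ( double z , long k )
{
  return ( 1.0 - z ) / ( 1.0 + z ) * std::pow ( z , double ( std::labs ( k ) ) ) ;
}
```

   Feeding a unit impulse in the middle of a long zero-filled buffer through apply_pole
   and comparing the result with exponential_kernel is a quick way to check this reading
   of the smoothing step. Values of z near 0 leave the signal almost unchanged, values
   near 1 spread it out widely.
*/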

template < typename input_array_type ,  ///< type of array with knot point data
           typename output_array_type , ///< type of array for coefficients (may be the same)
           typename math_type >         ///< type for arithmetic operations in filter
void solve ( input_array_type & input ,
             output_array_type & output ,
             TinyVector<bc_code,input_array_type::actual_dimension> bcv ,
             int degree ,
             double tolerance ,
             double smoothing = 0.0 ,
             int njobs = default_njobs )
{
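  if ( smoothing != 0.0 )
  {
    // smoothing is requested: 'smoothing' becomes the first pole of the cascade,
    // followed by the degree / 2 precomputed poles for a b-spline of this degree
    assert ( smoothing > 0.0 && smoothing < 1.0 ) ;
    int npoles = degree / 2 + 1 ;
    long double *pole = new long double [ npoles ] ;
    pole[0] = smoothing ;
    // copy the precomputed poles behind the smoothing pole
    for ( int i = 1 ; i < npoles ; i++ )
      pole[i] = vspline_constants::precomputed_poles [ degree ] [ i - 1 ] ;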
    
    filter_nd < input_array_type , output_array_type , math_type >
              ( input ,
                output ,
                bcv ,
                npoles ,
                pole ,
                tolerance ,
                njobs ) ;
                
    delete[] pole ;
  }
  else
    filter_nd < input_array_type , output_array_type , math_type >
              ( input ,
                output ,
                bcv ,
                degree / 2 ,
                vspline_constants::precomputed_poles [ degree ] ,
                tolerance ,
                njobs ) ;
}

} // namespace vspline

#endif // VSPLINE_PREFILTER_H
