# Custom Models

## Defining a Model

The `ModelBase` class in `albatross` uses the Curiously Recurring Template Pattern (CRTP),
which makes defining a model slightly different from the standard inheritance pattern in C++.
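For readers new to CRTP, here is a minimal, library-independent sketch of the pattern. The names `SketchBase`, `ConstantModel` and `_describe_impl` are hypothetical and are not part of `albatross`. The point is only that the base class is templated on the derived class and reaches the derived implementation through a `static_cast`, with no virtual functions involved.

```cpp
#include <string>

// Hypothetical stand-in for a CRTP base class; this is not albatross's ModelBase.
template <typename Derived>
class SketchBase {
public:
  // The public entry point forwards to the implementation supplied by the
  // derived class. The call is resolved at compile time.
  std::string describe() const {
    return "model: " + derived()._describe_impl();
  }

private:
  const Derived &derived() const {
    return *static_cast<const Derived *>(this);
  }
};

// The derived class passes itself as the template argument of its base.
class ConstantModel : public SketchBase<ConstantModel> {
public:
  std::string _describe_impl() const { return "constant"; }
};
```

Calling `ConstantModel().describe()` goes through the base class but ends up in `ConstantModel::_describe_impl`, which is the same shape as a public `fit` forwarding to a user-supplied `_fit_impl`.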

In general, an albatross model requires defining `_fit_impl`, `_predict_impl` and a
struct `Fit<MyModel>`, which is in charge of storing any coefficients.

## Fit Method

To get `model.fit(dataset)` to work you need to add a `_fit_impl` method to your class.
This fit implementation needs to take a vector of features and the corresponding targets
(often measurements) and needs to return a `Fit<ModelType>` object holding any
information required to make predictions.

```
class ModelType : public ModelBase<ModelType> {
public:
  // Declaration only; FeatureType is whichever feature type your problem uses.
  Fit<ModelType> _fit_impl(const std::vector<FeatureType> &features,
                           const MarginalDistribution &targets) const;
};
```

The `FeatureType` here can be whichever type your problem requires, and
you can have multiple `_fit_impl` methods for different types in the same model.
Templated `_fit_impl` methods also work.
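As a rough illustration of how several `_fit_impl` overloads can coexist, the sketch below uses hypothetical stand-ins (`SketchModelBase`, `SketchFit`, `CountingModel`) rather than the real `albatross` classes, and takes the targets as a plain `std::vector<double>` instead of a `MarginalDistribution`. The base class forwards to whichever overload matches the feature type it is given.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not the albatross types.
template <typename ModelType> struct SketchFit;

template <typename Derived>
class SketchModelBase {
public:
  // Forwards to whichever _fit_impl overload matches FeatureType.
  template <typename FeatureType>
  SketchFit<Derived> fit(const std::vector<FeatureType> &features,
                         const std::vector<double> &targets) const {
    return static_cast<const Derived &>(*this)._fit_impl(features, targets);
  }
};

class CountingModel;

// The fit only records what it was given, which is enough to show
// which overload was selected.
template <> struct SketchFit<CountingModel> {
  std::size_t n_features;
  std::string feature_kind;
};

class CountingModel : public SketchModelBase<CountingModel> {
public:
  // Overload for scalar features.
  SketchFit<CountingModel> _fit_impl(const std::vector<double> &features,
                                     const std::vector<double> &) const {
    return {features.size(), "double"};
  }

  // Overload for string features.
  SketchFit<CountingModel> _fit_impl(const std::vector<std::string> &features,
                                     const std::vector<double> &) const {
    return {features.size(), "string"};
  }
};
```

Passing a `std::vector<double>` of features selects the first overload and a `std::vector<std::string>` selects the second; a feature type with no matching overload would fail to compile.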

## Predict Method

To get `model.predict(features)` to work you need to add a `_predict_impl` method to your class.
This predict implementation needs to take a vector of features and a `Fit<ModelType>`, and
needs to return either an `Eigen::VectorXd` (mean only), a `MarginalDistribution`
(mean and variance) or a `JointDistribution` (mean and covariance).

```
class ModelType : public ModelBase<ModelType> {
public:
  // Declaration only; the trailing tag argument selects the return type.
  JointDistribution _predict_impl(const std::vector<FeatureType> &features,
                                  const Fit<ModelType> &fit,
                                  PredictTypeIdentity<JointDistribution>) const;
};
```

In the case above we’ve implemented predict to return a `JointDistribution`, which holds
the mean prediction as well as a full covariance. A `JointDistribution` can be converted
into a `MarginalDistribution` by taking the diagonal of the covariance matrix, and a
`MarginalDistribution` can be converted into a mean-only prediction (`Eigen::VectorXd`)
by simply taking the mean of the distribution.
As a result, by implementing predict for a `JointDistribution` you will be able to call
all of the following.

```
const auto prediction = model.fit(dataset).predict(features);
JointDistribution joint_pred = prediction.joint();
MarginalDistribution marginal_pred = prediction.marginal();
Eigen::VectorXd mean_pred = prediction.mean();
```

If you define `_predict_impl` with a `MarginalDistribution` instead, then you’ll find
that you can call,

```
MarginalDistribution marginal_pred = prediction.marginal();
Eigen::VectorXd mean_pred = prediction.mean();
```

but calling `prediction.joint()` would result in a compile-time error. Similarly, if you define
only the mean-only version then asking for anything other than `prediction.mean()`
will result in a compile-time error.
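The conversions described above (joint to marginal to mean) can be illustrated without the library. In this sketch `SketchJoint`, `SketchMarginal`, `to_marginal` and `to_mean` are hypothetical names built on `std::vector` rather than Eigen; they are not the `albatross` types and only mirror the arithmetic involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins built on std::vector rather than Eigen.
struct SketchMarginal {
  std::vector<double> mean;
  std::vector<double> variance; // one variance per prediction
};

struct SketchJoint {
  std::vector<double> mean;
  std::vector<std::vector<double>> covariance; // square, n x n
};

// Joint -> marginal: keep the mean and the diagonal of the covariance.
SketchMarginal to_marginal(const SketchJoint &joint) {
  assert(joint.covariance.size() == joint.mean.size());
  SketchMarginal marginal;
  marginal.mean = joint.mean;
  for (std::size_t i = 0; i < joint.mean.size(); ++i) {
    assert(joint.covariance[i].size() == joint.mean.size());
    marginal.variance.push_back(joint.covariance[i][i]);
  }
  return marginal;
}

// Marginal -> mean: drop the variance and keep only the mean.
std::vector<double> to_mean(const SketchMarginal &marginal) {
  return marginal.mean;
}
```

Each conversion only discards information, which is why a joint prediction can serve all three requests while a mean-only prediction can serve just one.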

We saw above that you could implement the `JointDistribution` version and have access to all the predict types,
but that is often inefficient, because the full covariance gets computed even when only the mean or
the variances are needed. Instead you may want to implement a specialized version for each of
the predict types. This is what is done for the Gaussian processes (see `gp.hpp`). The
desire to have specialized predict types is what led to the mysterious `PredictTypeIdentity<>` argument,
which is required to allow overloaded `_predict_impl` methods with different return types.
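C++ cannot overload functions on return type alone, which is why an extra tag argument helps here. The sketch below shows the general technique (tag dispatch) with a hypothetical empty tag `SketchTypeIdentity<T>` standing in for `PredictTypeIdentity<T>`; the library's actual definition may differ, and `MeanOnly`, `MeanAndVariance`, `TwoWayModel` and `predict_as` are likewise invented for illustration.

```cpp
#include <vector>

// Hypothetical empty tag type standing in for PredictTypeIdentity<T>.
template <typename T> struct SketchTypeIdentity {};

// Hypothetical prediction types, built on std::vector instead of Eigen.
struct MeanOnly {
  std::vector<double> mean;
};

struct MeanAndVariance {
  std::vector<double> mean;
  std::vector<double> variance;
};

class TwoWayModel {
public:
  // Cheap overload: only the mean is computed.
  MeanOnly _predict_impl(const std::vector<double> &features,
                         SketchTypeIdentity<MeanOnly>) const {
    return {std::vector<double>(features.size(), 1.0)};
  }

  // More expensive overload: the variance is computed as well.
  MeanAndVariance _predict_impl(const std::vector<double> &features,
                                SketchTypeIdentity<MeanAndVariance>) const {
    return {std::vector<double>(features.size(), 1.0),
            std::vector<double>(features.size(), 0.25)};
  }
};

// The caller selects an overload by naming the return type it wants.
template <typename PredictType, typename ModelType>
PredictType predict_as(const ModelType &model,
                       const std::vector<double> &features) {
  return model._predict_impl(features, SketchTypeIdentity<PredictType>{});
}
```

Here `predict_as<MeanOnly>(model, features)` never touches the variance code, and requesting a type for which the model has no overload fails to compile, which mirrors the compile-time errors described earlier.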

## Fit Type

The fit type needs to be a specialization of the `Fit<>` struct. The idea is that by forcing the
output of `_fit_impl` to be a custom type we can subsequently make model types constant, which
gives us peace of mind that there isn’t accidentally some state that gets stored in a model which
would cause two calls to `fit` to produce different results.

Once you’ve defined the `Fit<>` specialization you shouldn’t ever need to actually inspect that type;
that should be left to the internals of `albatross`. Instead you are encouraged to use `auto`,

```
const auto fit_model = model.fit(dataset);
```

or write everything as a one-liner.

```
const Eigen::VectorXd mean = model.fit(dataset).predict(features).mean();
```

Here’s an illustration of the actual types that would result from a typical model workflow:

```
const ModelType model = make_my_model();
const FitModel<ModelType, Fit<ModelType>> fit_model = model.fit(dataset);
const Prediction<ModelType, FeatureType, Fit<ModelType>> prediction = fit_model.predict(features);
const JointDistribution joint_prediction = prediction.joint();
```

Again, thanks to `auto` type declarations you shouldn’t need to actually know these types,
but it may be helpful to get a glimpse of what’s happening under the hood. This chain of
types is what allows `albatross` to keep track of how exactly you’re using a model and
decide (at compile time) the most efficient methods to use.
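One way such a chain of types could work is sketched below. Everything in it is a hypothetical stand-in (`SketchFitModel`, `SketchPrediction`, `SketchTag`, `OffsetModel`, `sketch_fit`) and it is not how `albatross` is necessarily implemented. It only illustrates the idea that a prediction object can hold the model, fit and features, and defer the choice of `_predict_impl` overload until the caller asks for a specific result.

```cpp
#include <vector>

// All names here are hypothetical stand-ins, not the albatross classes.
template <typename T> struct SketchTag {};

struct MeanOnly {
  std::vector<double> mean;
};

// Holds everything needed to predict, but does no work until asked.
template <typename ModelType, typename FeatureType, typename FitType>
class SketchPrediction {
public:
  SketchPrediction(const ModelType &model, const FitType &fit,
                   const std::vector<FeatureType> &features)
      : model_(model), fit_(fit), features_(features) {}

  // The overload to call is picked here, at compile time.
  MeanOnly mean() const {
    return model_._predict_impl(features_, fit_, SketchTag<MeanOnly>{});
  }

private:
  ModelType model_;
  FitType fit_;
  std::vector<FeatureType> features_;
};

// Pairs a model with the result of fitting it.
template <typename ModelType, typename FitType>
class SketchFitModel {
public:
  SketchFitModel(const ModelType &model, const FitType &fit)
      : model_(model), fit_(fit) {}

  template <typename FeatureType>
  SketchPrediction<ModelType, FeatureType, FitType>
  predict(const std::vector<FeatureType> &features) const {
    return {model_, fit_, features};
  }

private:
  ModelType model_;
  FitType fit_;
};

struct OffsetFit {
  double offset;
};

// A toy model whose prediction is the feature plus a fitted offset.
struct OffsetModel {
  MeanOnly _predict_impl(const std::vector<double> &features,
                         const OffsetFit &fit, SketchTag<MeanOnly>) const {
    MeanOnly output;
    for (double x : features) {
      output.mean.push_back(x + fit.offset);
    }
    return output;
  }
};

// Free function standing in for model.fit(dataset).
SketchFitModel<OffsetModel, OffsetFit> sketch_fit(const OffsetModel &model,
                                                  double offset) {
  return {model, OffsetFit{offset}};
}
```

With these stand-ins, `sketch_fit(model, 10.0).predict(features).mean()` reads like the one-liner shown earlier, and no prediction work happens until `mean()` is called.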

## Example

Here’s an example of a model which always returns the mean of the training data.

```
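// Forward declaration so that Fit<> can be specialized before the class body.
class MeanModel;

// This sketch assumes the albatross names (Fit, MarginalDistribution,
// PredictTypeIdentity) are in scope, for example inside namespace albatross.
template <> struct Fit<MeanModel> {
  double mean;
};

class MeanModel : public albatross::ModelBase<MeanModel> {
public:
  using FitType = Fit<MeanModel>;

  std::string get_name() const { return "mean"; }

  template <typename FeatureType>
  FitType _fit_impl(const std::vector<FeatureType> &features,
                    const MarginalDistribution &targets) const {
    // The fit stores the average of the target means.
    FitType model_fit = {targets.mean.mean()};
    return model_fit;
  }

  template <typename FeatureType>
  Eigen::VectorXd _predict_impl(const std::vector<FeatureType> &features,
                                const FitType &fit,
                                PredictTypeIdentity<Eigen::VectorXd>) const {
    // Every prediction is the stored training mean.
    Eigen::VectorXd output(features.size());
    output.fill(fit.mean);
    return output;
  }
};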
```

While defining your own model isn’t as simple as standard inheritance, the benefits are large.
Once you’ve defined a model using the `ModelBase` class you can immediately start using all the
tools built around it, such as cross validation, outlier detection using RANSAC,
and tuning.