Posted (edited)

I am currently testing the idea of running a prediction for a large number of signals generated from an Asset Tree.

Are there any performance traps to be careful of? Are there any events that can cause the prediction model to be refit (other than updating the training window or inputs)?

This is the function we are looking at using.

$target.regressionModelOLS($training,
        false,
        $signal)
    .predict($signal)


These signals will be used to generate treemaps, Organizer Topics, and potentially OData endpoints.

Edited by Ivan Berry
  • Administrators
Posted

Ivan,

As you indicated, the main concern is the fitting and refitting of the model. Depending on the scale, I imagine this would take quite a bit of time. Any change to the inputs would force a refit. Also, if the input data within the training window is not yet completely certain, the model will look for new data and attempt a refit while the model is open.

There are other events that could force a refit, like manual cache clearing, but that would be so infrequent I wouldn’t worry.

Regards,
Teddy

Posted

Thanks Teddy.

As a follow-up question, are there any ways to make the prediction fit faster?

One idea I had is to resample the target variable to 10-minute data, as we are not that interested in high-frequency predictions. Do I also need to resample all the input predictor variables, or does the regressionModelOLS function do that automatically?

$target.resample(10mins)
    .regressionModelOLS(
        $condition.toGroup(capsule("2022-01-01T14:32:00.000Z", "2022-06-01T20:32:00.000Z"), CapsuleBoundary.Intersect),
        false,
        $a, $a^2, $a^3, $b, $b^2, $b^3, $c, $c^2, $c^3, $d, $d^2, $d^3,
        $a*$b, $a*$c, $a*$d, $b*$c, $b*$d, $c*$d)
    .predict(
        $a, $a^2, $a^3, $b, $b^2, $b^3, $c, $c^2, $c^3, $d, $d^2, $d^3,
        $a*$b, $a*$c, $a*$d, $b*$c, $b*$d, $c*$d)

 

  • Administrators
Posted

Ivan,

If you would like to resample, I would recommend doing it in a standalone formula prior to the regression formula. The reason is that only formula outputs are cached; intermediate results are not. Resampling inside the regression formula would reduce the number of samples used in the fitting, but the formula would still have to pull every raw sample first. It is typically better to reduce the number of samples pulled.

Resampling the predictor variables would have some benefit. Seeq applies the prediction to every input sample, so resampling would reduce the total number of samples the prediction is computed for.
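As a rough sketch of the layout described above, the resampling could live in its own formulas, with the regression formula referencing their outputs. The names $targetResampled and $aResampled are hypothetical labels for those saved formulas, only one predictor is shown to keep it short, and the 10mins period and training capsule are copied from the question.

```
// Formula 1, saved as its own signal (hypothetical name: $targetResampled)
$target.resample(10mins)

// Formula 2, saved as its own signal (hypothetical name: $aResampled);
// repeat for each predictor that should be resampled
$a.resample(10mins)

// Formula 3, the regression, referencing the outputs of the formulas above
$targetResampled
    .regressionModelOLS(
        $condition.toGroup(capsule("2022-01-01T14:32:00.000Z", "2022-06-01T20:32:00.000Z"), CapsuleBoundary.Intersect),
        false,
        $aResampled)
    .predict($aResampled)
```

Because each resampled signal is a formula output, it can be cached, and the regression formula then reads the reduced sample set rather than the raw data.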

I would also recommend using only the necessary order for each predictor. Since you are writing this as a formula, you can choose which variables need higher-order terms and which can stay linear.
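For illustration only, a trimmed version of the formula from the question might look like the following. Which predictors keep higher-order terms here is an arbitrary assumption (cubic in $a, linear in $b, $c, and $d, with a single interaction term); the right choice depends on the process being modeled.

```
// Hypothetical reduced predictor set: 7 terms instead of the original 18
$target
    .regressionModelOLS(
        $condition.toGroup(capsule("2022-01-01T14:32:00.000Z", "2022-06-01T20:32:00.000Z"), CapsuleBoundary.Intersect),
        false,
        $a, $a^2, $a^3, $b, $c, $d, $a*$b)
    .predict($a, $a^2, $a^3, $b, $c, $d, $a*$b)
```

The terms passed to predict must match the terms passed to regressionModelOLS, in the same order, as in the original formula.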

Regards,
Teddy
