
Prediction Error: Number of Target Samples is Not Sufficient



  • Seeq Team

When generating a prediction in Seeq, users sometimes come across the error "The number of target samples is not sufficient to conduct a regression." As the error suggests, there isn't enough data to perform the prediction, but it can be difficult to determine the particular cause, especially when the prediction has numerous inputs. The steps below can be used to troubleshoot why a prediction is returning this error and to suggest ways to resolve it.

Step 1: Confirm data in your signals
Seeq's Prediction tool requires interpolatable data from every input at the timestamps of the target signal, at least N+1 times within the training window, where N is the number of inputs into the model. N is not necessarily the number of signals but the number of terms in the model (so for a polynomial scale, N = 2 * number of signals). For a linear model with 3 inputs, there need to be at least 4 target timestamps at which all of the inputs have data. If one of the signals never has data, the requirement can never be met and that signal should be removed.
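As a quick sanity check of the N+1 rule, the arithmetic can be sketched in Python. This is only an illustration, not part of Seeq: the function name `required_samples` is hypothetical, and the terms-per-signal values come from the description above (1 for a linear scale, 2 for a polynomial scale). Other scales are not covered here.

```python
def required_samples(num_signals: int, scale: str = "linear") -> int:
    """Return the minimum number of usable target samples (N + 1)."""
    # Terms per signal, as described in the post:
    # linear -> N = signals, polynomial -> N = 2 * signals.
    terms_per_signal = {"linear": 1, "polynomial": 2}
    if scale not in terms_per_signal:
        raise ValueError(f"unsupported scale: {scale!r}")
    if num_signals < 1:
        raise ValueError("a model needs at least one input signal")
    n_terms = terms_per_signal[scale] * num_signals
    return n_terms + 1


print(required_samples(3, "linear"))      # 4
print(required_samples(3, "polynomial"))  # 7
```

The number returned is the threshold to compare against whatever sample count you measure in Seeq.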

If there is data in every signal, the next thing to check is whether there is data within the training window. Seeq's Prediction tool by default accepts a time range as the training window, but this can be further refined by limiting the training window to a condition. To check whether the requirement is still met, the formula below can be used, where each $signal_i is a signal used in the model, $condition is the training window condition and $target is the target of the model.

($signal_1 * $signal_2 ... * $signal_N).within($condition).resample($target).validValues()

You can then match your display range to the time range chosen in the prediction and check the sample count of the resulting formula. This count can be determined either with a Simple Scorecard Metric or in the Details Pane, and then compared against the N+1 requirement.
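For readers who want to see what that count represents, here is a minimal Python sketch of the same idea outside of Seeq. It assumes the inputs have already been sampled at the target's timestamps (it does no interpolation), uses `None` or NaN to mark a gap, and represents the training condition as a list of booleans. The function name and the sample data are hypothetical.

```python
import math
from typing import Optional, Sequence


def count_usable_samples(
    inputs: Sequence[Sequence[Optional[float]]],
    in_condition: Sequence[bool],
) -> int:
    """Count target timestamps, inside the condition, where every input is valid."""
    count = 0
    for i, inside in enumerate(in_condition):
        if not inside:
            continue  # timestamp falls outside the training condition
        values = [signal[i] for signal in inputs]
        if all(v is not None and not math.isnan(v) for v in values):
            count += 1
    return count


# Five target timestamps, two inputs; None marks a gap at that timestamp.
signal_1 = [1.0, 2.0, None, 4.0, 5.0]
signal_2 = [0.5, None, 1.5, 2.0, 2.5]
condition = [True, True, True, True, False]

usable = count_usable_samples([signal_1, signal_2], condition)
print(usable)  # 2
```

In this made-up example only 2 timestamps are usable, so a linear model with these 2 inputs (which needs 3 samples) would not have enough data.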

Step 2: Evaluate the Prediction

If Step 1 suggests that all of your signals have data and should meet the necessary number of samples, then there could be an issue with the Prediction itself that prevents the requirement from being met. Some examples of this are:

  1. Logarithmic Scale and Inputs with Zero or Negative Data: It's mathematically impossible to take the logarithm of zero or of a negative number, so samples with such values can't be considered. To eliminate data like this from the inputs, formulas like $signal.within($signal > 0) can be used in conjunction with the previously mentioned formula to get an accurate count.
  2. Divide By Zero: There are cases where the prediction model evaluates but fails to include samples due to divide-by-zero errors. To check whether this is the problem, try using Principal Component Regression (PCR) instead of the standard Ordinary Least Squares (OLS) method that the Prediction tool uses by default.
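The effect of the logarithmic-scale restriction in item 1 on the sample count can be sketched in Python. This is an illustration only, not Seeq's implementation: it assumes the inputs are already aligned to the target's timestamps, and the function name and data are hypothetical.

```python
from typing import Optional, Sequence


def count_log_usable(inputs: Sequence[Sequence[Optional[float]]]) -> int:
    """Count timestamps where every input is present and strictly positive."""
    return sum(
        1
        for values in zip(*inputs)
        # None is a gap; zero, negative and NaN values fail the `> 0` test.
        if all(v is not None and v > 0 for v in values)
    )


flow = [0.0, 1.2, 3.4, -0.5, 2.2]
pressure = [10.0, 11.0, 12.0, 13.0, None]

log_usable = count_log_usable([flow, pressure])
print(log_usable)  # 2
```

Here four timestamps have data in both inputs, but only two survive once zero and negative values are excluded, which is why a prediction can fail on a logarithmic scale even though the Step 1 count looked sufficient.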
Edited by Kristopher Wiggins