
Calculating r2 value on a validation set



  • Seeq Team

When you are evaluating the efficacy of a regression, there are a few common methods. You might simply take the difference between your predicted value and your actual value, then create capsules when the magnitude of this difference exceeds some critical threshold. I'll outline an alternative approach: calculating the r-squared (R2) value over each capsule (in my case, days). The same approach can be applied to any condition, such as batches or a Manual Condition that separates training and validation periods. The general outline is:

1. Build a prediction in Seeq using the Prediction tool. You can specify your training window by a condition or simply start and end time. More details in our Knowledge Base article: https://support.seeq.com/space/KB/143163422/Prediction

2. Create a condition in which you want to compare R2 values. In this example, I'll simply use a Periodic Condition of days. 

3. Resample your predicted value based on your original value. Seeq's resample function accepts another signal as an input, which is particularly important if your model inputs have varying sample rates. This eliminates error that would otherwise have been introduced by oversampling of your prediction and by interpolation.

4. Calculate the R2 value over the condition from Step 2 using the following Formula.

$ym = $signal.aggregate(average(), $days, startkey()).toStep()
$total = (($signal-$ym)^2).aggregate(sum(), $days, startkey())
$residual = (($signal-$prediction)^2).aggregate(sum(), $days, startkey())
$r2 = (1-($residual/$total)).toStep()
return $r2
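
For reference, here is the standard coefficient-of-determination definition that the Formula appears to implement, written per capsule. The symbols are my own notation, not Seeq's: y_i are the samples of $signal inside capsule d, \hat{y}_i are the matching samples of $prediction, and n_d is the number of samples in the capsule. Under that reading, $ym corresponds to \bar{y}_d, $total to SS_tot, and $residual to SS_res.

```latex
\bar{y}_d = \frac{1}{n_d}\sum_{i \in d} y_i, \qquad
SS_{\mathrm{tot},d} = \sum_{i \in d} \left(y_i - \bar{y}_d\right)^2, \qquad
SS_{\mathrm{res},d} = \sum_{i \in d} \left(y_i - \hat{y}_i\right)^2, \qquad
R^2_d = 1 - \frac{SS_{\mathrm{res},d}}{SS_{\mathrm{tot},d}}
```

A capsule where the prediction matches the signal exactly gives R2 = 1, and a capsule where the prediction is no better than the capsule average gives R2 = 0.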

You can continue your Analysis by building a Value Search for when your R2 drops below a given threshold, or summarize your results in your Organizer Topic. Feel free to reach out with any questions or improvement ideas!

Happy Seeqing!

-Chris O, Seeq Analytics Engineer


  • 4 months later...
  • Administrators

Please note that R-squared is a discrete calculation, so it is better to convert $signal to discrete samples with toDiscrete() before aggregating. The formula then becomes:

$ym = $signal.toDiscrete().aggregate(average(), $days, startkey()).toStep()
$total = (($signal.toDiscrete()-$ym)^2).aggregate(sum(), $days, startkey())
$residual = (($signal.toDiscrete()-$prediction)^2).aggregate(sum(), $days, startkey())
$r2 = (1-($residual/$total)).toStep()
return $r2
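
If you want to sanity-check the numbers outside of Seeq, the same per-day calculation can be sketched in Python with pandas. This is only an illustration under a few assumptions: the function name daily_r2 and the variable names are hypothetical, both series are assumed to have a DatetimeIndex, calendar days stand in for the daily capsules, and time-based linear interpolation stands in for the resample step. It is not Seeq's implementation.

```python
import numpy as np
import pandas as pd


def daily_r2(signal: pd.Series, prediction: pd.Series) -> pd.Series:
    """R2 of `prediction` against `signal` for each calendar day.

    Both inputs must have a DatetimeIndex. The result is indexed by day.
    """
    # Evaluate the prediction at the measured signal's timestamps only,
    # so every term in the sums is one discrete measured sample.
    union = prediction.index.union(signal.index)
    pred_on_signal = (
        prediction.reindex(union).interpolate(method="time").reindex(signal.index)
    )

    # Drop samples where either value is missing.
    frame = pd.DataFrame({"y": signal, "yhat": pred_on_signal}).dropna()
    day = frame.index.floor("D")  # one group per calendar day

    # Per-day mean of the measured signal, broadcast back to each sample.
    y_mean = frame.groupby(day)["y"].transform("mean")

    ss_res = ((frame["y"] - frame["yhat"]) ** 2).groupby(day).sum()
    ss_tot = ((frame["y"] - y_mean) ** 2).groupby(day).sum()

    # A day with a constant signal has ss_tot == 0, so R2 is undefined there
    # (the division yields NaN or -inf rather than raising).
    return 1.0 - ss_res / ss_tot


# Example: hourly data over two days, with made-up values.
idx = pd.date_range("2024-01-01", periods=48, freq="60min")
measured = pd.Series(np.sin(np.arange(48) / 4.0), index=idx)
predicted = measured + 0.05
print(daily_r2(measured, predicted))
```

Filtering the result with something like `r2[r2 < 0.8]` (the 0.8 is an arbitrary example threshold) plays the same role as the Value Search suggested in the original post.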