Jump to content

Data Cleansing Tips: Using the .remove() and .within() Formula Functions

John Cox

Recommended Posts

  • Seeq Team

A typical data cleansing workflow is to exclude equipment downtime data from calculations. This is easily done using the .remove() and .within() functions in Seeq formula. These functions remove or retain data when capsules are present in the condition that the user supplies as a parameter to the function. 

There is a distinct difference in the behavior of the .remove() and .within() functions that users should know about, so that they can use the best approach for their use case. 

  • .remove() removes the data during the capsules in the input parameter condition. For step or linearly interpolated signals, interpolation will occur across those data gaps that are of shorter duration than the signal's maximum interpolation. (See Interpolation for more details concerning maximum interpolation.)
  • .within() produces data gaps between the input parameter capsules. No interpolation will occur across data gaps (no matter what the maximum interpolation value is). 

Let's show this behavior with an example (see the first screenshot below, Data Cleansed Signal Trends), where an Equipment Down condition is identified with a simple Value Search for when Equipment Feedrate is < 500 lb/min. We then generate cleansed Feedrate signals which will only have data when the equipment is running. We do this 2 ways to show the different behaviors of the .remove() and .within() functions.  

  1. $Feedrate.remove($EquipmentDown) interpolates across the downtime gaps because the gap durations are all less than the 40 hour max interpolation setting. 
  2. $Feedrate.within($EquipmentDown.inverse()) does NOT interpolate across the downtime gaps. In the majority of cases, this result is more in line with what the user expects.  

As shown below, there is a noticeable visual difference in the trend results. Gaps are present in the green signal produced using the .within() function, wherever there is an Equipment Down capsule. A more significant difference is that depending on the nature of the data, the statistical calculation results for time weighted values like averages and standard deviations, can be very different. This is shown in the simple table (Signal Averages over the 4 Hour Time Period, second screenshot below). The effect of time weighting the very low, interpolated values across the Equipment Down capsules when averaging the Feedrate.remove($EquipmentDown) signal, gives a much lower average value compared to that for $Feedrate.within($EquipmentDown.inverse()) (1445 versus 1907). 

Data Cleansed Signal Trends 



Signal Averages over the 4 Hour Time Period


Content Verified DEC2023

Edited by Mark Pietryka
  • Like 4
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Create New...