Search the Community
Showing results for tags 'interquartile range'.
Background: Seeq has functions in Formula to remove outliers based on different algorithms, but sometimes it is desired to identify and remove outliers that falls outside of the interquartile range. Solution: The approach we can take to solve this data cleansing problem in Seeq is to determine the periods over which we want to calculate the quartiles, calculate new signals from the 25th and 75th percentiles during each of those periods, identify deviations from those percentiles, and remove data outside of the IQR from our original signal. 1. The first step is to decide what type of periods you would like to use to calculate your percentiles. Some periodic choices might include: hourly, daily, or a rolling window of 24 hours each hour. Other choices could be the current production run, the time since the equipment was last maintained, etc. In this example we will use an hourly periodic condition in our quartile calculations. 2. Next, use the signal from condition tool to calculate the 25th percentile during each of the capsules defined above. 3. Use the same method to calculate the 75th percentile during each of the capsules defined above. 4. Use Seeq's Formula tool to calculate the IQR. $UpperQ - $LowerQ 5. Now use Formula to calculate the upper and lower limits as for outlier removal as: $upperQ + n*$IQR (where n is a scalar multiplier, 1.5 in this example) $lowerQ - n*$IQR 6. Search for deviations from the Upper and Lower limits using Deviation Search. 7. Then use Formula to remove data during the identified outlier capsules. $signal.remove($outliers)