Calculate Quartiles and Interquartile Range to Detect Outliers


Background: 

Seeq has functions in Formula to remove outliers based on different algorithms, but sometimes it is desirable to identify and remove outliers that fall outside limits based on the interquartile range (IQR).

Solution:

The approach we can take to solve this data cleansing problem in Seeq is to determine the periods over which we want to calculate the quartiles, calculate new signals from the 25th and 75th percentiles during each of those periods, derive upper and lower limits from those percentiles and the IQR, identify deviations from those limits, and remove the outlying data from our original signal.

1. The first step is to decide what type of periods you would like to use to calculate your percentiles. Some periodic choices might include: hourly, daily, or a rolling window of 24 hours each hour. Other choices could be the current production run, the time since the equipment was last maintained, etc. In this example we will use an hourly periodic condition in our quartile calculations. 

image.png

2. Next, use the Signal from Condition tool to calculate the 25th percentile during each of the capsules defined above.

image.png

3. Use the same method to calculate the 75th percentile during each of the capsules defined above. 

image.png

4. Use Seeq's Formula tool to calculate the IQR.

$upperQ - $lowerQ

image.png

 

5. Now use Formula to calculate the upper and lower limits for outlier removal as:

$upperQ + n*$IQR

(where n is a scalar multiplier, 1.5 in this example)

 

image.png

$lowerQ - n*$IQR

image.png

 

6. Search for deviations from the Upper and Lower limits using Deviation Search.

image.png

7. Then use Formula to remove data during the identified outlier capsules. 

$signal.remove($outliers)

image.png
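For readers who want to check the logic outside Seeq, the seven steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not how Seeq implements its tools: the function name `remove_iqr_outliers` is hypothetical, the periods are fixed calendar hours, and the percentiles use the "inclusive" interpolation of Python's `statistics.quantiles`, which may differ slightly from the percentile statistic Seeq computes.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import quantiles


def remove_iqr_outliers(samples, multiplier=1.5):
    """Drop samples outside per-hour IQR limits.

    samples: list of (datetime, float) pairs.
    multiplier: the scalar n from step 5 (1.5 in the example above).
    """
    # Step 1: group the values into hourly periods.
    by_hour = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        by_hour[hour].append(value)

    # Steps 2-5: per-period quartiles, IQR, and upper/lower limits.
    limits = {}
    for hour, values in by_hour.items():
        if len(values) < 2:
            continue  # quantiles() needs at least two points; leave as-is
        lower_q, _, upper_q = quantiles(values, n=4, method="inclusive")
        iqr = upper_q - lower_q
        limits[hour] = (lower_q - multiplier * iqr, upper_q + multiplier * iqr)

    # Steps 6-7: keep only samples that stay within their period's limits.
    no_limits = (float("-inf"), float("inf"))
    cleaned = []
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        low, high = limits.get(hour, no_limits)
        if low <= value <= high:
            cleaned.append((ts, value))
    return cleaned


# Usage: one hour with a spike at 50, followed by a steady hour at 100.
start = datetime(2024, 1, 1)
samples = [(start + timedelta(minutes=10 * i), v)
           for i, v in enumerate([10, 11, 12, 11, 10, 50])]
samples += [(start + timedelta(hours=1, minutes=10 * i), 100.0)
            for i in range(5)]
cleaned = remove_iqr_outliers(samples)
```

Because the limits are computed per period, the value 50 is dropped from the first hour while the steady 100 readings in the second hour are kept, even though 100 would be far outside the first hour's limits.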

 

 


  • 1 year later...

Hi Allison,

The method above is the one we usually use to remove outliers from datasets. We also sometimes use z-scores. I am new here and I actually registered just to ask this question. In Seeq, there is a function called removeOutliers() with several parameters. How exactly does this function work, and how does its outlier detection compare with the method above? I searched the knowledge base for information about it but couldn't find anything. When I tested it on some example signals, I noticed that it sometimes removes parts of the signal that are in the middle (neither high nor low), which is very strange (see the attachment below). I'd appreciate feedback from you or anyone else in the community.

 

Regards,

Ahmed

 

image.thumb.png.5fea9e9919116c97edae1262b5ac7680.png

