# Weighted versus Unweighted Averages in Seeq


• Seeq Team

Background:

One of the quirks of raw, ungridded time series data is that sampling frequencies may vary. Sometimes the sample frequency is different for two signals that you are comparing, and sometimes the sample frequency is different for a single signal at different points in time due to a process data historian's compression configuration. How you handle this variability in sampling rate can have a significant impact on calculations of summary statistics.
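To make the effect concrete, here is a minimal plain-Python sketch (not Seeq Formula) comparing a simple per-sample mean with a time-weighted mean for one unevenly sampled signal. The data, the function names, and the choice of linear interpolation between samples are all assumptions made for illustration.

```python
def sample_mean(values):
    """Unweighted mean: every sample counts equally."""
    return sum(values) / len(values)


def time_weighted_mean(times, values):
    """Mean of the linearly interpolated signal: trapezoidal area / duration."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return area / (times[-1] - times[0])


# Hypothetical signal: sits at 10 for nine hours with only two samples,
# then is sampled every 15 minutes near 20 during the last hour.
times = [0.0, 9.0, 9.25, 9.5, 9.75, 10.0]   # hours
values = [10.0, 10.0, 20.0, 20.0, 20.0, 20.0]

print(sample_mean(values))               # pulled toward the densely sampled hour
print(time_weighted_mean(times, values))  # dominated by the long stretch at 10
```

The densely sampled final hour dominates the unweighted mean even though it covers only a tenth of the window, which is the kind of discrepancy the methods below handle differently.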

Various approaches to calculating summary statistics and their implications:

In this example we have four signals with different numbers of samples in the display range (as highlighted by the "count" statistic in the details pane). Our goal is to calculate a single "average" value for all of the signals during this window. Notice the different outcome values from each approach.

Method 1:

One option that we have for getting a single "average" statistic is to take the average value of each of the 4 signals over the time window, then take the average of those four values. This method weights each of the signals evenly in the calculation of the final average value, since the final average value is equal to

`(0.25)*avg1 + (0.25)*avg2 + (0.25)*avg3 + (0.25)*avg4`
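As a rough plain-Python illustration of that equal weighting (the per-signal averages here are made-up numbers, and `average_of_averages` is a hypothetical helper, not a Seeq function):

```python
def average_of_averages(signal_averages):
    """Combine per-signal averages with equal weight, regardless of sample counts."""
    weight = 1.0 / len(signal_averages)
    return sum(weight * avg for avg in signal_averages)


# Hypothetical per-signal averages over the display window.
per_signal_averages = [10.0, 20.0, 30.0, 60.0]

overall = average_of_averages(per_signal_averages)
print(overall)
```

Note that the number of samples in each signal never enters the calculation; only the four per-signal averages do.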

Method 2:

A second option is to first create a continuous average signal, then aggregate that over the display window to calculate an average.

The average signal can be calculated in Formula using the average function, with the syntax shown below.

Note that the output signal has significantly more samples than any of the original signals. This is because the average function calculates a sample any time any of the input signals has a sample. For the signals that do not have a sample at a particular key, the linearly interpolated value of the signal is used in the average calculation.

Then a scorecard aggregation of the average of the continuous average signal can be calculated in the Scorecard Metric tool to get the result below.
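The behavior described for Method 2 can be approximated outside Seeq with the plain-Python sketch below. It is an analogy under stated assumptions, not Seeq's implementation: the two signals are hypothetical, both are assumed to span the same time range with strictly increasing timestamps, and the final aggregation is assumed to be a time-weighted (trapezoidal) average.

```python
def interpolate(times, values, t):
    """Linearly interpolate at t; assumes times[0] <= t <= times[-1]."""
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return values[-1]


def continuous_average(signals):
    """Emit one averaged sample at every key where any input has a sample."""
    keys = sorted({t for times, _ in signals for t in times})
    averaged = [
        sum(interpolate(ts, vs, k) for ts, vs in signals) / len(signals)
        for k in keys
    ]
    return keys, averaged


def time_weighted_mean(times, values):
    """Trapezoidal area under the interpolated signal divided by duration."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return area / (times[-1] - times[0])


# Hypothetical inputs: (times, values) pairs with different sample keys.
signal_a = ([0.0, 5.0, 10.0], [0.0, 10.0, 0.0])
signal_b = ([0.0, 3.0, 6.0, 10.0], [20.0, 20.0, 20.0, 20.0])

keys, averaged = continuous_average([signal_a, signal_b])
print(len(keys))                           # more samples than either input
print(time_weighted_mean(keys, averaged))  # aggregate over the window
```

Because the output has a sample at the union of all input keys, its sample count exceeds that of either input whenever the inputs are sampled at different times.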

Method 3:

A third option is to take the average of all the data points in the display window, independent of which signal they belong to. This approach involves first combining all of the samples from the 4 signals into a single signal, then taking the average of that value over the display window.

The sample count of the combined signal will be equal to the sum of the sample counts of all other signals, as demonstrated below. In this approach, the resultant signal also has a sample any time any of the other signals contain a sample. Note that if signals have the same frequency, a tiny delay can be applied to the 2nd through the nth signal (1 ns to n-1 ns) to ensure all samples are kept.
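The reason for the small delay can be sketched in plain Python. This is only an analogy: it assumes that a combined signal can hold one sample per timestamp, so samples with identical keys collide, and `combine` is a hypothetical helper rather than a Seeq function.

```python
def combine(signals, offset_ns=0):
    """Merge (timestamp_ns, value) samples into one signal keyed by timestamp.

    Assumption for this sketch: only one sample can exist per key, so a later
    sample with an identical timestamp replaces the earlier one. The i-th
    signal (0-based) is delayed by i * offset_ns to avoid such collisions.
    """
    combined = {}
    for i, samples in enumerate(signals):
        for t, v in samples:
            combined[t + i * offset_ns] = v
    return sorted(combined.items())


# Two hypothetical signals sampled at exactly the same timestamps (in ns).
signal_1 = [(0, 1.0), (1_000_000_000, 2.0)]
signal_2 = [(0, 3.0), (1_000_000_000, 4.0)]

without_delay = combine([signal_1, signal_2])
with_delay = combine([signal_1, signal_2], offset_ns=1)

print(len(without_delay))  # colliding keys: samples are lost
print(len(with_delay))     # 1 ns delay on the second signal keeps every sample
```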

A scorecard metric of the average of the combined signal can then be calculated.

The overall average value returned using this method is much higher than with the two previous methods because signals 3 & 4, which generally have higher values than signals 1 & 2, contribute relatively more samples.
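A minimal plain-Python sketch of why pooling samples shifts the result, using made-up values in which the two higher-valued signals have four times as many samples as the other two (names and data are hypothetical):

```python
def pooled_average(signals):
    """Average every sample from every signal, ignoring which signal it came from."""
    pooled = [v for values in signals for v in values]
    return sum(pooled) / len(pooled)


def equal_weight_average(signals):
    """Average the per-signal sample means, giving each signal the same weight."""
    means = [sum(values) / len(values) for values in signals]
    return sum(means) / len(means)


# Hypothetical sample values: signals 3 and 4 are both higher and denser.
signals = [
    [10.0] * 2,
    [20.0] * 2,
    [30.0] * 8,
    [60.0] * 8,
]

print(pooled_average(signals))        # weighted by sample count
print(equal_weight_average(signals))  # each signal counts for one quarter
```

Pooling is equivalent to weighting each signal's mean by its sample count, so the densely sampled signals pull the overall value toward their own level.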

Method 4:

A non-time-weighted average (similar to method 3) can also be calculated using the following formula:

`average($signal1.toDiscrete(),$signal2.toDiscrete(),$signal3.toDiscrete(),$signal4.toDiscrete())`

Once again the final signal contains a number of samples equal to the sum of all of the sample counts of the input signals.

In conclusion:

Which method of averaging is best for your use case? The answer is probably "it depends." Some examples of when each method may apply:

• An average over a specific time range - Method 1
• An instantaneous average at a point in time - Method 2
• An average of signals where each sample represents a unique event or independent measurement (e.g. lab or quality data) - Method 4

Regardless of your specific use case, having an understanding of how your data frequency, your historian's compression settings and your analytical approach can impact your results is an important starting point in any analysis!
