A Guide for Implementing Large Scale Control Loop Performance Monitoring (CLPM)

John Cox · June 29, 2023

Proportional-integral-derivative (PID) control loops are essential to the automation and control of manufacturing processes and are foundational to the financial gains realized through advanced process control (APC) applications. Because poorly performing PID controllers can negatively impact production capacity, product quality, and energy consumption, implementing controller performance monitoring analytics leads to new data-driven insights and process optimization opportunities.

The following sections provide the essential steps for creating basic controller performance monitoring at scale in Seeq. More advanced CLPM solutions can be implemented by expanding the standard framework outlined below with additional metrics, customization features, and visualizations. Data Lab’s Spy library functionality is integral to creating large scale CLPM, but small scale CLPM is possible with no coding, using Asset Groups in Workbench. Key steps in creating a CLPM solution include:

Controller asset tree creation
Developing performance metric calculations and inserting them in the tree as formulas
Calculating advanced metrics via scheduled Data Lab notebooks (if needed)
Configuring asset scaled and individual controller visualizations in Seeq Workbench
Setting up site performance as well as individual controller monitoring in Seeq Organizer

Using these steps and starting from only the controller signals within Seeq, large scale CLPM monitoring can be developed relatively quickly, and a variety of visualizations can be made available to the manufacturing team for monitoring and improving performance. As a quick example of many end result possibilities, this loop health treemap color codes controller performance (green=good, yellow=questionable, red=poor):

The key steps in CLPM implementation, summarized above, are detailed below. Note: for use as templates for development of your own CLPM solution, the associated Data Lab notebooks containing the example code (for Steps 1-3) are included as file attachments to this article. The code and formulas described in Steps 1-3 can be adjusted and expanded to customize your CLPM solution as desired.

Example CLPM Solution: Detailed Steps

STEP 1: Controller asset tree creation

An asset tree is the key ingredient which enables scaling of the performance calculations across a large number of controllers. The desired structure is chosen and the pertinent controller tags (typically at a minimum SP, PV, OP and MODE) are mapped into the tree. For this example, we will create a structure with two manufacturing sites and a small number of controllers at each site. In most industrial applications, the number of controllers would be much higher, and additional asset levels (departments, units, equipment, etc.) could of course be included.

We use SPy.trees functionality within the Data Lab notebook to create the basic structure:

Controller tags for SP, PV, OP, and MODE are identified using SPy.search. Cleansed controller names are extracted and inserted as asset names within the tree:

Next, the controller tags (SP, PV, OP, and MODE), identified in the previous Data Lab code section using SPy.search, are mapped into the tree. At the same time, the second key step in creating the CLPM solution, developing basic performance metrics and calculations using Seeq Formula and inserting them into the asset tree, is completed. Note that in our example formulas, manual mode is detected when the numerical mode signal equals 0; this formula logic will need to be adjusted based on your mode signal conventions. While this second key step could be done just as easily as a separate code section later, it also works nicely to combine it with the mapping of the controller tag signals:

STEP 2: Developing performance metric calculations and inserting them in the tree as formulas

Several key points need to be mentioned related to this step in the CLPM process, which was implemented using the Data Lab code section above (see previous screenshot).

There are of course many possible performance metrics of varying complexity. A good approach is to start with basic metrics that are easily understood, and to incrementally layer on more complex ones if needed, as the CLPM solution is used and shortcomings are identified. The selection of metrics, parameters, and the extent of customization for individual controllers should be determined by those who understand the process operation, control strategy, process dynamics, and overall CLPM objectives. The asset structure and functionality provided with the Data Lab asset tree creation enables the user to implement the various calculation details that will work best for their objectives.

Above, we implemented Hourly Average Absolute Controller Error (as a percentage based on the PV value) and Hourly Percent Span Traveled as basic metrics for quantifying performance. When performance is poor (high variation), the average absolute controller error and percent span traveled will be abnormally high. Large percent span traveled values also lead to increased control valve maintenance costs. We chose to calculate these metrics on an hourly basis, but calculating more or less frequently is easily achieved, by substituting different recurring time period functions in place of the “hours()” function in the formulas for Hourly Average Absolute Controller Error and Hourly Percent Span Traveled.

The performance metric calculations are inserted as formula items in the asset tree. This is an important aspect as it allows calculation parameters to be customized as needed on an individual controller basis, using Data Lab code, to give more accurate performance metrics. There are many customization possibilities, for example controller specific PV ranges could be used to normalize the controller error, or loosely tuned level controllers intended to minimize OP movement could be assigned a value of 0 error when the PV is within a specified range of SP.

The Hourly Average Absolute Controller Error and Hourly Percent Span Traveled are then aggregated into an Hourly Loop Health Score using a simple weighting calculation to give a single numerical value (0-100) for categorizing overall performance. Higher values represent better performance. Another possible approach is to calculate loop health relative to historical variability indices for time periods of good performance specified by the user.

The magnitude of a loop health score comprised of multiple, generalized metrics is never going to generate a perfect measure of performance. As an alternative to using just the score value to flag issues, the loop health score can be monitored for significant decreasing trends to detect performance degradation and report controller issues.

While not part of the loop health score, note in the screenshot above that we create an Hourly Percent Time Manual Mode signal and an associated Excessive Time in Manual Mode condition, as another way to flag performance issues – where operators routinely intervene and adjust the controller OP manually to keep the process operating in desired ranges. Manual mode treemap visualizations can then be easily created for all site controllers.

With the asset tree signal mappings and performance metric calculations inserted, the tree is pushed to a Workbench Analysis and the push results are checked:

STEP 3: Calculating advanced metrics via scheduled Data Lab notebooks (if needed)

Basic performance metrics (using Seeq Formula) may be all that are needed to generate actionable CLPM, and there are advantages to keeping the calculations simple. If more advanced performance metrics are needed, scheduled Data Lab notebooks are a good approach to do the required math calculations, push the results as signals into Seeq, and then map/insert the advanced metrics as items in the existing asset tree. There are many possibilities for advanced metrics (oscillation index, valve stiction, non-linearity measures, etc.), but here as an example, we calculate an oscillation index and associated oscillation period using the asset tree Controller Error signal as input data. The oscillation index is calculated based on the zero crossings of the autocorrelation function. Note: the code below does not account for time periods when the controller is in manual, the process isn’t running, problems with calculating the oscillation index across potential data gaps, etc. – these issues would need to be considered for this and any advanced metric calculation.

Initially, the code above would be executed to fill in historical data oscillation metric results for as far back in time as the user chooses, by adjusting the calculation range parameters. Going forward, this code would be run in a recurring fashion as a scheduled notebook, to calculate oscillation metrics as time moves forward and new data becomes available. The final dataframe result from the code above looks as follows:

After examining the results above for validity, we push the results as new signals into Seeq Workbench with tag names corresponding to the column names above. Note the new, pushed signals aren’t yet part of the asset tree:

There are two additional sections of code that need to be executed after oscillation tag results have been pushed for the first time, and when new controller tags are added into the tree. These code sections update the oscillation tag metadata, adding units of measure and descriptions, and most importantly, map the newly created oscillation tags into the existing CLPM asset tree:

STEP 4: Configuring asset scaled and individual controller visualizations in Seeq Workbench

The CLPM asset tree is now in place in the Workbench Analysis, with drilldown functionality from the “US Sites” level to the signals, formula-based performance metrics, and Data Lab calculated advanced metrics, all common to each controller:

The user can now use the tree to efficiently create monitoring and reporting visualizations in Seeq Workbench. Perhaps they start by setting up raw signal and performance metric trends for an individual controller. Here, performance degradation due to the onset of an oscillation in a level controller is clearly seen by a decrease in the loop health score and an oscillation index rising well above 1:

There are of course many insights to be gained by asset scaling loop health scores and excessive time in manual mode across all controllers at a site. Next, the user creates a sorted, simple table for average loop health, showing that 7LC155 is the worst performing controller at the Beech Island site over the time range:

The user then flags excessive time in manual mode for controller 2FC117 by creating a Lake Charles treemap based on the Excessive Time in Manual Mode condition:

A variety of other visualizations can also be created, including controller data for recent runs versus current in chain view, oscillation index and oscillation period tables, a table of derived control loop statistics (see screenshot below for Lake Charles controller 2FC117) that can be manually created within Workbench or added later as items within the asset tree, and many others.

Inspecting a trend in Workbench (see screenshot below) for a controller with significant time in manual mode, we of course see Excessive Time in Manual Mode capsules, created whenever the Hourly Percent Time Manual Mode was > 25% for at least 4 hours in a row. More importantly, we can see the effectiveness of including hours().intersect($Mode!=0) in the formulas for Hourly Average Absolute Controller Error and Hourly Percent Span Traveled. When the controller is in manual mode, that data is excluded from the metric calculations, resulting in the gaps shown in the trend. Controller error and OP travel have little meaning when the controller is in manual, so excluding data is necessary to keep the metrics accurate. It would also be very easy to modify the formulas to only calculate metrics for hourly time periods where the controller was in auto or cascade for the entire hour (using Composite Condition and finding hourly capsules that are “outside” manual mode capsules). The ability to accurately contextualize the metric calculations, to the time periods where they can be validly calculated, is a key feature in Seeq for doing effective CLPM implementations. Please also refer to the “Reminders and Other Considerations” section below for more advanced ideas on how to identify time periods for metric calculations.

STEP 5: Setting up site performance as well as individual controller monitoring in Seeq Organizer

As the final key step, the visualizations created in Workbench are inserted into Seeq Organizer to create a cohesive, auto-updating CLPM report with site overviews as well as individual controller visuals. With auto-updating date ranges applied, the CLPM reports can be “review ready” for recurring meetings. Asset selection functionality enables investigative workflows: poorly performing controllers are easily identified using the “Site CLPM” worksheet, and then the operating data and metrics for those specific controllers can be quickly investigated via asset selection in the site’s “Individual Controllers” worksheet – further details are described below.

An example “Site CLPM” Organizer worksheet (see screenshot below) begins with a loop health performance ranking for each site, highlighting the worst performing controllers at the top of the table and therefore enabling the manufacturing team to focus attention where needed; if monitoring hundreds of controllers, the team could filter the table to the top 10 or 20 worst performing controllers. The visualizations also include a treemap for controllers that are often switched to manual mode – the team can talk to operators on each crew to determine why the controllers are not trusted in auto or cascade mode, and then generate action items to resolve. Finally, oscillating controllers are flagged in red in the sorted Oscillation Metrics tables, with the oscillation period values also sorted – short oscillation periods may prematurely wear out equipment and valves due to high frequency cycling; long oscillation periods are more likely to negatively impact product quality, production rate, and energy consumption. Controllers often oscillate due to root causes such as tuning and valve stiction and this variability can often be eliminated once an oscillating controller has been identified. The oscillation period table can also be perused for controllers with similar periods, which may be evidence of an oscillation common to multiple controllers which is generating widespread process variation.

An example “Individual Controllers” Organizer worksheet is shown below, where detailed operating trends and performance metrics can be viewed for changes, patterns, etc., and chain view can be used to compare controller behavior for the current production run versus recent production runs. Other controllers can be quickly investigated using the asset selection dropdown, and the heading labels (e.g., Beech Island >> 7LC155) change dynamically depending on which controller asset is selected. For example, the Beech Island 7LC155 controller which was identified as the worst performing controller in the “Site CLPM” view above, can be quickly investigated in the view below, where it is evident that the controller is oscillating regularly and the problem has been ongoing, as shown by the comparison of the current production run to the previous two runs:

Reminders and Other Considerations

As evident with the key steps outlined above, a basic CLPM solution can be rapidly implemented in Seeq. While Seeq’s asset and contextualization features are ideal for efficiently creating CLPM, there are many details which go into developing and maintaining actionable CLPM dashboards for your process operation. A list of reminders and considerations is given below.

1. For accurate and actionable results, it is vital to only calculate performance metrics when it makes sense to do so, which typically means when the process is running at or near normal production rates. For example, a controller in manual during startup may be expected and part of the normal procedure. And of course, calculating average absolute controller error when the process is in an unusual state will likely lead to false indications of poor performance. Seeq is designed to enable finding those very specific time periods when the calculations should be performed. In the CLPM approach outlined in this article, we used time periods when the controller was not in manual by including hours().intersect($Mode!=0) in the formulas for Hourly Average Absolute Controller Error and Hourly Percent Span Traveled. When the controller is in manual mode, that data is excluded from the metric calculations. But of course, a controller might be in auto or cascade mode when the process is down, and there could be many other scenarios where only testing for manual mode isn’t enough. In the CLPM approach outlined above, we intentionally kept things simple by just calculating metrics when the controller was not in manual mode, but for real CLPM implementations you will need to use a more advanced method. Here are a few examples of finding “process running” related time periods using simple as well as more advanced ways. Similar approaches can be used with your process signals, in combination with value searches on controller mode, for excluding data from CLPM calculations:

A simple Process Running condition can be created with a Value Search for when Process Feed Flow is > 1 for at least 2 hours.
A 12 Hours after Process Running condition can be created with a formula based on the Process Running Condition: $ProcessRunning.afterStart(12hr)
A Process Running (> 12 hours after first running) condition can then be created from the first two conditions with the formula: $ProcessRunning.subtract($TwelveHrsAfterRunning)
Identifying time periods based on the value and variation of the production rate is also a possibility as shown in this Formula:

The conditions described above are shown in the trends below:

2. As previously mentioned, including metric formulas and calculations as items within the asset tree enables customization for individual controllers as needed, when controllers need unique weightings, or when unique values such as PV ranges are part of the calculations. It may also be beneficial to create a “DoCalcsFlag” signal (0 or 1 value) as an item under each controller asset and use that as the criteria to exclude data from metric calculations. This would allow customization of “process is running normally and controller is not in manual” on an individual controller basis, with the result common for each controller represented as the “DoCalcsFlag” value.

3. In the CLPM approach outlined above, we used SPy.trees in Data Lab to create the asset tree. This is the most efficient method for creating large scale CLPM. You can also efficiently create the basic tree (containing the raw signal mappings) from a CSV file. For small trees (<= 20 controllers), it is feasible to interactively create the CLPM asset tree (including basic metric calculation formulas) directly in Workbench using Asset Groups. The Asset Groups approach requires no Python coding and can be quite useful for a CLPM proof of concept, perhaps focused on a single unit at the manufacturing site. For more details on Asset Groups: https://www.seeq.org/index.php?/forums/topic/1333-asset-groups-101-part-1/).

4. In our basic CLPM approach, we scoped the CLPM asset tree and calculations to a single Workbench Analysis. This is often the best way to start for testing, creating a proof of concept, getting feedback from users, etc. You can always decide later to make the CLPM tree and calculations global, using the SPy.push option for workbook=None.

5. For long-term maintenance of the CLPM tree, you may want to consider developing an add-on for adding new controllers into the tree, or for removing deleted controllers from the tree. The add-on interface could also prompt the user for any needed customization parameters (e.g., PV ranges, health score weightings) and could use SPy.trees insert and remove functionality for modifying the tree elements.

6. When evidence of more widespread variation is found (more than just variability in a single controller), and the root cause is not easily identified, CLPM findings can be used to generate a list of controllers (and perhaps measured process variables in close proximity) that are then fed as a dataset to Seeq’s Process Health Solution for advanced diagnostics.

7. For complex performance or diagnostic metrics (for example, stiction detection using PV and OP patterns), high quality data may be needed to generate accurate results. Therefore, some metric calculations may not be feasible depending on the sample frequency and compression levels inherent with archived and compressed historian data. The only viable options might be using raw data read directly from the distributed control system (DCS), or specifying high frequency scan rates and turning off compression for controller tags such as SP, PV, and OP in the historian. Another issue to be aware of is that some metric calculations will require evenly spaced data and therefore need interpolation (resampling) of historian data. Resampling should be carefully considered as it can be problematic in terms of result accuracy, depending on the nature of the calculation and the signal variability.

8. The purpose of this article was to show how to set up basic CLPM in Seeq but note that many types of process calculations to monitor “asset health” metrics could be created using a similar framework.

Summary

While there are of course many details and customizations to consider for generating actionable controller performance metrics for your manufacturing site, the basic CLPM approach above illustrates a general framework for getting started with controller performance monitoring in Seeq. The outlined approach is also widely applicable for health monitoring of other types of assets. Asset groups/trees are key to scaling performance calculations across all controllers, and Seeq Data Lab can be used as needed for implementing more complex metrics such as oscillation index, stiction detection, and others. Finally, Seeq Workbench tools and add-on applications like Seeq’s Process Health solution can be used for diagnosing the root cause of performance issues automatically identified via CLPM monitoring.

CLPM Asset Tree Creation.ipynb

Advanced CLPM Metric Calculations.ipynb

Edited June 30, 2023 by John Cox
attaching Data Lab notebook files

Sign In

A Guide for Implementing Large Scale Control Loop Performance Monitoring (CLPM)

Recommended Posts

John Cox

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in