Accurate data alignment can be the most challenging part of creating process calculations, finding correlations, and developing prediction models. It is an often overlooked data cleansing step that may be even more critical than smoothing noisy signals and removing data outliers. The need for data alignment stems from time delays present in the industrial process (related to physical transport and equipment/piping volumes, as well as lab measured data reported well after process operation completes). Another common data alignment need centers on comparing process metrics before/after process events of interest, which again involves time delays (or time shifts).
There are at least four categories of data alignment use cases prevalent across the process manufacturing industries:
Known time delay - The time delay resulting from the transportation of material at a given speed or velocity (across some distance in the process operation) can mask strong correlations between upstream/downstream signals or other signals separated in time, resulting in poor modeling results if not accounted for by a data alignment step.
Variable time delay – Here the time delay is variable and a function of production speed, storage volumes, etc. but can be calculated based on measured/known process parameters.
Before/after comparisons - Related to process experimentation and optimization, there is a recurring need to calculate and compare process metrics before and after some identified process event, such as a process feed being turned on, a unit restart, equipment maintenance, a process parameter or setpoint being adjusted, etc.
Process and analytical (LIMS) data - When trying to correlate process signals with lab-measured analytical results, work is often needed to align the process signals and subsequent analytical results. In some cases, the alignment can be based on sequential, consistently reported lab data values. In other cases, more sophisticated logic, such as connecting process operation and lab results by matching id properties, is needed.
Example methods for addressing each of these cases, using Seeq tools and Formula, are included below.
Use Case Category #1: Known Time Delay
In this specific example, the goal is to correlate a process signal (temperature) with a later reported lab result, but this use case also occurs frequently with continuous upstream (earlier in the process) and downstream signals (later in process), and the same time shifting methodology applies.
It is known that the lab result is consistently reported 2 hours after the process operation which would most correlate. Therefore, the “Process Temperature” signal needs to be shifted 2 hours to the right in time and then sampled for each unique “Lab Measured Analytical Result” value.
1. We shift the raw "Process Temperature" 2 hours to the right using Seeq Formula's move() function to create "Temperature Shifted 2 Hours to Right":
2. We use Seeq Formula's resample() function to pick off values of the shifted temperature at the exact timestamp that new "Lab Measured Analytical Result" values are reported. The resample() function gives the ability to sample one signal based on the timestamps of the data values from another signal. The result is 1 temperature sample (green, Lane 1) per lab sample (Lane 2).
3. The value of the data alignment is obvious by comparing two XY plots of the lab result and the original, raw temperature/aligned temperature. The correlation between the process temperature and the lab result, hidden in the XY plot on the left, is obvious in the XY plot on the right, which shows the aligned temperature:
Use Case Category #2: Variable Time Delay
This use case is similar to the previous: correlating two measurements separated by a time delay, but with an additional complication. Here, the time delay is variable and is based on the physics of a material transport distance: changes in an upstream pressure physically take many hours to work their way through equipment/piping and influence a downstream analyzer signal. The physical time delay varies based on Production Line Speed.
1. The transport distance is known (500 feet) and set up as a value using Seeq Formula. The user calculates the Time Delay between the 2 signals based on the Transport Distance / Production Line Speed. The resulting time delay fluctuates between 7 and 13 hours.
Transport Distance Formula:
Calculated Time Delay Formula:
2. Using Seeq Formula's move() function, the Upstream Pressure can then be shifted (by the variable time delay calculated in Step 1) so that the effect of pressures changes upstream aligns in time with the resulting impact on the Downstream Analyzer.
// Limit the maximum time delay to 20 hours
Changes in the upstream pressure (yellow signal) are now correctly aligned in time with their effect on the downstream analyzer (blue signal):
Note that in this use case, we did not need to use the resample() function as we did with the discretely measured lab data in Use Case Category #1, as here we are working with continuously measured signals.
Use Case Category #3: Before/After Comparisons
An extremely common use case is to compare process metrics before/after some process event. The process event could be anything of interest: a process unit restart, equipment or control strategy modifications, periodic maintenance work, process experimentation, etc. The analysis begins by identifying the process events and then calculating the metrics over appropriate time ranges before and after each event. The alignment step typically involves creating a common time basis spanning before/after operation and moving the before/after calculated metrics to the same point in time for comparison.
1. For this example we begin with a Pressure signal that ramps up over time as equipment run life increases and the equipment fouls. The equipment is shut down for Maintenance periodically and the process then restarts at a much lower pressure value. We use the Pressure signal and Seeq Formula to identify Equipment Maintenance periods based on the running delta of the pressure signal < 0:
2. Use Formula's grow() function to expand Equipment Maintenance to create a Before and After Maintenance time period which includes the desired time period (for example, 4 hrs) for calculating before/after pressures. This also gives a common time basis for a later alignment step.
Results are shown here in Chain View for the Before and After Maintenance capsules, which gives a nice visual of the pressure 4 hours before and after maintenance.
3. Average the Pressure signal over the first 2 hours of the Before and After Maintenance capsules.
// Do a 2 hour avg pressure over the FIRST 2 hours
// of the Before and After Maintenance capsules
4. Use Signal from Condition to align the average pressure before maintenance (Step 3) at the middle of the Before and After Maintenance capsules. Signal from Condition is commonly used to find a value within a condition and move it to a specific location in time. In this case, we can use the Maximum statistic to find the correct "Pressure Avg (Before)" value, as there is only 1 value for each maintenance capsule. Using Signal from Condition in this way is a key technique for aligning before/after process values/metrics.
We can see that the "Pressure Avg (Before)" has now been moved to the middle of the "Before and After Maintenance" capsules:
5. Repeat steps 3 and 4 to compute/align the average pressure after maintenance, and then move/align to the middle of the Before and After Maintenance capsules.
6. With the before/after pressure values aligned in time, we now use Formula to calculate the percent pressure reduction resulting from maintenance.
// Calculate the % change relative to the
// the Pressure Avg (Before)
Use Case Category #4: Process and Analytical (LIMS) Data
In this example, we will align calculated process metric values from each batch (e.g., max temperature, total Chemical A flow added) with later reported lab results (often referred to as LIMS - Laboratory Information Management System data).This is a very common analytics need.
Here, the “Product Impurity” lab results are reported at varying time intervals following batch completion, so a constant time delay alignment approach isn’t feasible.
Zooming in, we can confirm that the lab results are reported (step to a new value) some time after the Process Batches complete. The reporting time is variable.
1. The maximum temperature and totalized chemical added over each Process Batch are thought to correlate with (influence) the amount of final Product Impurity. So, we need to calculate the max T and total chemical A added per Process Batches capsule. Use Signal for Condition to do the calculations and place results at the end of each Process Batch.
2. We now need to work on alignment. To provide a basis for joining the process results to the corresponding lab results, we need to create "Product Impurity Results Capsules" for every value change in the reported Product Impurity. We use the Formula toCondition() function for this and store the numeric lab result in a capsule property named PctImpurity.
Note: in the screenshot below, the highlighted Product Impurity Results capsule contains the lab result for the Process Batches capsule at the top left of the screen.
3. Now, use Composite Condition to "join" the Process Batches condition to the Product Impurity Results Capsules condition, so we have a time period that contains the process and lab results we want to align and correlate. We check the Inclusive options to create capsules from the start of Process Batches to the end of Product Impurity Results Capsules:
The resulting yellow "joined" capsules in the screenshot below now span the time period of process operation and the eventual lab measured impurity result, for each individual batch.
Investigating a zoomed time period, we can see the Process Batches start joined to the end of the resulting Product Impurity Results Capsules.
4. With a common capsule established, we align Max T, Total Chemical A, and Product Impurity at the middle of the "Process Batches to Impurity Result (joined)" capsules. For Max T and Total Chemical A, we use the 'Value at Start', and for Product Impurity, we use the 'Value at End'. Using Signal from Condition in this way is a key technique used in aligning process and lab data values.
The results for a short time period look like this, and the "aligned" values can be used for further calculations or for creating a prediction model to predict Product Impurity based on Max T and Total Chemical A:
Joining Process and Lab Values Based on a Matching Capsule Property
With that example finished, let’s look at another common (and more complicated) scenario in this use case category: when analytical results can be reported inconsistently or out of sequence with process batches, a more advanced condition join using Seeq Formula, based on a matching id or other capsule property value, may be needed.
In this example, a numeric batch id is a capsule property shared by the Process Batches and Lab Analytical Batches conditions: see the 437 and 438 capsule property values shown as labels on the blue and green capsules in the screenshot below. Using this batch id linkage, matching id Process/Lab capsules can be joined with a single line formula, and capsule properties from the separate process and lab conditions (e.g., see the 0.99 and 1.32 lab product impurity results shown on the blue capsules) can be preserved, all courtesy of enhanced “capsule matching by property” functionality introduced in Seeq R56 (link).
Starting with the raw data and Process Batches and Lab Analytical Batches conditions and their capsule properties, we have already calculated the "Average Process Ratio for Batch" using Signal from Condition and located it at the start of the Process Batches capsules. We illustrate only the critical steps in this use case under Analysis Steps below:
Analysis Steps - Joining Process and Lab Data on a Matching Capsule Property
1. We join the Process Batches (capsules) to Lab Analytical Batches, based on their BatchID/LabID property matching. We use the join() Formula function. The batch id numeric is stored as the "LabID" capsule property in the Lab Batches condition, so, prior to doing the join, we must rename it to have the same property name as the "BatchID" capsule property on the Process Batches condition. Note: the capsule property matching and keepProperties() are enhanced functionality options introduced in Seeq R56.
// Join process to lab batches based on matching ID.
// We need to rename the LabID capsule property to BatchID for the
// lab capsules.
// Use keepProperties() so that the resulting condition has
// the Result capsule property (the lab measured product impurity value).
Inspecting the capsules and the batch id and product impurity capsule property values at the top of the trend, we see the "Process Batches Joined" capsules are linked based on matching ID, and the product impurity "Result" capsule property (the 0.99 and 1.32 values) is retained and now part of a capsule that starts at the beginning of each Process Batches capsule:
2. We now translate the "Process Batches Joined to Lab Batches on ID Match" Result capsule property into a "Lab Measured Product Impurity Aligned to Batch Start" signal, with values moved to the start of the Process Batches, at the exact location we have the Avg Process Ratio for Batch.
For this short time range, we can confirm the 0.99 value for the resulting impurity signal aligns correctly to BatchID 437, and the 1.32 value aligns correctly to Batch 438.
As a result, we have now connected the lab measured product impurity result to each individual process batch, regardless of lab result timing, and with additional steps have aligned the Average Process Ratio and Lab Measured Product Impurity.