Seeq Team Joe Reckamp Posted April 13, 2020 Seeq Team Share Posted April 13, 2020 In batch processes, genealogy of lots of batch IDs is important to understand. When a deviation occurs in a raw material or batch, the use of that material in subsequent steps of the process needs to be quickly identified. In the case of pharmaceuticals, batches that use that product may need to be pulled from the shelves immediately and not used by consumers. Batch Genealogy starts with having data in a format where the input material lot numbers or Batch IDs are tracked for each batch. In Seeq, these input material lot numbers or Batch IDs are tracked as properties attached to each Batch capsule. The properties can either be set up from the datasource connector (such as bringing in OSIsoft PI Event Frames or information from a SQL database) or they can be built using transforms in Seeq Formula. Once the data is set up properly, visualization can be achieved with Seeq Data Lab. This article will describe one approach to Batch Genealogy visualization in Seeq Data Lab. In this particular scenario, we are looking at 6 distinct batch chemical transformations to get to the final product and we are tracking the key raw material as it goes through the process. 1. First, use spy.search to find the batches of interest: results = spy.search({ "Name": "Step * Batches" },workbook='3FC99BE8-F721-48E1-ACC3-18751ADA549F') # Print the output to the Jupyter page results 2. Next, use spy.pull to retrieve the records for a specific time range of interest: step2_batches = results.loc[results['Name'].isin(['Step 2 Batches'])] step2_data = spy.pull(step2_batches, start='2019-01-01', end='2019-07-01', header='Name') step2_data 3. In the previous step, you can see that some batches have multiple input Batch IDs. In order to track all combinations of the input batch IDs and output batch IDs, use pandas DataFrame manipulations to get a new DataFrame with just the Input Batch ID and Output Batch ID combinations. step2_1 = step2_data[['Step 1 Input Batch ID (1)', 'Batch ID']].rename(columns={'Step 1 Input Batch ID (1)': 'Step 1 Input Batch ID'}) step2_2 = step2_data[['Step 1 Input Batch ID (2)', 'Batch ID']].rename(columns={'Step 1 Input Batch ID (2)': 'Step 1 Input Batch ID'}) step2 = pd.concat([step2_1,step2_2]).rename(columns={'Batch ID': 'Step 2 Batch ID'}).set_index('Step 2 Batch ID') step2 = step2.loc[(step2!='0.0').any(1)] step2 You may notice that instead of 6 rows in the previous DataFrame, we now have 11 rows (5 batches with 2 input batch IDs + 1 batch with 1 input batch ID = 11 total combinations). 4. Repeat the above steps for each additional step in the process, creating a final DataFrame with Step 1 through Step 6 batches of all combinations of input Batch IDs from prior steps. step5_6 = pd.merge(step6.reset_index(),step5.reset_index().rename(columns={'Step 5 Batch ID': 'Step 5 Input Batch ID'})) step4_5_6 = pd.merge(step5_6,step4.reset_index().rename(columns={'Step 4 Batch ID': 'Step 4 Input Batch ID'})) step3_4_5_6 = pd.merge(step4_5_6,step3.reset_index().rename(columns={'Step 3 Batch ID': 'Step 3 Input Batch ID'})) FinalDf = pd.merge(step3_4_5_6,step2.reset_index().rename(columns={'Step 2 Batch ID': 'Step 2 Input Batch ID'})) FinalDf 1 Link to comment Share on other sites More sharing options...
Seeq Team Joe Reckamp Posted April 13, 2020 Author Seeq Team Share Posted April 13, 2020 5. Use a Sankey visualization to view the relationships between batches. In order to use the Sankey visualization, first an additional column will need to be added to describe the relative width of each of the combinations. In this example, we are just going to add a column that equally weights each of the visualizations. However, if input weights or mass fractions were available for each combination, that could be used in this column instead. Finally, call the Sankey visualization code taken from an example website and modify functions slightly to match DataFrame set up and desired colors. FinalDf['Count']=1 def genSankey(df,cat_cols=[],value_cols='',title='Sankey Diagram'): # maximum of 6 value cols -> 6 colors colorPalette = ['#FFD43B','#306998','#23B9F2','#089474','#646464','#F23E29'] labelList = [] colorNumList = [] for catCol in cat_cols: labelListTemp = list(set(df[catCol].values)) colorNumList.append(len(labelListTemp)) labelList = labelList + labelListTemp # remove duplicates from labelList labelList = list(dict.fromkeys(labelList)) # define colors based on number of levels colorList = [] for idx, colorNum in enumerate(colorNumList): colorList = colorList + [colorPalette[idx]]*colorNum # transform df into a source-target pair for i in range(len(cat_cols)-1): if i==0: sourceTargetDf = df[[cat_cols[i],cat_cols[i+1],value_cols]] sourceTargetDf.columns = ['source','target','count'] else: tempDf = df[[cat_cols[i],cat_cols[i+1],value_cols]] tempDf.columns = ['source','target','count'] sourceTargetDf = pd.concat([sourceTargetDf,tempDf]) sourceTargetDf = sourceTargetDf.groupby(['source','target']).agg({'count':'sum'}).reset_index() # add index for source-target pair sourceTargetDf['sourceID'] = sourceTargetDf['source'].apply(lambda x: labelList.index(x)) sourceTargetDf['targetID'] = sourceTargetDf['target'].apply(lambda x: labelList.index(x)) # creating the sankey diagram data = dict( type='sankey', node = dict( pad = 15, thickness = 20, line = dict( color = "black", width = 0.5 ), label = labelList, color = colorList ), link = dict( source = sourceTargetDf['sourceID'], target = sourceTargetDf['targetID'], value = sourceTargetDf['count'] ) ) layout = dict( title = title, font = dict( size = 10 ) ) fig = dict(data=[data], layout=layout) return fig fig = genSankey(FinalDf,cat_cols=['Step 1 Input Batch ID','Step 2 Input Batch ID','Step 3 Input Batch ID','Step 4 Input Batch ID','Step 5 Input Batch ID','Step 6 Batch ID'],value_cols='Count',title='Batch Genealogy') plotly.offline.plot(fig, validate=False) In this visualization, each different colored column represents a step in the process (yellow = Step 1, dark blue = Step 2, etc.) with the batch ID of the step denoted in the text next to it. The gray connections represent the various combinations of where that Batch ID was used in the subsequent step. With the above code, the Sankey visualization will open in a new tab and using Plotly, the visualization will be interactive. Therefore, when you highlight a specific batch, it will highlight its dependencies so you can quickly see which batches are related to that particular batch highlighted. 1 Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now