Jump to content

Generating Formulas with DataLab as a form of Calculation Templatization


KantianKinetics

Recommended Posts

FYI, I'm very much self-taught in Python and don't use it more than a few times a month. 

I work as a technology licensor and our customers have provided datasets for us to work on.  Most of our process units have enough similarities that we can templatize SEEQ formulas and then SEEQ workbooks.  Thus once I've completed the following work for one customer, I'm 80% of the way done for the next 100+ customers. 

I'm struggling to get the formulas pushed back into SEEQ from the DataLab.  

I've imported CSVs of data into SEEQ until we get our PI-to-PI connect setup to pull directly from our customer to our PI Asset Framework and then pushed to our SEEQ instance.  

Since generating the template formulas is MUCH easier in Excel, I've created all the "Name", "Formula", "Formula Parameters" as strings in an Excel workbook and imported it as a Dataframe "calc_signals" into DataLab. 

Here's the relevant code:  

image.png.bbbe67021061e7c2b8cd257db4d9f0a8.png

 

image.thumb.png.cb94bcd136eebccdaeab9ebcf69c8866.png

image.thumb.png.fc316c028443c286508890a0e818cbcc.png

 

 

As you can see in Row 1, The Formula Parameter array is empty for the signal $OFPOF.  This is the formula generated for the row above, Row 0.  This is because I haven't pushed it to SEEQ.  You can see the error in the image below.  Do I need to actually push each row to SEEQ and then search again to get this to work right?  I don't have a huge set of calculations, but it seems there should be a better way.  

image.thumb.png.3ce1800dfbec6c40f036f4fa2f7b9ecc.png

 

Once I sort through this error, I see that I will have troubles later since SEEQ automatically generates the signal variable name "$OFPOF".  Is there a way to override existing variable names and assign new calculations a variable name of my choice?

Edited by KantianKinetics
Link to comment
Share on other sites

  • Seeq Team
On 10/27/2023 at 4:37 AM, KantianKinetics said:

As you can see in Row 1, The Formula Parameter array is empty for the signal $OFPOF.  This is the formula generated for the row above, Row 0.  This is because I haven't pushed it to SEEQ.  You can see the error in the image below.  Do I need to actually push each row to SEEQ and then search again to get this to work right?  I don't have a huge set of calculations, but it seems there should be a better way.  

 

To ensure a successful metadata push, the Formula Parameter array must contain the actual ID of the item in Seeq. In your screenshot, row 0 failed to push because the Formula Parameter did not contain the actual ID for each item. You cannot proceed with just {'$OFCoC2'=['ID'], ...}. Instead, you need to specify the ID for each item, for example {$OFCoC2=['0EE76D39-3D08-FF40-BBFF-53255CCEB514'], ...}. You can use spy.search() to obtain the ID of each item in your formula and map those IDs to the corresponding Formula Parameter. Therefore, you need to push row 0 to Seeq to get the ID of the item and map the ID to row 1 before you can push row 1.

 

On 10/27/2023 at 4:37 AM, KantianKinetics said:

Once I sort through this error, I see that I will have troubles later since SEEQ automatically generates the signal variable name "$OFPOF".  Is there a way to override existing variable names and assign new calculations a variable name of my choice?

When working with formulas, you are free to choose any variable name you prefer, as long as the correct ID is mapped under the Formula Parameter. For instance, for row 1, you can use $a/$b. The only requirement is that the ID of $a and $b is correctly set. For example, {'$a'='0EE76D39-3D08-FF40-BBFF-53255CCEB514', '$b'='0EE76D39-3D08-FF40-BBFF-53255CCEB55'}.

Link to comment
Share on other sites

Thanks for the response! 

Quote

To ensure a successful metadata push, the Formula Parameter array must contain the actual ID of the item in Seeq. In your screenshot, row 0 failed to push because the Formula Parameter did not contain the actual ID for each item. You cannot proceed with just {'$OFCoC2'=['ID'], ...}. Instead, you need to specify the ID for each item, for example {$OFCoC2=['0EE76D39-3D08-FF40-BBFF-53255CCEB514'], ...}. You can use spy.search() to obtain the ID of each item in your formula and map those IDs to the corresponding Formula Parameter. Therefore, you need to push row 0 to Seeq to get the ID of the item and map the ID to row 1 before you can push row 1.

Ah I think I see the problem.  Not quite sure how to solve it.  I probably didn't provide enough information for you.  I'm not simply pasting ['ID'] into the Formula Parameter column, the code it is trying to evaluate a string that references the search_results 'Asset'

The eval() function worked strangely.  As you can see from the above code, I did run spy.search() and generated the search_results dataframe which I passed to df. 

The Excel CSV has about 300 calculations in it with columns "Name", "Type", "Formula", and "Formula Parameters".  See below: 

image.thumb.png.0c295045460d8607569400c3f77f78b8.png

 The "Formula Parameters" column contains a string of format below in the code snippet for each formula i want to generate.  I perform eval() on the string once its read into python to grab the ID of the search_results (now df) from SEEQ.  Obviously my SEEQ data is in the imported CSV file asset = 'advisor_elements_sOiltags_leastCleaned_v2_csv(import)':  

{
	'$OFCoC2':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C3=')],
	'$OFCoC3':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=1')],
	'$OFCoC5':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=2c')],
	'$OFCoC6':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=2t')],
	'$OFCoC7':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=i')],
	'$FaPOF':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Feed and Products Olefin Feed+HYC')]
}

Why would the eval() function return ['ID'] rather than the actual ID?  I've also tried to instead grab only the ['ID'] column after filtering along ['Asset'] and ['Name'] columns.

When I simply copy and paste the the above code into Data Lab it works just fine to to push Row 0 into SEEQ, but when I evaluate the string in the data table generated from my excel file, it doesn't seem to work.  

Edited by KantianKinetics
Link to comment
Share on other sites

  • Seeq Team
12 hours ago, KantianKinetics said:

 The "Formula Parameters" column contains a string of format below in the code snippet for each formula i want to generate.  I perform eval() on the string once its read into python to grab the ID of the search_results (now df) from SEEQ.  Obviously my SEEQ data is in the imported CSV file asset = 'advisor_elements_sOiltags_leastCleaned_v2_csv(import)':  

{
	'$OFCoC2':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C3=')],
	'$OFCoC3':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=1')],
	'$OFCoC5':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=2c')],
	'$OFCoC6':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=2t')],
	'$OFCoC7':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=i')],
	'$FaPOF':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Feed and Products Olefin Feed+HYC')]
}

Your code should return the actual ID of the item after adding ['ID].iloc[0] to the end of each line. Please try the "Formula Parameter" below:

{
	'$OFCoC2':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C3=')]['ID'].iloc[0],
	'$OFCoC3':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=1')]['ID'].iloc[0],
	'$OFCoC5':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=2c')]['ID'].iloc[0],
	'$OFCoC6':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=2t')]['ID'].iloc[0],
	'$OFCoC7':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Olefin Feed Composition_online C4=i')]['ID'].iloc[0],
	'$FaPOF':df[(df['Asset'] =='advisor_elements_sOiltags_leastCleaned_v2_csv(import)') & (df['Name'] == 'Feed and Products Olefin Feed+HYC')]['ID'].iloc[0]
}

After that, the spy.push() should run without error. Let me know the outcome after trying it. 

Link to comment
Share on other sites

Thanks.  I did try this previously - albeit without the .iloc[0] because the ['Name'] filter handles the row selection - and it returns an error stating "['ID']" not found in the DataFrame or something like that.  It makes no sense.  It obviously should be grabbing the ID but it doesn't.  It makes me think SPy is doing some kind of manipulation of the DataFrame during the push?

I did get my code to work with some revisions but I still cannot directly filter for ['ID'] and must push the entire row including the other columns.  

Here is the working version if you're interested.

##### Search for SEEQ signals from Imported CSV File datasource.  Search only for signals in specific Workbook to eliminate need to call out Asset in later code.

spy_results_original = spy.search({'Datasource Name':'Imported CSV Files'}, workbook=customer_name)

# Copying spy_results to a random df so we can work on it without affecting the original
df = spy_results_original.drop([0,1]).reset_index(drop=True) # Dropping row tied to the entire file rather than data in the file
df = df[df['Asset'].notnull()] #Dropping null values in the "Asset" column, these are also related to entire file

##### Read Excel file to build templatized calculation dataframe

import math
import ast
from seeq import spy
import pandas as pd

# Must have or else it reads $ signs for SEEQ variables as LaTeX...grrrr
pd.options.display.html.use_mathjax = False 

# display entire string in Jupyter for easier viewing.  Comment this out if you don't want it.
pd.set_option('display.max_colwidth', None) 

calc_signals = pd.read_excel('Templatization/SEEQ Tag Calcs for SEEQ (Python).xlsm', sheet_name='ForSEEQ', header=0)
calc_signals = calc_signals[calc_signals['Formula Parameters'] != 'SKIP']
calc_signals['Formula Parameters'] = calc_signals['Formula Parameters'].apply(evaluate_parameters) 

### Create Empty signal dataframe to push empty Calculated Signals  into SEEQ and later search for the Calculated Signal IDs
calc_signals_empty = calc_signals.drop(['Formula','Formula Parameters', 'Units'],axis=1)
signal_push = spy.push(workbook = customer_name, metadata=calc_signals_empty)

##### Re-Search for signals to get all signals  

### probably a less error prone way to do this that just grabbing all Seeq Data Lab signals. but it works for now
spy_results_calcs = spy.search({'Datasource Name':'Seeq Data Lab'}, workbook=customer_name) 

# Copying spy_results to a random df so we can work on it without affecting the original
df_calcs = spy_results_calcs

# Append original df with df_calcs to include all signals.
df = df.append(df_calcs).reset_index(drop=True)

##### Re-running this again to make sure nothing is left over from previous.  
##### Then evaluating parameters again now that we have "ID"s of calculated signals in 'df'
##### The excel file 'Formula' and 'Formula Parameters' column has following format.  For some reason, adding ['ID'] to end of filtered Dataframe returns an error of 'ID' not found:
# ($OFCoC2 + $OFCoC3 + $OFCoC5 + $OFCoC6 + $OFCoC7) / 100 * $FaPOF
# {'$OFCoC2':df[df['Name'] == 'Olefin Feed Composition_online C3='],'$OFCoC3':df[df['Name'] == 'Olefin Feed Composition_online C4=1'], .....}

calc_signals = pd.read_excel('Templatization/SEEQ Tag Calcs for SEEQ (Python).xlsm', sheet_name='ForSEEQ', header=0)
calc_signals = calc_signals[calc_signals['Formula Parameters'] != 'SKIP']
calc_signals['Formula Parameters'] = calc_signals['Formula Parameters'].apply(evaluate_parameters) 

##### Successful push.  Need to clean up the Excel file a bit for error handling.
signal_push = spy.push(workbook = customer_name, metadata=calc_signals.drop("Units",axis=1))

 

Current Error:

I'm now getting error on push: 

Quote

SPy Error: Error processing OFPOF: The parameter had multiple entries in the DataFrame Error found at line 1 in cell 171.

Unsure why, since the Formula shouldn't care if the signal variable name is reused.  I was told the variables don't work globally, e.g. either of the following would work as long as they are tied appropriately to the formula parameter.  Is this not true?  

($OFCoC2 + $OFCoC3 + $OFCoC5 + $OFCoC6 + $OFCoC7) / 100 * $FaPOF

($a + $b + $c + $d +$e) / 100 * $g

 

Edited by KantianKinetics
Link to comment
Share on other sites

  • Seeq Team
16 hours ago, KantianKinetics said:

Current Error:

I'm now getting error on push: 

Quote

SPy Error: Error processing OFPOF: The parameter had multiple entries in the DataFrame Error found at line 1 in cell 171.

Unsure why, since the Formula shouldn't care if the signal variable name is reused.  I was told the variables don't work globally, e.g. either of the following would work as long as they are tied appropriately to the formula parameter.  Is this not true?  

($OFCoC2 + $OFCoC3 + $OFCoC5 + $OFCoC6 + $OFCoC7) / 100 * $FaPOF

($a + $b + $c + $d +$e) / 100 * $g

Yes, either of the formula parameter combinations will work for your case. 

The error message you received, SPy Error, indicates that there were multiple entries of $OFPOF in the DataFrame. This suggests that there might be more than one match found in your df. To confirm this, could you please print df and check?

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...