Create Pareto Charts with CDF in Seeq

Allison Buenemann posted a topic in Seeq Data Lab

Users are often interested in creating Pareto charts from conditions they've created in Seeq, sorted by a particular capsule property. The Histogram tool in Seeq Workbench can chart a condition by capsule property; for more information on how to create such Histograms, check out this article on creating and using capsule properties.

Users often want to see that histogram with the bars sorted from largest to smallest, as in a traditional Pareto chart. Users can easily create Pareto charts from Seeq conditions using Seeq Data Lab. The chart we will create overlays a Pareto bar chart of total hours by reason code with a cumulative distribution function (CDF) line.

The full Jupyter Notebook documentation of this workflow (including output) can be found in the attached PDF file. If you're unable to download the PDF, the code snippets below can be run in Seeq Data Lab to produce the chart.

    #Import relevant libraries
    from seeq import spy
    import pandas as pd
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt

Log in to the SPY module using spy.login if running locally, or skip this step if running in Seeq Data Lab.

    #Search for your condition that has capsule properties using spy.search
    #Use the 'Scoped To' argument to search for items only in a particular workbook.
    #If the item is global, no 'Scoped To' argument is necessary
    condition = spy.search({
        "Name": "Production Loss Events (with Capsule Properties)",
        "Scoped To": "9E50F449-A6A1-4BCB-830A-8D0878C8C925",
    })
    condition

    #Pull the data from the time frame of interest using spy.pull into a Pandas dataframe called 'my_data'
    my_data = spy.pull(condition, start='2019-01-15 12:00AM', end='2019-07-15 12:00AM', header='Name', grid=None)

    #Remove columns from the my_data dataframe that will not be used in creation of the Pareto/CDF
    my_data = my_data.drop(['Condition', 'Capsule Is Uncertain', 'Source Unique Id'], axis=1, inplace=False)

    #Calculate a new dataframe column named 'Duration' by subtracting the capsule start time from the capsule end time
    my_data['Duration'] = my_data['Capsule End'] - my_data['Capsule Start']

    #Group the dataframe by reason code
    my_data_by_reason_code = my_data.groupby('Reason Code')

    #Check out what the new dataframe grouped by reason code looks like
    my_data_by_reason_code.head()

    #Sum total time broken down by reason code and sort from greatest to least
    #(the dataframe must exist before a column can be assigned to it)
    total_time_by_reason_code = pd.DataFrame()
    total_time_by_reason_code['Total_Time_by_Reason_Code'] = my_data_by_reason_code['Duration'].sum().sort_values(ascending=False)
    total_time_by_reason_code['Total_Time_by_Reason_Code']

    #Plot Pareto of total time by reason code
    total_time_by_reason_code['Total_Time_by_Reason_Code'].plot(kind='bar')

    #Calculate the total time from all reason codes
    total_time = total_time_by_reason_code['Total_Time_by_Reason_Code'].sum()
    total_time

    #Calculate the fraction of total time from each individual reason code
    percent_time_by_reason_code = pd.DataFrame()
    percent_time_by_reason_code['Percent_Time_by_Reason_Code'] = total_time_by_reason_code['Total_Time_by_Reason_Code'].divide(total_time)
    percent_time_by_reason_code['Percent_Time_by_Reason_Code']

    #Calculate cumulative sum of the fraction of time for each reason code
    cum_percent_time_by_reason_code = pd.DataFrame()
    cum_percent_time_by_reason_code['Cum_Percent_Time_by_Reason_Code'] = percent_time_by_reason_code['Percent_Time_by_Reason_Code'].cumsum()
    cum_percent_time_by_reason_code['Cum_Percent_Time_by_Reason_Code']

    #Plot cumulative distribution function of time spent by reason code
    cum_percent_time_by_reason_code['Cum_Percent_Time_by_Reason_Code'].plot(linestyle='-', linewidth=3, marker='o', markersize=15, color='b')

    #Convert time units on total time by reason code column from default (nanoseconds) to hours
    total_time_by_reason_code['Total_Time_by_Reason_Code'] = total_time_by_reason_code['Total_Time_by_Reason_Code'].dt.total_seconds()/(60*60)

    #Build dataframe for final overlaid chart
    df_for_chart = pd.concat([total_time_by_reason_code['Total_Time_by_Reason_Code'], cum_percent_time_by_reason_code['Cum_Percent_Time_by_Reason_Code']], axis=1)
    df_for_chart

    #Create figure with overlaid Pareto + CDF
    plt.figure(figsize=(20,12))
    ax = df_for_chart['Total_Time_by_Reason_Code'].plot(kind='bar', ylim=(0,800), style='ggplot', fontsize=12)
    ax.set_ylabel('Total Hours by Reason Code', fontsize=14)
    ax.set_title('Downtime Reason Code Pareto', fontsize=16)
    ax2 = df_for_chart['Cum_Percent_Time_by_Reason_Code'].plot(secondary_y=['Cum_Percent_Time_by_Reason_Code'], linestyle='-', linewidth=3, marker='o', markersize=15, color='b')
    ax2.set_ylabel('Cumulative Frequency', fontsize=14)
    plt.show()
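To try the aggregation step without a Seeq connection, here is a minimal sketch on made-up data. The `capsules` table, its reason codes and timestamps, and the helper name `pareto_table` are all hypothetical; only the column names ('Reason Code', 'Capsule Start', 'Capsule End') and the output column names are taken from the post. It assumes the pulled dataframe has one row per capsule.

```python
import pandas as pd

# Hypothetical capsule table standing in for the output of spy.pull.
# Reason codes and timestamps are invented for illustration only.
capsules = pd.DataFrame({
    "Reason Code": ["Jam", "Changeover", "Jam", "Power", "Changeover", "Jam"],
    "Capsule Start": pd.to_datetime([
        "2019-01-15 00:00", "2019-01-16 00:00", "2019-01-17 00:00",
        "2019-01-18 00:00", "2019-01-19 00:00", "2019-01-20 00:00",
    ]),
    "Capsule End": pd.to_datetime([
        "2019-01-15 05:00", "2019-01-16 02:00", "2019-01-17 03:00",
        "2019-01-18 01:00", "2019-01-19 01:00", "2019-01-20 04:00",
    ]),
})


def pareto_table(df, group_col="Reason Code"):
    """Return total hours and cumulative fraction per group, largest first."""
    # Capsule duration in hours
    duration = df["Capsule End"] - df["Capsule Start"]
    hours = duration.dt.total_seconds() / 3600
    # Total hours per group, sorted from greatest to least (Pareto order)
    total = hours.groupby(df[group_col]).sum().sort_values(ascending=False)
    out = total.to_frame("Total_Time_by_Reason_Code")
    # Running share of the overall total, in the same sorted order
    out["Cum_Percent_Time_by_Reason_Code"] = (total / total.sum()).cumsum()
    return out


df_demo = pareto_table(capsules)
print(df_demo)
```

Because the output columns use the same names as `df_for_chart`, the resulting table can be passed to the final plotting step in place of `df_for_chart` (with `ylim` adjusted to suit the data).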
- April 13, 2020
- Tagged with: pareto, histogram, cdf, cumulative distribution function