Configuring workflows

Use workflows to run your data science project consistently or to share it with others. Configure your workflow once and run it manually until you're ready to automate it by setting a schedule.

Basic options

  • Workspace
    • Choose a workspace for this workflow to use.
    • Workspaces can be shared with multiple people, so be careful to select the correct one for your new workflow to prevent sharing sensitive data with the wrong teammates.
    • The workspace cannot be changed after the workflow is created.
  • Instance Size
  • Max Run Time
    • Set the maximum amount of time your workflow can run before it is automatically terminated.

Advanced options

  • Notebook to run
    • Set the location of the notebook you want to run when this workflow starts.
    • Notebooks are run in the background of your workflow instance.
    • When the notebook run completes, a log file of the run is stored under /notebooks/workflow-runs/ in your workspace.
  • Python packages
    • Specify your Python packages as you would in a requirements.txt file (they are installed with pip install).
    • Put each Python package on its own line.
  • Environment variables
    • Save configuration variables and secrets as environment variables for simpler and more secure management of important settings.
    • Environment variables are stored encrypted until your workflow is run.
    • Read environment variables from Python with import os followed by os.environ['VARIABLE-NAME'].
  • Bash commands
    • Run Ubuntu Linux commands at the start of every workflow.
    • Automate advanced workflows or install missing programs using commands such as apt-get install followed by the package name.
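
The environment variable lookup described under "Environment variables" above can be sketched as follows. This is a minimal example, not JetML's own API: the helper get_setting and the variable name API_TOKEN are hypothetical, and the script sets a placeholder value itself only so that it runs outside a workflow.

```python
import os


def get_setting(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset.

    Raises KeyError with a readable message when the variable is missing
    and no default was given.
    """
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"Environment variable {name!r} is not set")
    return value


# Hypothetical variable name. In a real workflow the value would come from
# the workflow's environment variable settings, not from this line.
os.environ.setdefault("API_TOKEN", "example-token")

token = get_setting("API_TOKEN")
print("Token loaded:", bool(token))
```

Using os.environ.get with an explicit error, rather than indexing os.environ directly, makes a missing or misspelled variable name easier to spot in the notebook run log.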

Set a schedule

  • Starts on
    • Select the date and time at which you want this workflow to first run.
  • Runs every
    • Set the interval, in hours and minutes, between your scheduled workflow runs.
    • Scheduled workflow runs are accurate to about a minute, so start times can drift over time and a workflow may start at a different minute than originally scheduled.
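
To see how "Starts on" and "Runs every" combine, the nominal run times can be worked out by hand. The sketch below is only an illustration of that arithmetic, not JetML's scheduler: the function nominal_run_times and the example date are hypothetical, and actual start times may differ by about a minute as noted above.

```python
from datetime import datetime, timedelta


def nominal_run_times(starts_on, hours, minutes, count):
    """Return the first `count` nominal start times of a schedule.

    `starts_on` is the first run; each later run follows the previous one
    by the interval given in `hours` and `minutes`.
    """
    interval = timedelta(hours=hours, minutes=minutes)
    if interval <= timedelta(0):
        raise ValueError("The interval must be greater than zero")
    return [starts_on + i * interval for i in range(count)]


# Example: first run on 1 January 2024 at 09:00, repeating every 6 h 30 min.
runs = nominal_run_times(datetime(2024, 1, 1, 9, 0), hours=6, minutes=30, count=3)
for run in runs:
    print(run.isoformat())
# prints:
# 2024-01-01T09:00:00
# 2024-01-01T15:30:00
# 2024-01-01T22:00:00
```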