FAQ#

Why is the library named Sheepdewg?#

It’s a lovable pup. Sheepdewg is a fork of Tango, and the starting motivation is simply to learn about the larger process of using Python packages such as Sphinx or Jupyter to develop an AI literature classifier. At first, we don’t expect too much from the new Dewg; we are just tickled to no end to have the little guy, and we have not even started trying to teach our new herding puppy to behave himself. Long before our Sheepdewg is useful, though, we will have learned more than a few things about what it’s like to train a new puppy.

How can I debug my steps through the Tango CLI?#

You can debug your own steps by running the tango command through pdb. For example:

python -m pdb -m tango run config.jsonnet

How is Tango different from Metaflow, Airflow, or redun?#

We’ve found that existing DAG execution engines like these are great for production workflows but less well suited to messy, collaborative research projects where code changes constantly. AI2 Tango was built specifically for these kinds of research projects.

How does Tango’s caching mechanism work?#

AI2 Tango caches the results of steps based on the unique_id of the step. The unique_id is essentially a hash of all of the inputs to the step along with:

the step class’s fully qualified name, and
the step class’s VERSION class variable (an arbitrary string).

Unlike other workflow engines such as redun, Tango does not take the source code of the class itself into account (other than its fully qualified name), because we’ve found that hashing the source code bytes is far too sensitive and less transparent for users. When you change the source code of your step in a meaningful way, you can just manually change the VERSION class variable to tell Tango that the step has been updated.
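
To make the idea concrete, here is a minimal sketch of how an identifier of this kind could be derived. It is not Tango’s actual implementation: the helper `sketch_unique_id`, the toy `AddNumbers` class, the use of JSON plus SHA-256, and the `Name-digest` output format are all assumptions chosen for illustration. The only parts taken from the description above are the three ingredients: the step’s inputs, the class’s fully qualified name, and its VERSION class variable.

```python
import hashlib
import json


def sketch_unique_id(step_class: type, inputs: dict) -> str:
    """Hypothetical helper: hash the inputs, qualified class name, and VERSION.

    The class's source code is deliberately NOT part of the hash.
    """
    qualified_name = f"{step_class.__module__}.{step_class.__qualname__}"
    version = getattr(step_class, "VERSION", None)
    # sort_keys makes the serialization independent of argument order.
    payload = json.dumps(
        {"class": qualified_name, "version": version, "inputs": inputs},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # The "Name-digest" layout is an arbitrary choice for this sketch.
    return f"{step_class.__name__}-{digest[:16]}"


class AddNumbers:
    """Toy stand-in for a step class (not a real Tango Step subclass)."""

    VERSION = "001"

    def run(self, a: int, b: int) -> int:
        return a + b


id_v1 = sketch_unique_id(AddNumbers, {"a": 1, "b": 2})
id_same = sketch_unique_id(AddNumbers, {"b": 2, "a": 1})
id_other_inputs = sketch_unique_id(AddNumbers, {"a": 1, "b": 3})

# Bumping VERSION signals a meaningful code change and invalidates the cache key.
AddNumbers.VERSION = "002"
id_v2 = sketch_unique_id(AddNumbers, {"a": 1, "b": 2})

print(id_v1 == id_same)          # identical inputs give an identical id
print(id_v1 != id_other_inputs)  # changing an input changes the id
print(id_v1 != id_v2)            # changing VERSION changes the id
```

In this sketch, editing the body of `run` without touching VERSION leaves the identifier unchanged, which mirrors the trade-off described above: cached results survive cosmetic edits, and it is up to you to bump VERSION when the behavior really changes.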