- Bandit was written by a few people at OpenStack with the same purpose in mind as PyT. The main difference is that Bandit doesn’t track the flow of data and PyT does, so Bandit is closer to a grep-ish pre-commit hook that can, e.g., ban urllib2 and open and suggest Advocate and a secure open wrapper instead. The sinks, formatters and UI are where it shines.
- Written by Kevin Hock, this tool had an incredibly bad design. It analyzed Python bytecode using equip and tracked taint by depth-first searching through basic blocks via a bytecode interpreter heavily adapted from Byterun. It dealt with path explosion via a crazy buddy system, and because many bytecode instructions (e.g. those for exceptions) do not work at the basic-block level, the workarounds were gruesome. The buddy system worked by marking each node that diverged with the node that its children converged at. This was both less efficient and more complicated than unioning predecessors the way PyT does. It is not open-source because of how ugly it is.
- A dynamic symbolic execution framework for Python, potentially useful for taint tracking if it can solve string constraints, for which there is experimental support in a fork. “A novel aspect of the rewrite is to rely solely on Python’s operator overloading to accomplish all the interception needed for symbolic execution.” Joseph Near did this before them, but it is interesting work nevertheless.
- RIPS (PHP)
- The latest versions, the useful ones, are closed-source, as the author Johannes Dahse has gone commercial. This is unfortunate, because as far as we know it is the most advanced tool in this category: it can find second-order vulnerabilities. The old, unsophisticated open-source version is here.
- Joseph Near et al. (Rails)
- Joseph has a lot of interesting work I would like to summarize.
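
The "unioning predecessors" approach mentioned in the list above can be sketched as a small worklist algorithm. This is a minimal illustration under assumptions, not PyT's actual code: the CFG encoding, the `assignments` format, `SOURCES`, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical: names that are tainted wherever they appear.
SOURCES = frozenset({"request_param"})


def transfer(in_set, assignment):
    """Apply one node's effect to the set of tainted variables."""
    if assignment is None:
        return in_set
    target, rhs_names = assignment
    if rhs_names & (in_set | SOURCES):
        return in_set | {target}   # tainted right-hand side taints the target
    return in_set - {target}       # clean right-hand side overwrites the taint


def propagate_taint(cfg, assignments):
    """cfg maps every node to its list of successors (each successor is a key)."""
    preds = {node: [] for node in cfg}
    for node, successors in cfg.items():
        for succ in successors:
            preds[succ].append(node)

    out = {node: frozenset() for node in cfg}
    worklist = deque(cfg)
    while worklist:
        node = worklist.popleft()
        # The key step: a node's input is the union of its predecessors' outputs,
        # so diverging paths are merged instead of being explored one by one.
        in_set = frozenset().union(*(out[p] for p in preds[node]))
        new_out = transfer(in_set, assignments.get(node))
        if new_out != out[node]:
            out[node] = new_out
            worklist.extend(cfg[node])  # successors must be re-evaluated
    return out


# Diamond-shaped CFG: the two branches diverge at "entry" and converge at "join".
cfg = {"entry": ["then", "else"], "then": ["join"], "else": ["join"], "join": []}
assignments = {
    "entry": ("name", {"request_param"}),  # name = request_param
    "then": ("query", {"name"}),           # query = name
    "else": ("name", set()),               # name = "constant"
}
result = propagate_taint(cfg, assignments)
```

At `join`, both `name` and `query` are reported as possibly tainted, because the `then` path taints them even though the `else` path cleans `name`. No bookkeeping about where branches reconverge is needed, which is the simplification over the buddy system described above.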
- Schwarzbach static analysis notes
The PyT thesis is heavily influenced by these notes, and they’re a pretty good resource for learning dataflow analysis. Other good resources include Engineering a Compiler, Advanced Compiler Design and Implementation, and Data Flow Analysis: Theory and Practice.
- Static Detection of Second Order Vulnerabilities in Web Applications
A simple, intuitive idea, but complex to implement. Unlike PyT, they use summaries instead of inlining. Summaries are more or less required to implement the idea, unless you wrote the results somewhere and ran the tool again with those results as a dirty hack. The main hard parts of implementing this idea in PyT will be (1) rewriting PyT to use summaries and (2) writing code that deals with this part of the paper: “SQL has different syntactical forms of writing data to a table. Listing 1 shows three different ways to perform the same query”. Aside from the examples given in the paper, another example of a multi-step exploit is tracking a value from a bad RNG, to a store in location A, to an HTTP response, and then seeing where a tainted value is checked against location A.
In my opinion, the best ROI in the Python world would be to implement this for the Django ORM or SQLAlchemy since they seem to be the most widely used.
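
To make the second-order idea concrete, here is a toy sketch of the two-pass shape it implies: first record which storage locations receive tainted data, then treat reads from those locations as taint sources. Everything here is hypothetical (the `StorageSummary` class, the `(table, column)` location format, the event tuples); it is neither the paper's algorithm nor PyT code, and it skips the hard part of recovering locations from SQL or ORM calls.

```python
from dataclasses import dataclass, field


@dataclass
class StorageSummary:
    """Remembers which persistent locations were written with tainted data."""
    tainted_locations: set = field(default_factory=set)

    def record_write(self, location, value_is_tainted):
        if value_is_tainted:
            self.tainted_locations.add(location)

    def read_is_tainted(self, location):
        return location in self.tainted_locations


def find_second_order(writes, reads):
    """writes: (location, value_is_tainted); reads: (location, sink_name)."""
    summary = StorageSummary()
    for location, value_is_tainted in writes:   # pass 1: collect tainted stores
        summary.record_write(location, value_is_tainted)
    # Pass 2: a read from a tainted location that reaches a sink is a finding.
    return [(loc, sink) for loc, sink in reads if summary.read_is_tainted(loc)]


# Hypothetical facts a first-order analysis might have produced.
writes = [
    (("users", "bio"), True),      # request data stored unsanitized
    (("users", "created"), False), # server-generated timestamp
]
reads = [
    (("users", "bio"), "render_profile_html"),
    (("users", "created"), "render_profile_html"),
]
findings = find_second_order(writes, reads)
```

Here only the `bio` column is reported, since it is the only location that was both written with tainted data and later read into a sink.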
There might be 3 ways of handling blackbox calls between source and sink, that is, of answering the questions that a proper summary does, e.g. if argument A is tainted, does this call return a tainted value? This can be dealt with via (1) a hard-coded mapping, (2) pip installing the package and seeing whether it is Python code, or (3) possibly this paper. I suspect that long-term, some combination of 1 and 2 will be done with PyT. If we just ask the user, “Hey, does this call propagate taint?” and we remember the answer, it would be easy enough for the user to use the tool.
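
A rough sketch of combining a hard-coded mapping with remembered user answers might look like the following. The class name, the JSON file layout, and the example entries are assumptions made for illustration; this is not how PyT stores its mapping.

```python
import json
import tempfile
from pathlib import Path


class BlackboxMapping:
    """Answers "does this call propagate taint?" and remembers user replies."""

    def __init__(self, path, hard_coded=None, ask=input):
        self.path = Path(path)
        self.ask = ask  # injectable so the prompt can be scripted
        saved = json.loads(self.path.read_text()) if self.path.exists() else {}
        # Saved user answers take precedence over the hard-coded defaults.
        self.answers = {**(hard_coded or {}), **saved}

    def propagates_taint(self, qualified_name):
        if qualified_name not in self.answers:
            reply = self.ask(f"Hey, does {qualified_name} propagate taint? [y/n] ")
            self.answers[qualified_name] = reply.strip().lower().startswith("y")
            self.path.write_text(json.dumps(self.answers, indent=2, sort_keys=True))
        return self.answers[qualified_name]


# Usage with a scripted prompt, so the example runs without a terminal.
questions = []


def scripted_ask(prompt):
    questions.append(prompt)
    return "y"


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "blackbox_mapping.json"
    defaults = {"len": False}  # illustrative hard-coded entry

    first_run = BlackboxMapping(store, hard_coded=defaults, ask=scripted_ask)
    known = first_run.propagates_taint("len")                # from the defaults
    asked = first_run.propagates_taint("os.path.join")       # user is asked once

    second_run = BlackboxMapping(store, hard_coded=defaults, ask=scripted_ask)
    remembered = second_run.propagates_taint("os.path.join") # read back from disk
```

The user is prompted only the first time an unknown call is seen; later runs read the answer back from the file, which is what keeps the tool easy to use.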