Related Work
Related Projects
- Bandit
- Bandit was made by a few people at OpenStack with the same purpose as PyT. The main difference is that Bandit doesn’t track the flow of data and PyT does, so Bandit is closer to a grep-like pre-commit hook that can, e.g., ban urllib2 and open and suggest Advocate and a secure open wrapper instead. The sinks, formatters and UI are where it shines.
- JunkHacker
- Written by Kevin Hock, it had an incredibly bad design. It analyzed Python bytecode using equip and tracked taint by depth-first searching through basic blocks via a bytecode interpreter heavily adapted from Byterun. It dealt with path explosion via a crazy buddy system, and because many bytecode instructions, e.g. exceptions, do not work at the basic-block level, the workarounds were gruesome. The buddy system worked by marking each node that diverged with the node that its children converged at. This was both less efficient and more complicated than unioning predecessors like PyT does. It is not open-source because of how ugly it is.
- Focuson
- Written by Collin Greene of Uber. Like PyT it uses the ast module, but unlike PyT it tracks dataflow using path-insensitive backwards slicing. Path explosion is not a problem because it is path-insensitive, but that causes it to have more false positives than PyT.
- PyExZ3
- A dynamic symbolic execution framework for Python, potentially useful for taint tracking if it can solve string constraints, which there is experimental support for in a fork. “A novel aspect of the rewrite is to rely solely on Python’s operator overloading to accomplish all the interception needed for symbolic execution.” Joseph Near did this before them, but it is interesting work nevertheless.
- DARLAB Work
- Great alias analysis work by Michael Gorbovitski et al. It would be quite performance-intensive to add to a security tool and may or may not help much in reducing false positives, but it is impressive work regardless.
- RIPS (PHP)
- The latest versions, the useful ones, are closed-source, as the author Johannes Dahse has gone commercial. This is unfortunate, because as far as we know it is the most advanced tool in this category: it can find second-order vulnerabilities. The old unsophisticated open-source version is here.
- Brakeman (Rails)
- Written in Ruby by Justin Collins and made for Rails. I’m not exactly sure how it works, but it does do something like reaching definitions.
- Dawnscanner (Ruby)
- Written by Paolo Perego. I’m not exactly sure how it works.
- Joseph Near et al. (Rails)
- Joseph has a lot of interesting work I would like to summarize.
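The "unioning predecessors" approach mentioned in the JunkHacker entry can be sketched as a small forward dataflow fixed point. This is only an illustration of the general technique, not PyT's actual code: the function name `propagate_taint`, the dict-based CFG, and the simplified `gen`/`kill` sets are all assumptions made for this example.

```python
from collections import deque


def propagate_taint(cfg, gen, kill):
    """Forward dataflow over a control flow graph (hypothetical helper).

    cfg  maps each node to a list of its successors (every node is a key).
    gen  maps a node to the variables it taints (simplified assumption).
    kill maps a node to the variables it overwrites with clean values.
    Returns the set of tainted variables after each node.
    """
    # Build the predecessor map from the successor map.
    preds = {node: set() for node in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].add(node)

    out = {node: set() for node in cfg}
    worklist = deque(cfg)
    while worklist:
        node = worklist.popleft()
        # Merge point: union whatever flows in from every predecessor,
        # instead of exploring each path separately.
        incoming = set()
        for pred in preds[node]:
            incoming |= out[pred]
        new_out = (incoming - kill.get(node, set())) | gen.get(node, set())
        if new_out != out[node]:
            out[node] = new_out
            worklist.extend(cfg[node])  # successors must be revisited
    return out


# A diamond-shaped CFG: the two branches converge at "join".
cfg = {"entry": ["then", "else"], "then": ["join"], "else": ["join"], "join": []}
gen = {"entry": {"user_input"}, "then": {"query"}}
kill = {"else": {"user_input"}}
result = propagate_taint(cfg, gen, kill)
```

Because the join node takes the union of both branches, `result["join"]` contains every variable tainted on at least one path. No bookkeeping about where branches diverge and converge is needed.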
Related Papers
- Schwarzbach static analysis notes
The PyT thesis is heavily influenced by these notes; they’re a pretty good resource for learning dataflow analysis. Other good resources include Engineering a Compiler, Advanced Compiler Design and Implementation, and Data Flow Analysis: Theory and Practice.
- Static Detection of Second Order Vulnerabilities in Web Applications
A simple, intuitive idea, but complex to implement. Unlike PyT, they use summaries instead of inlining. Summaries are more or less required to implement the idea, unless you wrote results somewhere and ran the tool again with those results as a dirty hack. The main hard parts of implementing this idea with PyT will be (1) rewriting it to use summaries and (2) writing code that deals with this part of the paper: “SQL has different syntactical forms of writing data to a table. Listing 1 shows three different ways to perform the same query”. Aside from the examples given in the paper, another example of a multi-step exploit is tracking a value from a bad RNG that is stored in location A and sent in an HTTP response, then seeing where a tainted value is checked against location A.
In my opinion, the best ROI in the Python world would be to implement this for the Django ORM or SQLAlchemy since they seem to be the most widely used.
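The core of the second-order idea can be sketched in a few lines, assuming the hard parts are already solved. This is not the paper's algorithm or PyT code: `Write`, `Read`, and `second_order_findings` are hypothetical names, and the sketch assumes an earlier analysis has already mapped each SQL or ORM statement to a storage location such as `"users.bio"`.

```python
from typing import NamedTuple


class Write(NamedTuple):
    """A statement that stores a value (hypothetical record type)."""
    location: str        # e.g. "table.column", assumed to be resolved already
    value_tainted: bool  # did first-order analysis find the value tainted?


class Read(NamedTuple):
    """A statement that loads a value (hypothetical record type)."""
    location: str
    reaches_sink: bool   # does the loaded value flow into a sensitive sink?


def tainted_locations(writes):
    # Step 1: remember every location that ever receives tainted data.
    return {w.location for w in writes if w.value_tainted}


def second_order_findings(writes, reads):
    # Step 2: treat reads from those locations as new taint sources and
    # report the ones that reach a sink.
    stored = tainted_locations(writes)
    return [r.location for r in reads if r.reaches_sink and r.location in stored]


writes = [Write("users.bio", True), Write("users.id", False)]
reads = [
    Read("users.bio", True),   # tainted on write, reaches a sink: reported
    Read("users.id", True),    # never tainted on write: not reported
    Read("users.bio", False),  # tainted but never reaches a sink: not reported
]
findings = second_order_findings(writes, reads)
```

In practice, most of the work lies in producing the `location` field, which is where the different syntactic forms of writing to a table (or the Django ORM and SQLAlchemy equivalents) would have to be handled.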
- Finding Security Bugs in Web Applications using a Catalog of Access Control Patterns
- Practical Static Analysis of JavaScript Applications in the Presence of Frameworks and Libraries
There might be three ways of handling blackbox calls between source and sink, i.e. of answering the questions that a proper summary does, e.g. if argument A is tainted, does this call return a tainted value? This can be dealt with via (1) a hard-coded mapping, (2) running pip install and seeing if the package is Python code, or (3) possibly this paper. I suspect that long-term, some combination of 1 and 2 will be done with PyT. If we just ask the user, “Hey, does this call propagate taint?” and we remember the answer, it would be easy enough for the user to use the tool.
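A minimal sketch of combining a hard-coded mapping with remembered user answers might look like the following. Everything here is an assumption for illustration: the class name `BlackboxOracle`, the JSON file used to remember answers, and the example mapping entries are not part of PyT.

```python
import json
from pathlib import Path


class BlackboxOracle:
    """Answers "does this call propagate taint?" (hypothetical helper)."""

    def __init__(self, hard_coded, store_path, ask=input):
        self.hard_coded = dict(hard_coded)  # option (1): hard-coded mapping
        self.store_path = Path(store_path)  # where user answers are remembered
        self.ask = ask                      # swappable so tests need no stdin
        self.remembered = {}
        if self.store_path.exists():
            self.remembered = json.loads(self.store_path.read_text())

    def propagates_taint(self, call_name):
        # The hard-coded mapping wins when it has an entry.
        if call_name in self.hard_coded:
            return self.hard_coded[call_name]
        # Otherwise ask the user once and persist the answer.
        if call_name not in self.remembered:
            answer = self.ask(f"Does {call_name} propagate taint? [y/n] ")
            self.remembered[call_name] = answer.strip().lower().startswith("y")
            self.store_path.write_text(json.dumps(self.remembered))
        return self.remembered[call_name]


# Illustrative entries only; a real mapping would need careful review.
EXAMPLE_MAPPING = {"os.path.join": True, "len": False}
```

Constructing `BlackboxOracle(EXAMPLE_MAPPING, "blackbox_answers.json")` and calling `propagates_taint("some.call")` prompts at most once per unknown call name. Later runs read the stored answer from the file.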