PyLint

By Scott Hassan and David Jeske (also see Scott's page)

Before you get all excited, note that PyLint is not finished. However, it does do useful things, so try it out.

Objective

The goal of this project is to build a static type analysis system for Python. This system should be able to:

We are not concerned with using this information to do any type of compiling or performance improvement.

Background

Python has gained a strong following for being extremely expressive and powerful. Part of this power comes from its dynamic type system, which stays out of the programmer's way. However, dynamic typing also lets programmers write programs that fail in obvious ways at runtime.

There are a few different tools available that check Python programs for common errors. PyChecker is very similar to a C lint tool, reporting bad practices in Python code. Logilab's pylint (not to be confused with the PyLint you are reading about right now) checks Python code against a set of coding standards. Our type-inference-based PyLint attempts to go a step further and report typing errors that will fail at runtime. Because it is an analysis tool, it is completely optional, and any items it reports can be ignored.

In addition to static error checking, there are other reasons to infer type information from Python programs. When reading another programmer's dynamically typed code, it is often difficult to understand what type of object a particular variable might hold. This is particularly tricky when many different types of objects are passed around.

Languages with static type checking solve this problem by burdening programmers with additional work while they are writing the application. While some of this work might be useful in disambiguating confusing type usage, much of it is just 'crossing I-s and dotting T-s' to make the compiler happy. PyLint offers a different way to solve the same problem. By analyzing modules and building up persistent type structures, PyLint can give the programmer type information about Python code. This information could be presented in auto-generated documentation, or integrated directly into IDEs or editors.

The most widely known type-inference system is the one used in the programming language ML. ML offers amazing performance and largely declaration-free programming; however, it is not without drawbacks. The rigidity of its type system makes polymorphic interfaces problematic, resulting in constructs such as print_string and print_int. Furthermore, the type-inference errors it produces often leave programmers baffled, since the compiler does not decide which part of your program is most likely wrong.

PyLint aims to support polymorphic interfaces and to provide simple "runtime backtrace" style errors, which are generally richer than compiler errors but easy to read. While this seems like a much more complex problem than ML type inference, especially given that Python was not designed for inferred types, PyLint is not overly complex. By relying heavily on annotated type descriptions for library modules, and by storing the type descriptions PyLint creates for your program, PyLint simplifies the work it has to do in any single invocation. Furthermore, we accept that PyLint will not work in all cases, and we consider it acceptable for PyLint to either require you to provide type annotations or lose type-checking capability in cases that are too complicated for it to infer.

Overview

PyLint begins with a set of type annotations for the system libraries. We use type annotations for a few reasons. First, much of the standard Python library is implemented in C modules, on which Python type inference will not work. Second, it is not possible to perform type inference on a library without code that calls it. Lastly, PyLint, unlike many other inference systems, is not agnostic about which code might be wrong: we expect that library code changes rarely, so its types can reasonably be annotated. If there is a type conflict between application code and library code, we want to explain the error clearly in terms of the application code.

Before continuing, we need to define some terms which will be used:

The PyLint process is divided into three stages: Parsing, TypeSite Description, and Inference Type-Check.

Stage 1, Parsing

In the parse stage, we must be able to walk an AST for a full Python program and build a type-inference graph. This graph should record occurrences of types at different type-sites.
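As a rough illustration of the parse stage, the sketch below uses the standard `ast` module to record which types appear at each type-site. It is a minimal sketch, not PyLint's implementation, and it parses Python 3 syntax rather than the Python 2 used elsewhere on this page. The "graph" is simply a dict from type-site name to a set of type names; only literal values and calls to the builtin `open()` get a type, everything else is `Any`. The helper names `literal_type` and `build_type_graph`, and the `function.param` naming of argument sites, are hypothetical.

```python
import ast
from collections import defaultdict


def literal_type(node):
    """Name the type of a literal or a builtin open() call; otherwise 'Any'."""
    if isinstance(node, ast.Constant):
        return type(node.value).__name__
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "open"):
        return "Obj<__builtin__.File>"
    return "Any"


def build_type_graph(source):
    """Map each type-site name to the set of type names observed there."""
    tree = ast.parse(source)
    sites = defaultdict(set)
    # Parameter names of each top-level function, so call arguments can be
    # attached to the matching parameter type-site ("func.param").
    params = {node.name: [arg.arg for arg in node.args.args]
              for node in tree.body if isinstance(node, ast.FunctionDef)}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            # "name = value": the value's type appears at the site "name".
            for target in node.targets:
                if isinstance(target, ast.Name):
                    sites[target.id].add(literal_type(node.value))
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in params):
            # "f(x, ...)": each positional argument flows into a parameter.
            for name, arg in zip(params[node.func.id], node.args):
                sites[f"{node.func.id}.{name}"].add(literal_type(arg))
    return dict(sites)


module = '''
def a(an_arg):
    print(an_arg)

a(1)
a("name")
a(open("foo", "w"))
'''
print(sorted(build_type_graph(module)["a.an_arg"]))
# ['Obj<__builtin__.File>', 'int', 'str']
```

The source is only parsed, never executed, so the `open()` call in the sample module does not touch the filesystem. Run on `a = 1`, the same function reports a single site `a` holding `int`.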

Stage 2, TypeSite Description

After parsing the program and building the inference graph, we should be able to dump out a list of all the types which are known to appear at a given type site. For example, given a module:
   def a(an_arg):
     print an_arg

   a(1)                # pass in a number
   a("name")           # pass in a string
   a(open("foo","w"))  # call the builtin open and pass in a File object
We should be able to print out a type signature for the function "a" similar to:
     Any <- a(int)
     Any <- a(string)
     Any <- a(Obj<__builtin__.File>)
This will have two main uses:
  1. It can be diffed against a previous listing to check changes in type signatures
  2. It can be loaded into an editor in a ctags-like manner, allowing programmers to ask the editor for the "possible types" of an argument or variable, and thus "look up" the implementation of that type.
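Both uses depend on a stable, plain-text listing. The following is a minimal sketch that assumes the per-parameter type sets have already been collected; the function names `format_signatures` and `diff_listings` are hypothetical. It renders lines in the `Any <- a(int)` style shown above and compares two listings with the standard `difflib` module. Note that it prints Python 3's `str` where the listing above says `string`, and that it emits every combination of parameter types, which over-approximates what the call sites actually did.

```python
import difflib
from itertools import product


def format_signatures(func, param_types, ret="Any"):
    """Return sorted 'ret <- func(t1, t2, ...)' lines.

    param_types holds one set of observed type names per parameter. Every
    combination becomes a line, which over-approximates the real call sites.
    """
    combos = product(*(sorted(types) for types in param_types))
    return sorted(f"{ret} <- {func}({', '.join(combo)})" for combo in combos)


def diff_listings(old, new):
    """Unified diff between two signature listings, one string per line."""
    return list(difflib.unified_diff(old, new, "before", "after", lineterm=""))


before = format_signatures("a", [{"int", "str"}])
after = format_signatures("a", [{"int", "str", "Obj<__builtin__.File>"}])
for line in diff_listings(before, after):
    print(line)
```

Sorting the lines keeps the listing deterministic, so a diff only shows genuine signature changes; here it reports the newly added `Any <- a(Obj<__builtin__.File>)` line.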

Stage 3, Inference Type-Check

We believe that after the parse stage is done, the collected information can be re-analyzed to determine both type violations (i.e. TTypes with invalid signatures appearing at a TSite, such as a variable or function argument) and type-risks (i.e. extremely polymorphic use of objects at a TSite).
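The two checks could be sketched roughly as follows. The sketch assumes the parse stage has produced a set of observed type names per type-site, and that some sites carry a declared set of allowed types. The function name `check_site`, the message wording, and the `risk_threshold` cutoff of three distinct types are all hypothetical; this page does not say how PyLint decides that a site is "extremely polymorphic".

```python
def check_site(site, observed, allowed=None, risk_threshold=3):
    """Return report lines for one type-site.

    observed: set of type names the parse stage saw at this site.
    allowed: set of declared type names, or None if nothing was declared.
    risk_threshold: hypothetical cutoff for "extremely polymorphic" use.
    """
    reports = []
    if allowed is not None:
        # Type violation: a type reached the site that was never declared.
        for bad in sorted(observed - allowed):
            reports.append(f"ERROR: {site} receives {bad}, "
                           f"expected one of {sorted(allowed)}")
    if len(observed) >= risk_threshold:
        # Type risk: many different types flowing through one site.
        reports.append(f"WARNING: {site} sees {len(observed)} types: "
                       f"{', '.join(sorted(observed))}")
    return reports


print(check_site("a.an_arg", {"int", "str"}, allowed={"int"}))
```

Keeping violations and risks separate lets a violation be reported as an error while a risk stays a softer warning the programmer can choose to ignore.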

In addition, a syntax for describing static protocol conformance or type conformance could easily be developed, which this tool would understand and check. For example, this directive:

   def a(an_arg):
     "PYLINT: None <- (int)"
     print an_arg
This might be used to state that the only valid type of "an_arg" is "int" and that the function always returns None. It could incrementally be converted into something similar to one of the proposed static type-checking syntaxes for Python, for example:
   def a:None(an_arg:int):
     print an_arg
This would require merely that the existing Python interpreter ignore these type declaration terms.
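Here is a minimal sketch of how a tool might read the docstring directive, assuming the exact form `PYLINT: <return> <- (<arg types>)` shown above. The regular expression and the names `parse_directive` and `directives` are hypothetical, and the sketch relies on Python 3's `ast.get_docstring` rather than anything from PyLint itself.

```python
import ast
import re

# Hypothetical grammar: "PYLINT: <return type> <- (<arg type>, ...)"
DIRECTIVE = re.compile(r"PYLINT:\s*(?P<ret>\S+)\s*<-\s*\((?P<args>[^)]*)\)")


def parse_directive(docstring):
    """Return (return_type, [arg_types]), or None if there is no directive."""
    if not docstring:
        return None
    match = DIRECTIVE.fullmatch(docstring.strip())
    if match is None:
        return None
    args = [a.strip() for a in match.group("args").split(",") if a.strip()]
    return match.group("ret"), args


def directives(source):
    """Collect the directive of every function in source, keyed by name."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            parsed = parse_directive(ast.get_docstring(node))
            if parsed is not None:
                found[node.name] = parsed
    return found


source = '''
def a(an_arg):
    "PYLINT: None <- (int)"
    print(an_arg)
'''
print(directives(source))  # {'a': ('None', ['int'])}
```

Because the directive lives in an ordinary docstring, the interpreter already ignores it; only the checker needs to understand it.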

In the current implementation, type annotations are supplied by way of example. The example may be given entirely within a docstring or, for more complex cases, in the form of code. Below is the file that initializes the type annotations for Python's open() function and the standard file interface. Notice that basic types are given directly as example values, whereas the open() implementation instantiates a __PythonFile__ object and returns it.

def int(x):
  'PYLINT (0,), "x", (1.0,1,"",1L)'

def abs(x):
  'PYLINT (1.0, 1, 1L), "x", (1.0,1,1L)'

class __PythonFile__:
  def read(self, n):
    'PYLINT ("",),"n",(1,)'
  def write(self, s):
    'PYLINT (0,),"s",("",)'
  def close(self):
    'PYLINT (None,),'
  def fileno(self):
    'PYLINT (0,),'
  def readlines(self, n):
    'PYLINT ([""],),"n",(1,)'
  def readline(self):
    'PYLINT ("",),'

def open(fn):
  'PYLINT None,"fn",("",)'
  fp = __PythonFile__()
  return fp
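To make the example-based format concrete, here is a sketch of one way to decode these docstrings, based on our reading of the file above: after the `PYLINT` keyword comes a tuple of example return values, then alternating argument names and tuples of example argument values. A bare `None` in the return position (as in `open()`) is read as "leave the return type to inference from the body". That interpretation and the names `parse_annotation` and `example_types` are assumptions. Because Python 3 has no `1L` literal, the sketch crudely rewrites such literals to plain integers before calling `ast.literal_eval`.

```python
import ast
import re


def example_types(examples):
    """Sorted type names of a tuple of example values."""
    return sorted({type(value).__name__ for value in examples})


def parse_annotation(doc):
    """Decode 'PYLINT <return examples>, "arg", (<arg examples>), ...'.

    Returns (return_types, {arg_name: arg_types}). return_types is None when
    the annotation gives a bare None, which we read as "infer from the body".
    """
    text = doc.strip()
    if not text.startswith("PYLINT"):
        raise ValueError(f"not a PYLINT annotation: {doc!r}")
    text = text[len("PYLINT"):].strip()
    # Python 3 has no 1L literal: crudely fold Python 2 longs into ints.
    text = re.sub(r"\b(\d+)L\b", r"\1", text)
    values = ast.literal_eval(text)
    if not isinstance(values, tuple):
        values = (values,)  # e.g. 'PYLINT None' with nothing after it
    ret, rest = values[0], values[1:]
    return_types = None if ret is None else example_types(ret)
    # The remainder alternates: argument name, tuple of example values.
    args = {rest[i]: example_types(rest[i + 1]) for i in range(0, len(rest), 2)}
    return return_types, args


for doc in ['PYLINT (0,), "x", (1.0,1,"",1L)',
            'PYLINT (None,),',
            'PYLINT None,"fn",("",)']:
    print(parse_annotation(doc))
```

`ast.literal_eval` only accepts literal syntax, so annotations can be decoded without executing arbitrary code from the annotation file.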

Usage Details

This section explains a bit of what you'll see if you run pylint. The simplest test is test1.py:
  a = 1
If you use pylint to analyze the program, you will see a lot of output. Most of it is debugging output explaining pylint's processing of the __builtins__.py type information. Skip down below the horizontal line and you will see the processing of this program.

As you can see, pylint identified one type-site, named "a", and determined that it holds a number.

Let's jump to the most complex example, test15.py. This test is also fairly simple, but it makes use of a module import, classes, and many of the fancy things that PyLint can do:

import test14

c15 = test14.C()
a15 = c15.m1()
The imported test14.py test is equally simple:
class C:
  def m1(self, a=5):
    "test of default parameters"
    return a

c = C()
aa = c.m1()
bb = c.m1(5)
Running pylint.py against test15.py will produce copious output, all of which just verifies that it is a type-correct program. Things get more interesting if we add a single line that is not type-correct. Let's add the final line below:
import test14

c15 = test14.C()
a15 = c15.m1()

b15 = c15.does_not_exist()

A programmer can look at the code above and easily see that c15 is an instance of class test14.C, and that there is no does_not_exist method in that class. In Python, however, this is only discovered as a runtime error. PyLint, on the other hand, can analyze the program and report the error. If we just grep for the ERROR lines, you will see the following:


While this is a far cry from our goal of nicely understandable error messages, we hope it offers a glimpse of PyLint's potential.
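The check behind this error can be approximated in a few lines. The sketch below is a hypothetical illustration, not PyLint's algorithm. It assumes the class and its use live in a single module (so it does not follow `import test14`), tracks only top-level `name = ClassName()` assignments, ignores inheritance and `__getattr__`, and formats its own `ERROR` message rather than reproducing PyLint's output. The function name `find_missing_methods` is also hypothetical.

```python
import ast


def find_missing_methods(source):
    """Report method calls on instances whose class defines no such method.

    Deliberately tiny: it only tracks top-level 'name = ClassName()'
    assignments to classes defined in the same source, and ignores
    inheritance, __getattr__, instance attributes, and imports.
    """
    tree = ast.parse(source)
    # Class name -> names of the methods defined directly in its body.
    methods = {node.name: {item.name for item in node.body
                           if isinstance(item, ast.FunctionDef)}
               for node in ast.walk(tree) if isinstance(node, ast.ClassDef)}
    instance_of = {}  # variable name -> class name
    errors = []
    for stmt in tree.body:  # top-level statements, in source order
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id in instance_of):
                var, attr = node.func.value.id, node.func.attr
                cls = instance_of[var]
                if attr not in methods[cls]:
                    errors.append(f"ERROR: line {node.lineno}: {var} is an "
                                  f"instance of {cls}, which has no method {attr!r}")
        # Update bindings after checking this statement's calls.
        if isinstance(stmt, ast.Assign):
            value = stmt.value
            made = (value.func.id
                    if (isinstance(value, ast.Call)
                        and isinstance(value.func, ast.Name)
                        and value.func.id in methods)
                    else None)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if made:
                        instance_of[target.id] = made
                    else:
                        instance_of.pop(target.id, None)  # rebound to something else
    return errors


test15 = '''
class C:
    def m1(self, a=5):
        "test of default parameters"
        return a

c15 = C()
a15 = c15.m1()
b15 = c15.does_not_exist()
'''
for error in find_missing_methods(test15):
    print(error)
```

Run on this merged version of test14.py and test15.py, the sketch reports only the `does_not_exist` call; the valid `c15.m1()` call passes.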

Implementation Details

Extensive documentation is supplied in the pylint program itself. Refer to the documentation comments there for more detailed information about the program operation.