Chao's Random Thoughts

Keep looking, don't settle

SICP Section 1.1

Computational processes are abstract beings that inhabit computers. As they evolve, processes manipulate other abstract things called data. The evolution of a process is directed by pattern of rules called a program.

Why use Lisp for the book?

The most significant unique feature of Lisp is the fact that Lisp descriptions of processes, called procedures, can themselves be represented and manipulated as Lisp data. The importance of this is that there are powerful program-design techniques that rely on the ability to blur the traditional distinction between "passive" data and "active" processes.

The three mechanisms of combining simple ideas to form more complex ideas in any powerful programming languages:

  1. primitive expressions
  2. means of combination
  3. means of abstraction

Coursera Online Functional Programming Course - a Retrospective

Here comes the summary of the Coursera course Fucntional Programming Principles in Scala It’s a little bit late, it has been two months since I finished the course. This is the first Coursera course I followed from the very beginning to the end and accomplished all the programming assignments with full scores, which helped me to get the certificate with distinction.

/images/posts/coursera_scala_certificate.jpg

First the excellent parts of the course:

  • Martin Odersky is a pretty good teacher, and all the lectures are well designed. I enjoyed watching the lectures a lot and most of which I have read more than once.
  • The programming assignments are well structured, with detailed instructions and step by step guide. In each of the assignments, the whole task is split into several steps with approprivate level of abstractions.
  • There are also many useful supporting materials like scala style guide, sbt tutorial, instructions on tools setup for the course, which made it easier to concentrate on the course content. For me, I don’t have scala develop environment setup before the course, and by following the tools setup section in only took me less than half an hour to get everything ready for trying the example code and the assignments.

As for the less-good things, I would say:

  • Both the lectures and the assignments are quite easy for an experienced programmers (No functional programming background needed), I was expecting more challenging stuff.
  • There are no official solutions distributed. (The course will be offered again some time later) However, I still think for those students who passed the course should be qualified to get the solutions so that they can compare those with their own to see where they can still improve.
  • It’s a pity that this is only the first half of an advanced undergraduate course that Martin taught on campus. I am interested in the other half.

After taking this cousre, I got a deeper understanding of the functional programming basics and it made me feel more comfortable while picking up SICP again (after 3 years). Now, I am convinced that I am able to go through SICP and finish most of the exercises. Also, I had a firmer grasp of Scala even though I didn’t write more than 100 lines of Scala before; I understand more about the scala syntax, idioms and even the motivations behind some of the language structures. E.g., call by name/value, lazy evaluation, currying, pattern matching and so on. I will publish my detailed notes on the lectures and assignments to this blog later.

To conclude, I really enjoyed taking the course and many thanks to Martin , the TAs, and also the coursera staff for offering such a wonderful course.

Python Abc

Introduction

This module provides the infrastructure for defining abstract base classes (ABCs) in Python. The ABCs define a minimal set of methods that establish the characteristic behavior of the type. For more details about this, see PEP 3119.

Highlights

The module provides a metaclass used to create ABCs. An ABC can be subclassed directly. The class also has a ‘register’ method to register unrelated concrete classes (including built-in classes) and unrelated ABCs as ‘virtual subclasses’

Also there are two decorators abstractmethod and abstractproperty, which will set the function object’s attribute ‘__isabstractmethod__’ to True. Only when all of the abstract methods and abstract properties are overriden, can a class that has a metaclass derived from ABCMeta be instantiated.

Code comments

ABCMeta.__new__

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
def __new__(mcls, name, bases, namespace):
    cls = super(ABCMeta, mcls).__new__(mcls, name, bases, namespace)
    # Compute set of abstract method names
    abstracts = set(name
                 for name, value in namespace.items()
                 if getattr(value, "__isabstractmethod__", False))
    for base in bases:
        for name in getattr(base, "__abstractmethods__", set()):
            value = getattr(cls, name, None)
            if getattr(value, "__isabstractmethod__", False):
                abstracts.add(name)
    cls.__abstractmethods__ = frozenset(abstracts)

    # chaoc: caches are the typical usages of weak references

    # Set up inheritance registry
    cls._abc_registry = WeakSet()
    cls._abc_cache = WeakSet()
    cls._abc_negative_cache = WeakSet()
    cls._abc_negative_cache_version = ABCMeta._abc_invalidation_counter
    return cls
  • It first creates a ‘type’ object cls. (super(ABCMeta, mcls) is ‘type’)
  • Iterate through all the attributes (including all the attributes inherited from all the bases), if any of them have ‘__isabstractmethod__’ set to true, add it to cls’s __abstractmethods__.
  • Initialize the attributes ‘_abc_registry’, ‘_abc_cache’, ‘_abc_negative_cache’ and ‘_abc_negative_cache_version’, which are used to speed up the check in __instancecheck__ and __subclasscheck__.

ABCMeta.register(cls, subclass)

See the added comments in line

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def register(cls, subclass):
    """Register a virtual subclass of an ABC."""
    # chaoc: sanity check to make sure that subclass is class type
    # either a type object (for new style classes) or the type is
    # the same as ClassType(for old style classes)
    if not isinstance(subclass, (type, types.ClassType)):
        raise TypeError("Can only register classes")
    if issubclass(subclass, cls):
        return  # Already a subclass
    # Subtle: test for cycles *after* testing for "already a subclass";
    # this means we allow X.register(X) and interpret it as a no-op.
    if issubclass(cls, subclass):
        # This would create a cycle, which is bad for the algorithm below
        raise RuntimeError("Refusing to create an inheritance cycle")
    cls._abc_registry.add(subclass)
    ABCMeta._abc_invalidation_counter += 1  # Invalidate negative cache

ABCMeta.__instancecheck__

See the added comments in line

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
def __instancecheck__(cls, instance):
    """Override for isinstance(instance, cls)."""
    # Inline the cache checking when it's simple.
    subclass = getattr(instance, '__class__', None)
    if subclass is not None and subclass in cls._abc_cache:
        return True
    subtype = type(instance)
    # Old-style instances
    if subtype is _InstanceType:
        subtype = subclass
    # chaoc: subtype will also be subclass for old style classes
    # as assigned in the above step
    if subtype is subclass or subclass is None:
        # chaoc: check if the negative cache is safe to use or not
        if (cls._abc_negative_cache_version ==
            ABCMeta._abc_invalidation_counter and
            subtype in cls._abc_negative_cache):
            return False
        # Fall back to the subclass check.
        return cls.__subclasscheck__(subtype)
    return (cls.__subclasscheck__(subclass) or
            cls.__subclasscheck__(subtype))

ABCMeta.__subclasscheck__

The code and comment in this function is very clear and straightforward.

Just make sure the different cases needed to check:

  1. check the subclass hook
  2. check if it’s a direct subclass through __mro__
  3. check if it’s a subclass of a registered class (issubclass is called to do recursive check)
  4. check if it’s a subclass of a subclass (issubclass is called to do recursive check)

In this post, we only talk about the defitions of ABCMeta. We will see the typical usages in the collections module.

Python Bisect

Introduction

This module provides support for maintaining a list in sorted order without having to sort the list after each insertion. It uses a basic bisection algorithm similar with the classic binary search.

Code comments

The above binsearch function will return the first of the elements that equals to the target if there are more than one. Sometimes it’s required to find the left most one or the right most one. We can achieve this by using the bisect() functions. Let’s examine the pre and post conditions, and also loop invariants.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def bisect_right(a, x, lo=0, hi=None):

    # Let l be the original lo passed in, and h be hi. In order to differentiate.

    # @requires 0 <= l && l < h && h <= len(a)
    # @requires is_sorted(a[l:h])
    # @ensures all e in a[l:result] have e <= x, and all e in a[result:h] have e > x

    while lo < hi:
        # @loop_invariant l <= lo && lo <= hi && hi <= h
        # @loop_invariant lo == l || a[lo - 1] <= x
        # @loop_invariant hi == h || a[hi] > x
        mid = (lo + hi) // 2
        # @assert lower <= mid && mid < higher
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo

After exiting the loop, we have lo == hi. Now we have to distinguish some cases:

  • If lo == l, from the third conjunct, we know that a[hi] > x, and since lo == hi, we have a[lo] > x. In this case, all e in a[l:h] > x
  • If hi == h, from the second conjunct, we know that a[lo - 1] < x, in this case, all e in a[l:h] <= x
  • if lo != l and hi != h, from the second and third conjuncts, we know that a[lo - 1] <= x, a[hi] > x. The post condition still holds.

We can do the same analysis on the function bisect_left().

Note: In the pre condition I explicitly written down that lo < hi is required, the code will directly return lo when hi is less than or equal to lo, but that’s simply meaningless, if this is the case, the sorted order of ‘a’ after we insert element before the index returned by the function.

About the function prototype

In the doc of the module, it’s bisect.bisect_right(a, x, lo=0, hi=len(a)) while in the source code, it’s def bisect_left(a, x, lo=0, hi=None)

bisect.bisect_right(a, x, lo=0, hi=len(a)) is not valid python code, you will get error like this: NameError: name ‘a’ is not defined. This is because default values are computed and bound at the function definition time rather than when you call the function. This means that you can’t have a default which is dependent on something that is not known until the function is called.

Pythonic stuff

Override function definitions in Python

1
2
3
4
5
# Overwrite above definitions with a fast C implementation
try:
    from _bisect import *
except ImportError:
    pass

Gotcha – Mutable default arguments

See here for more details.

Support LaTex in Octopress

It took me some time to finally get latex math formulas working in Octopress. If you googled ‘Octopress latex’, you can get quite a few online resources about how to support latex in octopress, with various levels of complexity. In this post, I will write down how I achieve this.

The initial attempt

As I installed the jekyll-rst plugin to use rst to write my posts, I thought it should be easy to write latex math because docutils has native support for it since version 0.8 (A :math: role and also a .. math: directive introduced for that). However, after I tried to use these in a octopress post, I found that the post will be rendered to empty. For example, if I insert the following rst code into my post, the whole post becomes empty; but after removing this line, everything is fine.

1
The area of a circle is :math:`A_\text{c} = (\pi/4) d^2`.

I also verified that the exact same code can be successfully converted to valid html using the ‘rst2html.py’ script on my system, so I guess maybe something is wrong in ‘RbST’. I found that in RbST, it has its own copies of rst2html and rst2latex tools under ** /gems/RbST-0.1.3/lib/rst2parts **,

1
2
3
4
chuchao@chuchao:~/.rvm/gems/ruby-1.9.2-p320/gems/RbST-0.1.3/lib/rst2parts$ ls
__init__.py  rst2html.py  rst2latex.py  transform.py  transform.pyc
chuchao@chuchao:~/.rvm/gems/ruby-1.9.2-p320/gems/RbST-0.1.3/lib/rst2parts$ ls ..
rbst.rb  rst2parts

which will be used in rbst.rb. I have even tried to change rbst.rb to use the rst2html.py installed on my system, but this also didn’t get any luck.

1
2
3
4
5
6
7
8
9
class RbST

  @@executable_path = File.expand_path(File.join(File.dirname(__FILE__), "rst2parts"))
  @@executables = {
    # :html  => File.join(@@executable_path, "rst2html.py"),
    :html  => "/usr/local/bin/rst2html.py",
    :latex => File.join(@@executable_path, "rst2latex.py")
  }
...

Finally, I gave up on this and opened an issue on this for the jekyll-rst plugin. Hope the author can fix this.

Switch back to markdown and using kramdown

After a google search about this issue, seems that the simplest solution is to use kramdown, which has built-in support for latex.

First, install kramdown: gem install karmdown

Then, add mathjax configs into <head></head> tag, in octopress, just add the below code into /source/_includes/custom/head.html:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
<!-- mathjax config similar to math.stackexchange -->

<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      processEscapes: true
    }
  });
</script>

<script type="text/x-mathjax-config">
    MathJax.Hub.Config({
      tex2jax: {
        skipTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'code']
      }
    });
</script>

<script type="text/x-mathjax-config">
    MathJax.Hub.Queue(function() {
        var all = MathJax.Hub.getAllJax(), i;
        for(i=0; i < all.length; i += 1) {
            all[i].SourceElement().parentNode.className += ' has-jax';
        }
    });
</script>

<script type="text/javascript"
   src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

See here for more details. After this, we are ready to test latex math in our post. For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$$
\begin{align*}
  & \phi(x,y) = \phi \left(\sum_{i=1}^n x_ie_i, \sum_{j=1}^n y_je_j \right)
  = \sum_{i=1}^n \sum_{j=1}^n x_i y_j \phi(e_i, e_j) = \\
  & (x_1, \ldots, x_n) \left( \begin{array}{ccc}
      \phi(e_1, e_1) & \cdots & \phi(e_1, e_n) \\
      \vdots & \ddots & \vdots \\
      \phi(e_n, e_1) & \cdots & \phi(e_n, e_n)
    \end{array} \right)
  \left( \begin{array}{c}
      y_1 \\
      \vdots \\
      y_n
    \end{array} \right)
\end{align*}
$$

will render as

And for inline latex code, just use $\exp(-\frac{x^2}{2})$, which will give $\exp(-\frac{x^2}{2})$.

Python Heapq

Introduction

The module provides an implementation of heap queue algorithm, also known as priority queue algorithm.

Highlights

  • Zero-based indexing is used, so the children’s index of node with index k are (2*k + 1) and (2*k + 2) respectively.
  • Internally a ‘min heap’ is maintained rather than ‘max heap’, which is more generally used in algorithm textbooks.
  • Three general functions based on heaps are also provided:
1
2
3
4
5
6
7
8
9
10
# Merge multiple sorted inputs into a single sorted output
# (for example, merge timestamped entries from multiple log files)
heapq.merge(*iterables)

# The following two functions are effectively like
# sorted(iterable, key=key, reverse=True)[:n] and
# sorted(iterable, key=key)[:n],
# but they perform best with smaller values of n
heapq.nlargest(n, iterable[, key])
heapq.nsmallest(n, iterable[, key]))

Pythonic stuff

Rich comparison methods

1
2
3
def cmp_lt(x, y):
    # use __lt__ if available; otherwise, try __le__
    return x < y if hasattr(x, '__lt__') else (not y <= x)

In python, there are 6 so called "Rich Comparison" methods, x < y calls x.__lt__(y) and others are similar (__le__ and <=; __gt__ and >=; __eq__ and ==; __ne__ and <>). Arguments to rich comparison methods are never coerced. see coercion

Code comments

Why heapify(x) is O(n)?

This is not obvious at first by seeing the code, given that there is a ‘while’ loop in _siftup and also a while loop in _siftdown(called in _siftup). Let’s look into it further:

  1. in the while loop of _siftup, it takes O(L) time for nodes that L levels above leaves.
  2. and in the while loop of _siftdown called in _siftup, it takes at most L steps, so _siftdown is O(L).
  3. since we have n/4 nodes in level 1, n/8 nodes in level 2, and finally one root node, which is lg(n) levels above leaf, so the total amount in the while loop of heapify is:
n/4 * c + n/8 * c + n/16 * 3c + … + 1 * lg(n) * c, and let n/4 = 2^k, after simplification, we get:
c * 2^k(1/2^0 + 2/2^1 + 3/2^2 + … + (k+1)/2^k), as the limit of (k+1)/2^k is 0 when k is infinite, so the term in the brackets bound to a constant, from this we can conclude that heapify is O(2^k), which is O(n).

Why it continues to find the smaller child until a leaf is hit in _siftup?

As explained in the comment by the module author, this is a ad hoc to reduce the comparisons on the following operations on the heap.

Introduction on Reading Source of Python Standard Libraries Series

I am going to start the series of posts on reading the source of python standard modules. I will go with the pure python modules first, and maybe later I can continue with C implementations of the modules. Let’s see how far I could go.

What will be included

  • A brief introduction of the module. (It should be very short, people can go to the standard library doc for more information.)
  • Special highlights about the important APIs, implementation details.
  • Python features/idioms/tricks/gotchas that worth the whistle, especially those I was not familiar with
  • Detail explanations about the tricky part of the code if any

What will not be included

Also, alone the way, I may start another series on some specific ‘advanced topics’ in python, like descriptor, decorator, method resolution order(mro) and so on. Mainly about why they are introduced into python, how they are used and the typical use cases. This is inspired by the blogs about python history

This post will also be used to track my progress.

Hello Octopress

This is the first post, using Octopress to set up a tech blog on github is so cool!

TO DO:


  • Add a ‘about’ page serving as the self introduction
  • Enable visitors to post comments
  • Add useful plugins
  • Prepare the post to summarise the process of setting all this things up

Update on 2012-07-27:


So far, I have finished the basic settings of this blog. The changes include:

  • Change the default layout and fonts to my preference
  • Add jekyll-rst plugin, to use rst rather then markdown by default
  • Enable comment
  • Add custom asides: category_list, recent_comment, douban and tag cloud