Full Outer Join: September 2009

One of the first computer recreations I remember trying was using the Monte Carlo method to calculate the area of a circle. The basic technique is to generate a random point in a square and keep track of the proportion that fall within the inscribed circle (that is

).

Back in the day, this was probably a dozen lines of Fortran (or maybe Basic), but it's an even shorter program in Python:


from __future__ import division
from numpy.random import uniform as u

def mcCircleArea(r=1,n=10000):
return sum( 1 if u(-r,r)**2 + u(-r,r)**2 <=r else 0 for i in range(n) )/n*((2*r)**2)

This example is often used to explain the concept of Monte Carlo integration (in fact, it's used on the Wikipedia page for that topic). But it's kind of an odd case in the sense that you're working with a function that's like:


f(x,y) = \begin{cases} 1, & \mbox{if } x^2+y^2<r  \\ 0, & \mbox{otherwise} \end{cases}

The basic idea behind Monte Carlo integration is that to find the integral of a function f(x) you take many random samples x from a domain D and then you multiply the average of f(xi) by the volume of the domain. So imagine that you wanted to find the area of a circle as you might have in basic calculus, starting with the equation:


f(x)=\sqrt{r-x^{2}}

In this case you could take uniformly random samples from the interval (0,r], calculate f(x) for those samples, and then multiply that average by r to get an estimate of the area of one quadrant of a circle. I think a similar process works for most functions of a single variable because it's easy to take samples from a uniform distribution (which is not too say that it's always a good way to do integration).

Monte Carlo integration problems come up in many applications where it's not easy to sample points from the domain of interest, because they are high dimension and/or oddly shaped. In that case, you can do a nifty trick where you choose a simpler domain that you can sample, and provided that it is a superset of the target distribution it can still be applied to the problem. Basically you do what happens in the circle area code above: generate random points in the larger domain and keep track of the proportion that fall within the domain of interest. This is called rejection sampling.

Again, I find the circle example unsatisfying because it's basically turning a function of one variable into a constant function of two variables. On the other hand, its hard to contrive an example where the domain is a single dimension. You could do something like integrate the function above from, say, (0.2,0.8) while sampling points from the superset (0,1.0).


from __future__ import division
from numpy.random import uniform as u
import numpy as np

def mcSubInt(r=1,n=10000,a=0.2,b=0.8):
    ''' Definite integral from 0.2 to 0.8 is 0.50498905889431 '''
    return sum([np.sqrt(r-x**2) for x in [u(0,r) for i in range(n)] if x > a and x < b])/n*r

This calculates


\frac{1}{n}\sum_{1}^{n}f(x_{i})

where


f(x) = \begin{cases} \sqrt{r-x^{2}}, & \mbox{if } a<&x<b  \\ 0, & \mbox{otherwise} \end{cases}

Kind of a boring example, but i think it qualifies. A slightly more interesting example might be integrating the equation f(x,y) = y+1 over a unit circle domain centered at the origin of the x,y plane. That works out to the volume of half of a 2 unit-high unit cylinder, or pi (ok, so maybe that's still not that interesting).


from __future__ import division
from numpy.random import uniform as u
from pylab import scatter,show
import numpy as np

def mcCircleDomain(n=10000):
  sum_fx = 0
  accept = []
  reject = []
  for i in range(n):
      (x,y) = u(-1,1,2)
      if (x**2 + y**2) < 1:
          sum_fx += (y+2)
          accept.append((x,y))
      else:
          reject.append((x,y))
        
  scatter([c[0] for c in accept],[c[1] for c in accept])
  scatter([c[0] for c in reject],[c[1] for c in reject],c='r')
  show()
    
  return sum_fx/n*4

Full Outer Join

Wednesday, September 2, 2009

Monte Carlo Integration

Followers

Blog Archive

About Me