In-focus Region Detection Using Harmonic Variance of DCT Band-pass Filtering

Detecting in-focus regions in a low depth-of-field (DoF) image has important applications in scene understanding, object-based coding, image quality assessment and depth estimation because such regions may indicate semantically meaningful objects.

Here is a challenging example: a spider web. Note that some parts of the defocused regions still show strong patch variance.


In our paper, we show that simply combining DCT band-pass filtering with harmonic variance gives a very robust in-focus measure, as shown in the following picture. The idea is simple. Because of lens blur, defocused regions occupy only a few low-frequency bands, while in-focus regions cover the full spectrum. DCT band-pass filtering extracts the response in each band. If we simply summed those responses, the final estimate could be heavily biased by very strong responses in just a few low-frequency channels. The harmonic mean solves this problem: it is dominated by the smallest values, so a few outlier bands cannot inflate the result, and the output is very robust.


In short, the algorithm works as follows:
1) apply the 63 DCT band-pass filters (for 8×8 patches) to the input image, excluding the first (DC/average) filter
2) compute the patch variance of each of the 63 filtered images, so that every pixel location (x, y) has 63 variance measurements
3) compute the harmonic mean of these 63 variances; this is the harmonic variance
4) optionally, apply a sigmoid function to the harmonic variance image to restrict the output to [0, 1].
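
To see why step 3 uses the harmonic rather than the arithmetic mean, here is a minimal sketch with made-up numbers. The two 63-element vectors are hypothetical band variances for a single pixel (they are not taken from the paper), and `harmonic_mean` is a small helper written only for this illustration.

```python
import numpy as np

def harmonic_mean(v):
    """Harmonic mean of strictly positive values: n / sum(1 / v)."""
    v = np.asarray(v, dtype=np.float64)
    return v.size / np.sum(1.0 / v)

# Hypothetical band variances for one pixel (63 bands), illustration only.
# Defocused pixel: a few strong low-frequency responses, the rest near zero.
defocused = np.full(63, 0.01)
defocused[:3] = 500.0
# In-focus pixel: a moderate response in every band.
in_focus = np.full(63, 20.0)

# The arithmetic mean is pulled up by the three strong bands ...
print('arithmetic mean:', defocused.mean(), in_focus.mean())
# ... while the harmonic mean stays close to the many small values.
print('harmonic mean:  ', harmonic_mean(defocused), harmonic_mean(in_focus))
```

With these numbers the arithmetic mean ranks the defocused pixel above the in-focus one, while the harmonic mean ranks them the other way round, which is the behaviour the algorithm relies on.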

If you want to estimate the actual noise level of the input image, please refer to our paper for kurtosis analysis.

import numpy as np
import scipy.ndimage as ndimage
import scipy.stats as stats
import matplotlib.pyplot as plt

def Sigmoid(x, ratio):
    return 1 / (1 + np.exp(-ratio * x))

def Tanh(x, ratio):
    # equals tanh(ratio * x); maps non-negative variances into [0, 1)
    return 2 * Sigmoid(2 * x, ratio) - 1

def GenerateDCT8x8Base():

    N = 8
    x = np.array(np.arange(0, N), dtype=np.float32, ndmin=2)

    # orthonormal 1-D DCT-II basis: column k holds the k-th basis vector
    C = np.cos(np.transpose(2*x+1) * x * np.pi / (2.0*N)) * np.sqrt(2.0/N)
    C[:, 0] = C[:, 0] / np.sqrt(2.0)

    # each 2-D basis function is the outer product of two 1-D basis vectors
    filters = []
    for i in range(N):
        for j in range(N):
            filters.append(np.outer(C[:, i], C[:, j]))

    del filters[0]  # drop the DC (average) filter

    return filters

def ComputeHarmonicVariance(im, kernels):

    size = list(im.shape)

    k_size = kernels[0].shape
    k_avg = np.ones(k_size, np.float32) / (k_size[0] * k_size[1])

    # one variance map per band-pass filter
    outs = np.empty(size + [len(kernels)], dtype=np.float32)
    for i in range(len(kernels)):
        data = ndimage.convolve(im, kernels[i])
        data2 = data * data

        # local variance: E[X^2] - (E[X])^2
        EX2 = ndimage.convolve(data2, k_avg)
        E2X = np.square(ndimage.convolve(data, k_avg))

        outs[:, :, i] = EX2 - E2X

    # harmonic mean over the bands at every pixel; clip tiny negative
    # values caused by round-off and add a small offset to avoid 1/0
    variance = stats.hmean(np.maximum(outs, 0) + 1e-8, axis=2)

    return variance

im = plt.imread('Spider-Web-Macro-Photography.jpg').astype(np.float32)
im = im[:, :, 0]  # extract Red channel for demonstration
kernels = GenerateDCT8x8Base()
hv = ComputeHarmonicVariance(im, kernels)
plt.figure(), plt.imshow(Tanh(hv, 0.01)), plt.title('hv t 0.01')
plt.show()

A simple implementation of Sobel filtering in Python

One can directly use the ‘ndimage’ module of scipy to apply Sobel filtering to the input image as follows:

dx = ndimage.sobel(im, 0)  # derivative along axis 0 (down the rows)
dy = ndimage.sobel(im, 1)  # derivative along axis 1 (across the columns)
mag = np.hypot(dx, dy)  # magnitude
mag *= 255.0 / np.max(mag)  # normalize

Or you can write the function yourself and add more features to it.

import numpy as np
from scipy import signal

def sobel_filter(im, k_size):
    im = im.astype(np.float64)
    height, width, c = im.shape
    if c > 1:
        # convert RGB to luminance (Rec. 709 weights)
        img = 0.2126 * im[:,:,0] + 0.7152 * im[:,:,1] + 0.0722 * im[:,:,2]
    else:
        img = im[:,:,0]
    assert k_size == 3 or k_size == 5
    if k_size == 3:
        kh = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype = np.float64)
        kv = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype = np.float64)
    else:
        kh = np.array([[-1, -2, 0, 2, 1], 
                   [-4, -8, 0, 8, 4], 
                   [-6, -12, 0, 12, 6],
                   [-4, -8, 0, 8, 4],
                   [-1, -2, 0, 2, 1]], dtype = np.float64)
        kv = np.array([[1, 4, 6, 4, 1], 
                   [2, 8, 12, 8, 2],
                   [0, 0, 0, 0, 0], 
                   [-2, -8, -12, -8, -2],
                   [-1, -4, -6, -4, -1]], dtype = np.float64)
    # symmetric padding avoids artificial edges at the image border
    gx = signal.convolve2d(img, kh, mode='same', boundary = 'symm')
    gy = signal.convolve2d(img, kv, mode='same', boundary = 'symm')

    g = np.sqrt(gx * gx + gy * gy)  # gradient magnitude
    g *= 255.0 / np.max(g)          # normalize to [0, 255]
    return g
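
As a side note on where these kernels come from: each one is the outer product of a 1-D smoothing vector and a 1-D derivative vector. The sketch below rebuilds the four kernels that way; the names `smooth3`, `deriv3`, `smooth5` and `deriv5` are introduced here only for illustration.

```python
import numpy as np

# 3x3 case: smooth along one axis, differentiate along the other.
smooth3 = np.array([1, 2, 1])
deriv3 = np.array([-1, 0, 1])
kh3 = np.outer(smooth3, deriv3)    # horizontal kernel (kh for k_size == 3)
kv3 = np.outer(-deriv3, smooth3)   # vertical kernel, positive row on top

# 5x5 case: same construction with longer 1-D vectors.
smooth5 = np.array([1, 4, 6, 4, 1])
deriv5 = np.array([-1, -2, 0, 2, 1])
kh5 = np.outer(smooth5, deriv5)
kv5 = np.outer(-deriv5, smooth5)

print(kh3)
print(kv5)
```

Because the kernels are separable, one could also filter with two 1-D passes instead of one 2-D convolution, which is cheaper for larger kernel sizes.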

Save Image into PPM ASCII Format in Matlab and Python

Here I wrote a function to save an image/matrix in PPM format. I simplified the saving process by using the Matlab function ‘dlmwrite’, so that I only need to write the PPM header myself. However, before streaming the image data, the channels of the input image must be interleaved. I experimented with two ways to do this. Both work fine, and the Matlab function ‘cat’ (or ‘permute’) makes the implementation very simple.

function write_ppm(im, fname, xval)

[height, width, c] = size(im);
assert(c == 3);

fid = fopen( fname, 'w' );

%% write headers
fprintf( fid, 'P3\n' );
fprintf( fid, '%d %d\n', width, height);
fprintf( fid, '%d \n', xval); %maximum values

fclose( fid ); 

%% interleave image channels before streaming
c1 = im(:, :, 1)';
c2 = im(:, :, 2)';
c3 = im(:, :, 3)';
im1 = cat(2, c1(:), c2(:), c3(:));

%% data streaming, could be slow if the image is large
dlmwrite(fname, int32(im1), '-append', 'delimiter', '\n')

The implementation in Python is even simpler. ‘f.write()’ is capable of writing the entire image in one function call with the help of ‘\n’.join().

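def write_ppm(fname, im, xval):
    height, width, nc = im.shape
    assert nc == 3
    # interleave image channels before streaming
    c1 = np.reshape(im[:, :, 0], (width*height, 1))
    c2 = np.reshape(im[:, :, 1], (width*height, 1))
    c3 = np.reshape(im[:, :, 2], (width*height, 1))
    im1 = np.hstack([c1, c2, c3])
    im2 = im1.reshape(width*height*3)
    with open(fname, 'w') as f:
        # write headers, mirroring the Matlab version
        f.write('P3\n')
        f.write(str(width)+' '+str(height)+'\n')
        f.write(str(xval)+'\n')  # maximum value
        # data streaming: one sample per line, written in a single call
        f.write('\n'.join(str(int(v)) for v in im2) + '\n')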
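
A small aside on the interleaving step: for a C-ordered height × width × 3 NumPy array, stacking the three channels column by column and flattening should give the same sequence as flattening the array directly. The sketch below checks this on a tiny made-up array (the 2 × 3 image is hypothetical and only serves the comparison).

```python
import numpy as np

# Hypothetical 2 x 3 RGB image with distinct values, for illustration only.
im = np.arange(2 * 3 * 3).reshape(2, 3, 3)
height, width, nc = im.shape

# Channel-by-channel interleaving, as in write_ppm above.
c1 = np.reshape(im[:, :, 0], (width * height, 1))
c2 = np.reshape(im[:, :, 1], (width * height, 1))
c3 = np.reshape(im[:, :, 2], (width * height, 1))
stacked = np.hstack([c1, c2, c3]).reshape(width * height * 3)

# Direct row-major flatten of the same array.
flat = im.reshape(-1)

print(np.array_equal(stacked, flat))
```

So in Python the explicit interleaving is mostly there for symmetry with the Matlab version, where the column-major layout makes it necessary.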



Phase difference calculation for Phase Detection Autofocus (PDAF)

Phase Detection Autofocus (PDAF) is one of the key advantages of D-SLR cameras over conventional point-and-shoot cameras, which usually employ a contrast-based autofocus system that sweeps through the focal range and stops at the point where maximum contrast is detected. A good tutorial about how PDAF works can be found @

In the following, I show how to find the phase difference in Python when two separate distributions are obtained from two separate AF sensors corresponding to one particular AF point. The function ‘correlate’ effectively performs a brute-force search over the entire phase-shift space; the shift at which the correlation is maximal is the phase difference.

import numpy as np
from scipy import signal

def gaussian(x, mu, sig):
    return np.exp(-np.power(x - mu, 2.) / (2 * np.power(sig, 2.)))

n_ele = 100
dt = np.linspace(0, 10, n_ele)
g1 = gaussian(dt, 2, 1.3)
g2 = gaussian(dt, 6, 2.2)

#Cross-correlation of two 1-dimensional sequences. This function 
# computes the correlation as generally defined in signal 
# processing texts: z[k] = sum_n a[n] * conj(v[n+k])
# with a and v sequences being zero-padded where necessary 
# and conj being the conjugate
xcorr = signal.correlate(g1, g2)

# The peak of the cross-correlation gives the shift between the 
# two signals. The xcorr array goes from -nsamples to nsamples
dt2 = np.linspace(-dt[-1], dt[-1], 2 * n_ele - 1)
print('phase shift: ', dt2[xcorr.argmax()])

Interested readers are encouraged to read more about phase difference calculation @

To accelerate the calculation, people usually move the problem into the Fourier domain and determine the phase correlation between the two PDAF sensors, which is also a general technique for translation-, rotation-, and scale-invariant image registration. For example, given the following two images, with only translational movement between them,


one can compute the phase correlation between the two input images as:

import numpy as np
import matplotlib.pyplot as plt

def rgb2gray(rgb):
    # weighted sum of the R, G, B channels (ignores alpha if present)
    return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])

im1 = plt.imread('trans_t2.png')
im2 = plt.imread('trans_t1.png')

img1 = rgb2gray(im1).astype(np.float64)
img2 = rgb2gray(im2).astype(np.float64)

f1 = np.fft.fftshift(np.fft.fft2(img1))
f2 = np.fft.fftshift(np.fft.fft2(img2))

# normalized cross-power spectrum: keep the phase, discard the magnitude
tmp1 = np.multiply(f1, np.conjugate(f2))
tmp2 = np.abs(tmp1) + 1e-12  # small offset avoids division by zero
tmp3 = np.abs(np.fft.ifft2(tmp1 / tmp2))

# the location of the peak gives the translation (modulo the image size)
translation = np.unravel_index(np.argmax(tmp3), tmp3.shape)
print(translation)
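
The same recipe can be checked without image files. The following is a minimal sketch, assuming a purely circular shift applied with `np.roll` to a random test image; the helper `phase_correlation_shift` and the chosen shift are introduced here for illustration and are not part of the original experiment.

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Return the (row, col) circular shift that maps b onto a."""
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross = fa * np.conjugate(fb)
    cross /= np.abs(cross) + 1e-12          # keep only the phase
    corr = np.abs(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

rng = np.random.default_rng(0)
base = rng.random((64, 64))                       # synthetic test image
moved = np.roll(base, shift=(5, -3), axis=(0, 1)) # known translation

print(phase_correlation_shift(moved, base))
```

The printed tuple should match the shift passed to `np.roll`. Real image pairs are not circularly shifted, so the peak is less sharp there and windowing the images first may help.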

For more details, please refer to :
Reddy, B. S. and Chatterji, B. N., An FFT-Based Technique for Translation, Rotation, and Scale-Invariant Image Registration, IEEE Transactions on Image Processing, Vol. 5, No. 8, August 1996.

Matplotlib savefig without border/frame

In some cases, the function ‘savefig’ with ‘bbox_inches='tight'’ doesn’t work well and still generates images with borders that you want to remove.

plt.savefig('fn.jpg', dpi = 300, bbox_inches='tight')

One way to avoid this is to look into the axes properties before calling the function ‘savefig’. Basically, you just need to create an axes object without border/frame and add it to the pre-generated figure window.

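import numpy as np
import matplotlib.pyplot as plt

def save_image(data, cm, fn):
    sizes = np.shape(data)
    height = float(sizes[0])
    width = float(sizes[1])
    fig = plt.figure()
    # figure is 1 inch tall, so dpi = height gives one output pixel per data pixel
    fig.set_size_inches(width/height, 1, forward=False)
    # axes that fill the whole figure, with ticks and frame switched off
    ax = plt.Axes(fig, [0., 0., 1., 1.])
    ax.set_axis_off()
    fig.add_axes(ax)

    ax.imshow(data, cmap=cm)
    plt.savefig(fn, dpi = height)
    plt.close(fig)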

Kile cannot edit tex file

Sometimes when I use Kile to open tex files, Kile simply cannot edit them. I can move the cursor freely but cannot type anything. In this case, you just need to uncheck Tools -> Read Only Mode.

If the read-only problem is caused by ‘The file xxx.tex was opened and contained too long lines (more than 1 024 characters). Too long lines were wrapped and the document is set to read-only mode, as saving will modify its content.’, you can work around it by setting the line length to 0 under Settings -> Configure Kile -> Open/Save -> Line Length Limit.

The weird case I met was that my Kile didn’t have a menu bar, although Kate did. I searched online and found out that you can re-enable the menu bar by editing the kilerc file under ~/.kde/share/config/. Changing ‘menubar=Disabled’ to ‘Enabled’ makes the menu bar reappear.