Monthly Archives: March 2007

Everything You Need To Know To Start Programming 64-Bit Windows Systems

Matt Pietrek

One of the pleasures of working on the bleeding edge of Windows® is poking around in a new technology to see how it works. I don’t really feel comfortable with an operating system until I have a little under-the-hood knowledge. So when the 64-bit editions of Windows XP and Windows Server™ 2003 appeared on the scene, I was all over them.

The nice thing about Win64 and the x64 CPU architecture is that they’re different enough from their predecessors to be interesting, while not requiring a huge learning curve. While we developers would like to think that moving to x64 is just a recompile away, the reality is that we’ll still spend far too much time in the debugger. A good working knowledge of the OS and CPU is invaluable.

In this article I’ll boil down my experiences with Win64 and the x64 architecture to the essentials that a hotshot Win32® programmer needs for the move to x64. I’ll assume that you know basic Win32 concepts, basic x86 concepts, and why your code should run on Win64. This frees me to focus on the good stuff. Think of this overview as a look at just the important differences relative to your knowledge of Win32 and the x86 architecture.

One nice thing about x64 systems is that you can use either Win32 or Win64 on the same machine without serious performance losses, unlike Itanium-based systems. And despite a few obscure differences between the Intel and AMD x64 implementations, the same x64-compatible build of Windows should run on either. You don’t need one version of Windows for AMD x64 systems and another for Intel x64 systems.

I’ve divided the discussion into three broad areas: OS implementation details, just enough x64 CPU architecture to get by, and developing for x64 with Visual C++®.

http://msdn.microsoft.com/msdnmag/issues/06/05/x64/default.aspx

Vision的第一次HW

第一次作业做两幅图像的fusion,用Laplacian Pyramid 的方法。自己写卷积函数,上采样下采样函数,还有Pyramid函数。搞不懂vison的老师为什么还要我们实现这么老的算法,上学期都已经学过Poisson Image Editing了,这学期反而退化去实现这个Pyramid算法,简直没有脾气。

输入两幅图像,还有mask图像:

 融合结果为下左,接着相加的结果为下右:

被程序给郁闷了

郁闷,不能release image

Unhandled exception at 0x7c901230 in Prog1.exe: User breakpoint

程序停在free.c的return HeapAlloc(_crtheap, 0, size);这一行

如果ignore,and continue,则会碰到

Insufficient memory (Out of memory) in function cvAlloc, .\cxalloc.cpp(111)

….

搞了快6个小时才发现,image循环的for loop写错了,写成了

for(int i = 0; i < src->widthStep; i ++)

这样image的数据就写出届了,搞得image的data crash了,因而不能release。还好现在都搞定了,只用了0.2s,删掉了4个字母。 

准备学习Emacs

以前一直都是用vim的,而且觉得vim的功能很强大,用起来也很方便,所以很排斥emacs。最近学习lisp,发现用vim写lisp程序还是不太方便。用vim写好的lisp程序还要另外打开lisp的编译器才能编译,有点不爽,决定看看emacs。不过就准备学习一下emacs的简单功能,学多了没时间也容易忘,呵呵。

A Tutorial Introduction to GNU Emacs
http://www2.lib.uchicago.edu/~keith//tcl-course/emacs-tutorial.html
A Emacs+Lisp Tutorial
http://xahlee.org/emacs/emacs.html

Single Camera Calibration

最近做了single camera calibration,程序很快就搞定了,但是结果验证却花了近一个星期的时间,各个坐标系的转化搞得我晕头晕脑的,正确的结果还以为是错的。

calibration直接调用opencv里的函数,实现的是zhang’s A flexible new technique for camera calibration的算法,矫正的参数用pov-ray生成的image验证了一把,还是很准的。

我在pov-ray里定义的camera为:
location <5, -8, -16>
look_at 0
angle 60
resolution 800×600
right 1.33
up 1

所以intrinsic parameters 应该为:

[fx 0 cx]     [692.8203 0 400]
[0 fy cy] =  [0 692.8203 300] (fx = fy = 600 * (2 / 3) * 3 ^ 0.5 = 692.8203.)
[0  0   1]     [0 0        1        ]

calibration的结果为:

693.979 0 399.431
0 692.647 300.201
0 0 1

extrinsic parameters应该为:

calibration的结果为:

-0.954613 1.60057e-005 -0.297849
-0.12835 -0.902411 0.411317
-0.268776 0.430877 0.861455

T’ = 0.0100281 -0.000391165 18.578

从上面的结果来看zhang’s的算法还是很准的,难怪opencv都用他的算法。只是extrinsic parameters产生的结果让我迷了很长时间,我一直以为translation matrix T应该就是camera location <5, -8, -16>,但是结果是 <0.0100281 -0.000391165 18.578>。实在想不明白,最后还是到yahoo opencv论坛上问了一把,才知道这个R,T都是在定义在camera coordinate下面的,所以最终的T是用R*t’得到的,因为要将该向量转化到camera coordinate下面。因为这郁闷了很长一段时间,现在总算搞明白了。

Win32程序函数调用时堆栈变化情况分析

http://ubee.cn/cxkf/Windows/200611/310.shtml

在经典的汇编语言教程中,函数调用时堆栈的使用都是着重讲解的问题。如今随着高级语言的越来越完善,单纯使用汇编开发的程序已经不多了。但对函数调用时堆栈动向的了解仍有助于我们明晰程序的执行流程,从而在程序编写和调试的过程中有一个清晰的思路。
一.调用约定
在Win32中,有关函数的调用主要有两种约定。
1._stdcall
        以__stdcall方式调用的函数有以下特征:
    •  参数由右至左压栈
    • 调用返回时,堆栈由被调函数调整
2.__cdecl
__cdecl约定是C/C++函数的默认调用约定。它有以下特征:
    • 参数由右至左压栈
    • 调用返回时,堆栈由调用者调整
二.Win32函数调用过程
1.    压入参数
这里依据以上的调用方式将调用者给出的参数一一压入堆栈。
2.    压入断点
当程序执行到Call指令的时候,当前语句的地址作为断点地址压入堆栈。
3.    跳转
eip的值被重新设置为被调函数的起始地址。
4.    mov ebp, esp
这里ebp被用来在堆栈中寻找调用者压入的参数,同时作为调用者堆栈指针的一个备份。在此前还应该执行一条:
push ebp
把ebp中原来的数值保存。
5.    sub esp,N
这里N是函数内局部变量的总字节数加上一个整数,一般为40。此后esp即为被调函数的堆栈指针了。
6.    初始化esp ~ esp-N之间的N字节空间
这是对堆栈中已分配给局部变量使用的内存空间的初始化,一般全部设置为0xcc。
7.    顺序执行函数内语句。
此时函数的堆栈位于所有局部变量的内存空间之后,二者之间一般有40字节的隔离带。
8.返回
为保障调用的正常返回,函数内应当保证规范使用堆栈,使即将返回的时候esp的值恢复为执行第一条语句前的状态。说明白点,就是每一条push都要有相应的pop。
调用返回的过程如下:
mov esp, ebp
执行后,esp恢复为调用者的堆栈指针,栈顶除断点地址外,还存有原ebp的值和调用时压入的参数。
然后依次弹出ebp的值和断点地址。如果是__cdecl约定则直接返回调用者,调用者将负责调整堆栈,丢弃调先前压入的参数。如果是__stdcall则这个工作由被调函数来执行。
程序样例如下:
……
0040B8E8   push        1            ;压入参数
0040B8EA   call        00401028        ;调用函数
……
00401028   jmp         0040b7c0        ;跳转到函数入口
……
0040B7C0   push        ebp            ;保存ebp
0040B7C1   mov         ebp,esp        
0040B7C3   sub         esp,44h        ;设置函数的堆栈指针,此函数中有4
;字节的局部变量
0040B7C6   push        ebx
0040B7C7   push        esi        
0040B7C8   push        edi
0040B7C9   lea         edi,[ebp-44h]    
0040B7CC   mov         ecx,11h
0040B7D1   mov         eax,0CCCCCCCCh
0040B7D6   rep stos    dword ptr [edi]    ;初始化局部变量空间
0040B7D8   mov         eax,dword ptr [ebp+8]
0040B7DB   mov         dword ptr [ebp-4],eax
……
0040B7DE   pop         edi            ;弹出曾压栈的数据
0040B7DF   pop         esi
0040B7E0   pop         ebx
0040B7E1   mov         esp,ebp        ;恢复调用者的堆栈
0040B7E3   pop         ebp            ;弹出原ebp值
0040B7E4   ret         4            ;返回并将堆栈向上调整4字节。
;此处为__stdcall约定,所以由函数调
;整堆栈
相应的C代码如下:
void __stdcall fun(int);
int main(void)
{
    ……    
fun(1);
……
    return 0;
}
void __stdcall fun(int para)
{
    int localpara = para;
    ……
}

Tail Recursions

http://www.cs.sfu.ca/CC/310/pwfong/Lisp/2/tutorial2.html

Recursive functions are usually easier to reason about. Notice how we articulate the correctness of recursive functions in this and the previous tutorial. However, some naive programmers complain that recursive functions are slow when compared to their iterative counter parts. For example, consider the original implementation of factorial we saw in the previous tutorial:

(defun factorial (N)
  "Compute the factorial of N."
  (if (= N 1)
      1
    (* N (factorial (- N 1)))))

It is fair to point out that, as recursion unfolds, stack frames will have to be set up, function arguments will have to be pushed into the stack, so on and so forth, resulting in unnecessary runtime overhead not experienced by the iterative counterpart of the above factorial function:

int factorial(int N) {
  int A = 1;
  while (N != 1) {
    A = A * N;
    N = N - 1;
  }
  return A;
}

Because of this and other excuses, programmers conclude that they could write off recursive implementations …

Modern compilers for functional programming languages usually implement tail-recursive call optimizations which automatically translate a certain kind of linear recursion into efficient iterations. A linear recursive function is tail-recursive if the result of each recursive call is returned right away as the value of the function. Let’s examine the implementation of fast-factorial again:

(defun fast-factorial (N)
  "A tail-recursive version of factorial."
  (fast-factorial-aux N 1))

(defun fast-factorial-aux (N A)
  "Multiply A by the factorial of N."
  (if (= N 1)
      A
    (fast-factorial-aux (- N 1) (* N A))))

Notice that, in fast-factorial-aux, there is no work left to be done after the recursive call (fast-factorial-aux (- N 1) (* N A)). Consequently, the compiler will not create new stack frame or push arguments, but instead simply bind (- N 1) to N and (* N A) to A, and jump to the beginning of the function. Such optimization effectively renders fast-factorial as efficient as its iterative counterpart. Notice also the striking structural similarity between the two.

When you implement linearly recursive functions, you are encouraged to restructure it as a tail recursion after you have fully debugged your implementation. Doing so allows the compiler to optimize away stack management code. However, you should do so only after you get the prototype function correctly implemented. Notice that the technique of accumulator variables can be used even when we are not transforming code to tail recursions. For some problems, the use of accumulator variables offers the most natural solutions.