Visual Servoing

Visual servo control, or visual servoing, uses computer vision to control robot motion. There are two types of visual servo control: image-based visual servoing and position-based visual servoing. Image-based visual servoing relies on the ability of a camera to locate image points in its field of view and controls the robot directly from those image measurements. Position-based visual servoing uses camera data to estimate 3-D pose information, which is then used to control the pose of the robot manipulator.

Visual Servo Control

The goal of any visual servo control system is to minimize error in image feature values. This error can be defined as:

(1)
\begin{align} \mathbf{e}(t) = \mathbf{s}(\mathbf{m}(t),\mathbf{a}) - \mathbf{s}^{*} \end{align}

The image features s(m(t),a) are determined by the vector m(t), a set of image measurements, and a, a parameter set that contains other information about the system, such as the intrinsic parameters of the camera. The image measurements could include the coordinates of important points in the image or the coordinates of the centroids of objects in the camera view. The intrinsic parameters are constant for a given camera and therefore do not change when the camera moves. s* contains the desired feature values, such as the desired coordinates of image points, and represents what the operator wants to see in the final image (the camera's field of view).
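
As a minimal sketch of Eq. (1), the snippet below forms the error vector for two image points. The pixel values are illustrative assumptions, not data from the example later in this article.

```matlab
% Feature error e = s - s* for two image points (Eq. 1).
% All numeric values are assumed for illustration only.
s      = [120;  80; 200; 150];   % current features [u1; v1; u2; v2], pixels
s_star = [100; 100; 220; 140];   % desired features, same ordering
e = s - s_star;                  % error vector the controller tries to drive to zero
err_norm = norm(e);              % scalar measure of the remaining error
```

The controller is finished when err_norm falls below a chosen tolerance.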

Camera Configurations

Vision-based control systems use one of two camera positioning techniques: a fixed camera or an eye-in-hand configuration. For the fixed camera configuration, the camera position is fixed relative to the global reference frame or workspace. The camera is able to observe the robot arm and any objects that will be moved. One problem with this configuration is that the robot arm could block the camera's view of its end effector. Without the essential information about the end effector's position and orientation, the execution of a desired task is very difficult. For the eye-in-hand configuration, the camera is attached to the robot arm. It is usually positioned above the wrist of the manipulator, so that the end effector can be observed without any obstructions in the field of view. One possible problem with this technique is that the camera's field of view changes as the robot arm moves through the workspace.

Image-Based Visual Servoing (IBVS)

Image-based servo control seeks to reduce the error between what the camera "sees" in its field of view and what the operator wants to see. For image-based visual servoing, the image feature values can be readily determined from the camera's image data. To carry out this type of control, the most common strategy is to find the desired spatial velocity of the camera, given by $\xi$, and use it as an input to a velocity controller. The spatial camera velocity can be modeled as

(2)
\begin{align} \mathbf{v}_{\rm c} = (v_{\rm c}, \mathbf{\omega}_{\rm c}) \end{align}

where $v_{\rm c}$ is the linear velocity of the origin of the camera frame and $\omega_{\rm c}$ is the angular velocity of the camera frame. Both velocity components are instantaneous. s(t), introduced above, is a vector of measurable feature values in an image. The derivative of s(t) is known as the image feature velocity. If the only feature is a single image point, its feature values would be given by

(3)
\begin{align} s(t) = \left[ { \begin{array}{cc} u(t)\\ v(t)\\ \end{array} } \right] \end{align}

The feature values denote the coordinates of an image point in the image plane of the camera. A linear relationship exists between the image feature velocity and the camera velocity. The camera velocity is given by

(4)
\begin{align} \xi = \left[ { \begin{array}{cc} v\\ \omega\\ \end{array} } \right] \end{align}

where $\xi$ is the camera velocity, v is the linear velocity, and $\omega$ is the angular velocity.
The image plane velocity of the image point would be $\dot s(t)$ and its relationship to the camera velocity is shown below:

(5)
\begin{align} \dot s = L(s,q)\xi \end{align}

or

(6)
\begin{align} \mathbf{\dot s} = \mathbf{L_{\rm s}}\mathbf{v}_{\rm c} \end{align}

where L(s,q) (or $\mathbf{L_{\rm s}}$) is the interaction matrix, a function of the robot configuration q and the image feature values s. The interaction matrix is also known as the feature Jacobian or the image Jacobian matrix.

The above equation can be separated into linear and angular velocity components by the following relation:

(7)
\begin{align} \dot s = L_{\rm v}(u,v,z)v+L_{\rm \omega}(u,v)\omega \end{align}

Note that $L_{\rm v}(u,v,z)$ consists of the first three columns of the interaction matrix and depends on the depth z. $L_{\rm \omega}(u,v)$ consists of the last three columns of the interaction matrix and is not depth-dependent. (Consult the following section for an overview of the interaction matrix.)
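
The split in Eq. (7) can be checked numerically. The sketch below builds the single-point matrix of Eq. (9) at two different depths, takes the first three and last three columns, and confirms that only the translational block changes with z. The helper name point_L and all numeric values are assumptions made for illustration.

```matlab
% Hypothetical helper: single-point interaction matrix from Eq. (9).
point_L = @(u, v, z, lambda) [ ...
    -lambda/z, 0,         u/z, u*v/lambda,              -(lambda^2 + u^2)/lambda,  v; ...
     0,        -lambda/z, v/z, (lambda^2 + v^2)/lambda, -u*v/lambda,              -u];

L_near = point_L(2, -1,  500, 8);    % assumed point and focal length, depth 500
L_far  = point_L(2, -1, 1000, 8);    % same point, twice the depth

Lv = L_near(:, 1:3);                 % translational block, depends on z
Lw = L_near(:, 4:6);                 % rotational block, independent of z

xi = [1; 2; 3; 0.01; 0.02; 0.03];    % sample camera velocity [v; omega]
s_dot_full  = L_near * xi;                    % Eq. (5)
s_dot_split = Lv * xi(1:3) + Lw * xi(4:6);    % Eq. (7)

rot_block_same   = isequal(L_near(:, 4:6), L_far(:, 4:6));
trans_block_same = isequal(L_near(:, 1:3), L_far(:, 1:3));
```

Here rot_block_same is true and trans_block_same is false, which is why depth estimates matter only for the translational part of the motion.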

The Image Jacobian

The interaction matrix can be constructed for one image point, as shown in Eqs. (8) and (9). It depends on the image coordinates of the point (u, v), the depth of the point z, and the focal length of the camera, $\lambda$.

(8)
\begin{align} \dot s = L_{\rm p}(u,v,z)\xi \end{align}
(9)
\begin{align} \left[ { \begin{array}{cc} \dot u\\ \dot v\\ \end{array} } \right] = \left[ { \begin{array}{cccccc} -\frac{\lambda}{z} & 0 & \frac{u}{z} & \frac{uv}{\lambda} & -\frac{\lambda^{2}+u^{2}}{\lambda} & v\\ 0 & -\frac{\lambda}{z} & \frac{v}{z} & \frac{\lambda^{2}+v^{2}}{\lambda} & -\frac{uv}{\lambda} & -u\\ \end{array} } \right] \left[ { \begin{array}{c} v_{\rm x}\\ v_{\rm y}\\ v_{\rm z}\\ \omega_{\rm x}\\ \omega_{\rm y}\\ \omega_{\rm z}\\ \end{array} } \right] \end{align}
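
A minimal sketch of Eq. (9) as code is given below. The helper name point_L and the sample values (focal length, image coordinates, depth, camera velocity) are assumptions; all lengths are taken to be in the same units.

```matlab
% Hypothetical helper: interaction matrix of one image point, Eq. (9).
point_L = @(u, v, z, lambda) [ ...
    -lambda/z, 0,         u/z, u*v/lambda,              -(lambda^2 + u^2)/lambda,  v; ...
     0,        -lambda/z, v/z, (lambda^2 + v^2)/lambda, -u*v/lambda,              -u];

lambda = 8;  u = 2;  v = -1;  z = 500;   % assumed sample values
L = point_L(u, v, z, lambda);            % 2-by-6 matrix

xi = [10; 0; 0; 0; 0; 0];   % camera translating along its own x axis
s_dot = L * xi;             % image-plane velocity [u_dot; v_dot]
```

With these values the point moves in the negative u direction (u_dot = -0.16) and v_dot is zero, as expected for a camera sliding along +x.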

The principle for finding the interaction matrix for one image point can be applied to multiple points. The feature vector s contains the feature values of each point, and a separate vector z contains the depth values of the points.

(10)
\begin{align} s = \left[ { \begin{array}{c} u_{\rm 1}\\ v_{\rm 1}\\ \vdots\\ u_{\rm n}\\ v_{\rm n}\\ \end{array} } \right], \quad z = \left[ { \begin{array}{c} z_{\rm 1}\\ \vdots\\ z_{\rm n}\\ \end{array} } \right] \end{align}

A composite interaction matrix will relate the image feature velocity $\dot s(t)$ to the camera velocity, $\xi$. It is shown below that the composite interaction matrix is a function of the image feature and depth values for the points.

(11)
\begin{align} \dot s = L_{\rm c}(s,z)\xi \end{align}

For multiple image points, it is necessary to stack the rows from the interaction matrices of the image points into one matrix. A generalized form for n interaction matrices for multiple points is given below.

(12)
\begin{align} L_{\rm c}(s,z) = \left[ { \begin{array}{cc} L_{\rm 1}(u_{\rm 1},v_{\rm 1},z_{\rm 1})\\ \vdots\\ L_{\rm n}(u_{\rm n},v_{\rm n},z_{\rm n})\\ \end{array} } \right] \\ \end{align}
(13)
\begin{align} =\left[ { \begin{array}{cccccc} -\frac{\lambda}{z_{\rm 1}} & 0 & \frac{u_{\rm 1}}{z_{\rm 1}} & \frac{u_{\rm 1}v_{\rm 1}}{\lambda} & -\frac{\lambda^{2}+u_{\rm 1}^{2}}{\lambda} & v_{\rm 1}\\ 0 & -\frac{\lambda}{z_{\rm 1}} & \frac{v_{\rm 1}}{z_{\rm 1}} & \frac{\lambda^{2}+v_{\rm 1}^{2}}{\lambda} & -\frac{u_{\rm 1}v_{\rm 1}}{\lambda} & -u_{\rm 1}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\ -\frac{\lambda}{z_{\rm n}} & 0 & \frac{u_{\rm n}}{z_{\rm n}} & \frac{u_{\rm n}v_{\rm n}}{\lambda} & -\frac{\lambda^{2}+u_{\rm n}^{2}}{\lambda} & v_{\rm n}\\ 0 & -\frac{\lambda}{z_{\rm n}} & \frac{v_{\rm n}}{z_{\rm n}} & \frac{\lambda^{2}+v_{\rm n}^{2}}{\lambda} & -\frac{u_{\rm n}v_{\rm n}}{\lambda} & -u_{\rm n}\\ \end{array} } \right] \end{align}

For a further discussion of the formation of the interaction matrix, see [2].
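
The stacking in Eqs. (12) and (13) can be sketched as follows. The last two lines go one step beyond this article: they apply a proportional law with the pseudo-inverse of the stacked matrix, a choice commonly described in the visual servoing literature (see [1]). The helper name point_L, the gain k, the four points, and their depths are all assumptions.

```matlab
% Hypothetical helper: single-point interaction matrix from Eq. (9).
point_L = @(u, v, z, lambda) [ ...
    -lambda/z, 0,         u/z, u*v/lambda,              -(lambda^2 + u^2)/lambda,  v; ...
     0,        -lambda/z, v/z, (lambda^2 + v^2)/lambda, -u*v/lambda,              -u];

lambda = 8;                              % assumed focal length
pts = [2 2; -2 2; -2 -2; 2 -2];          % assumed image points, one (u, v) pair per row
z   = [500; 500; 500; 500];              % assumed depth of each point
n   = size(pts, 1);

% Stack the 2-by-6 blocks into the 2n-by-6 composite matrix, Eq. (12).
Lc = zeros(2*n, 6);
for i = 1:n
    Lc(2*i-1:2*i, :) = point_L(pts(i,1), pts(i,2), z(i), lambda);
end

s      = reshape(pts', [], 1);           % [u1; v1; ...; un; vn], Eq. (10)
s_star = s + repmat([5; 0], n, 1);       % desired features: every point shifted by +5 in u
e      = s - s_star;                     % Eq. (1)

k  = 0.5;                                % assumed proportional gain
xi = -k * pinv(Lc) * e;                  % commanded camera velocity [v; omega]
```

For this particular error the result is a pure translation along the camera x axis (about -156.25 in the first component), which moves every image point in the +u direction.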

Partitioned Methods

The image-based servo control system encounters problems when the camera must undergo a large rotation about its optical axis (the axis normal to the image plane of the camera). This situation occurs because image-based servoing plans the motion of the features in the image plane and does not directly account for the resulting 3-D camera motion. A solution to this problem employs a partitioned method, which controls a portion of the degrees of freedom using the interaction matrix. The remaining degrees of freedom are controlled using other methods. From Eq. (9), the following result can be obtained:

(14)
\begin{align} \dot s=L_{\rm xy}\xi_{\rm xy}+L_{\rm z}\xi_{\rm z} \end{align}

$\dot s_{\rm xy} = L_{\rm xy}\xi_{\rm xy}$ gives the image plane velocity component due to camera translation along and rotation about the x and y axes. $\dot s_{\rm z} = L_{\rm z}\xi_{\rm z}$ gives the image plane velocity component due to camera translation along and rotation about the z (optical) axis. (See [2] for a further discussion of the partitioned method.)
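
One way to use Eq. (14), sketched here under stated assumptions, is to let a separate rule pick the optical-axis motion and then solve for the remaining four components from the desired feature velocity. The helper name point_L, the gain k, the points, and the chosen z-axis velocity are illustrative assumptions, not values from [2].

```matlab
% Hypothetical helper: single-point interaction matrix from Eq. (9).
point_L = @(u, v, z, lambda) [ ...
    -lambda/z, 0,         u/z, u*v/lambda,              -(lambda^2 + u^2)/lambda,  v; ...
     0,        -lambda/z, v/z, (lambda^2 + v^2)/lambda, -u*v/lambda,              -u];

lambda = 8;  z = 500;                    % assumed focal length and common depth
pts = [2 2; -2 2; -2 -2; 2 -2];          % assumed image points
L = zeros(8, 6);
for i = 1:4
    L(2*i-1:2*i, :) = point_L(pts(i,1), pts(i,2), z, lambda);
end

Lxy = L(:, [1 2 4 5]);                   % columns for vx, vy, wx, wy
Lz  = L(:, [3 6]);                       % columns for vz, wz

e    = repmat([-5; 0], 4, 1);            % assumed feature error (all points 5 short in u)
k    = 0.5;                              % assumed gain
xi_z = [0; 0.1];                         % assumed z-axis motion from a separate controller

s_dot_des = -k * e;                            % desired feature velocity
xi_xy = pinv(Lxy) * (s_dot_des - Lz * xi_z);   % remaining components from Eq. (14)

xi = zeros(6, 1);                        % reassemble the full velocity vector
xi([1 2 4 5]) = xi_xy;
xi([3 6])     = xi_z;
```

Because the feature motion caused by the z-axis term is removed before solving, the x and y components only have to account for what that term does not already provide.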

Position-Based Visual Servoing (PBVS)

Position-based servo control uses a set of parameters s that are 3-D quantities estimated from image measurements. With this type of control, the aim is to build a 3-D representation of the workspace and target points (relative to the camera frame) in real time. Errors in camera calibration can hamper the effectiveness of this method. Because there is no direct control over the image, the camera motion can result in an object leaving the camera's field of view.

Applications

Example: Robot Arm Manipulator with Image-Based Servo Control

A robot arm manipulator was created that employed image-based visual servoing using an eye-in-hand camera at the end effector. This example uses a gradient search to move in the direction of minimum error until the target point is reached. The goal of the task was to move the arm of a robot manipulator so that its end effector would reach and press a button on an elevator panel. The image points consisted of the centroids of two buttons on the panel.

Figure 1. This is a 6-degree-of-freedom robot arm with 3 prismatic and 3 rotational joints. The camera is shown from the orange link 6 toward the end effector.
Figure 2. This is the camera view. Several Matlab camera functions are used to set the camera's position, target, and up vector.

This example is based on visual servoing with a gradient search algorithm instead of the image Jacobian. The first step is to create the 6 images that result from stepping the joints in the positive and negative directions. For each case, the error is calculated and saved. The goal was to line up the center of the camera view with the center of the button, so the camera xy error is simply the Euclidean distance between the two. The depth error is harder to obtain. It was found by placing two buttons on the panel and using the perspective camera view: the distance between the two buttons in the image could be compared to its true distance.
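
The two error measures described above can be sketched as follows. This is not the code used in the example: the centroid positions, image size, goal spacing, focal length in pixels, and true button spacing are all assumed values, and the depth estimate relies on a simple pinhole camera model.

```matlab
% Assumed centroids of the two buttons in the image, [x y] in pixels.
c1 = [210, 250];
c2 = [290, 250];
img_center = [250, 250];                 % assumed center of a 500-by-500 view

% xy error: distance from the image center to the target button.
xy_err = norm(img_center - c1);

% Depth cue: spacing of the buttons in the image versus the spacing
% expected when the camera is at the desired standoff distance.
d_img     = norm(c1 - c2);               % measured spacing, pixels
d_goal    = 100;                         % assumed spacing at the goal depth, pixels
depth_err = abs(d_goal - d_img);

% Optional pinhole estimate of the actual depth, z = f * D / d_img.
f_px  = 500;                             % assumed focal length, pixels
D     = 40;                              % assumed true button spacing, workspace units
z_est = f_px * D / d_img;
```

As the camera approaches the panel, d_img grows toward d_goal and depth_err shrinks.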

Figure 3. After the image processing, the distance from the center of the screen can be compared to the target button, and the distance between the buttons can be saved for the depth calculation.

%% Here are some settings for the Matlab view
camproj('perspective')
axis vis3d off
v=[T06(1,4),T06(2,4),T06(3,4)+50];
campos(v) % this is the position of the camera
camtarget([T06(1,4),T06(2,4),T06(3,4)]) % this is the point the camera is looking at
camup([T06(1,1),T06(2,1),T06(3,1)]) % this sets the orientation of the camera frame
drawnow % this refreshes the figure
pos1=[20,35,500,500]; % [left bottom width height]
set(12,'Position',pos1) % where and what size the figure will be
rect=pos1;
M=getframe(12,rect); % where and what to capture

The next step is to find the lowest error among the candidate steps and take the step that produced it. The gradient search method leads the robot along the path of least error to the target. In Figure 4, the robot reached the tolerance level for the button (xy) error after about 15 iterations and stopped taking steps in the x and y directions. As the robot continued to move in the z direction, the small remaining xy error was magnified by the zooming perspective view. In Figure 5, the robot continues to move in the depth direction until it reaches the set tolerance.
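
The idea of trying each candidate step and keeping the best one can be shown on a toy problem. The sketch below is a simplified stand-in, not the example's code: it replaces the image-based error with an assumed function err_fn, and searches all three prismatic joints together rather than handling xy and depth separately.

```matlab
% Toy stand-in for the image error: distance of the joint vector from an
% assumed target. In the real example the error comes from the camera image.
target = [3; -2; 5];
err_fn = @(d) norm(d - target);

d        = [0; 0; 0];                    % assumed starting joint values [d1; d2; d3]
dstep    = 1;                            % assumed step size
tol      = 0.5;                          % assumed error tolerance
max_iter = 50;                           % safety limit on iterations

steps = dstep * [eye(3), -eye(3)];       % six candidate moves, one per column
iter  = 0;
while err_fn(d) >= tol && iter < max_iter
    E = zeros(1, 6);
    for b = 1:6                          % evaluate the error after each trial move
        E(b) = err_fn(d + steps(:, b));
    end
    [~, best] = min(E);                  % pick the move with the lowest error
    d = d + steps(:, best);              % commit to that move
    iter = iter + 1;
end
```

With these values the search reaches the target exactly after 10 steps, one unit at a time along whichever joint currently reduces the error most.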

%% sample code for image processing and error calculation
I = M.cdata; % load image
% Color of object (RGB value)
obj_color=[0,0,0]; % filter black
%Subtract object color from image
I(:,:,1) = I(:,:,1) - obj_color(1);
I(:,:,2) = I(:,:,2) - obj_color(2);
I(:,:,3) = I(:,:,3) - obj_color(3);
%Convert to black and white image
I=im2bw(I,.1);
% Filter out any objects that are smaller than certain size
I = bwareaopen(~I, 50);
I = bwlabel(I);
% find centroids in image data
s = regionprops(I, 'Centroid');
% find center of camera view
b=size(M.cdata); % order is y then x
l=b/2;
centroids = cat(1, s.Centroid);
% calculate error
camerr=((l(2)-centroids(1,1))^2 + (l(1)-centroids(1,2))^2)^.5;

%% sample code for gradient search
for b=1:6 % try each of the six candidate joint steps and record its error
    % start every trial from the reference joint values
    % (assumes d1r, d2r, d3r hold the current joint positions)
    d1 = d1r; d2 = d2r; d3 = d3r;
    if flag1 == 1 % xy search still active
        if b==1
            d3 = d3r+dstep;
        elseif b==2
            d3 = d3r-dstep;
        elseif b==3
            d2 = d2r+dstep;
        elseif b==4
            d2 = d2r-dstep;
        end
    end
    if flag2 == 1 % second flag gates the d1 (depth) steps
        if b==5
            d1 = d1r+dstep;
        elseif b==6
            d1 = d1r-dstep;
        end
    end

    % now plot the trial step and capture image M
    [M] = rob6rot(ptb,d1,d2,d3,l1,l1b,l2,l3,l4,l5,l6,t1,t2,t3, ...
        l1r,l2r,l3r,l4r,l5r,l6r,t1r,t2r,t3r);
    % find camera error based on image M
    [camerr,cam2err]=rob6img(M);
    % save camera error
    if b < 5
        Et(j,b) = camerr; % xy error
    elseif b==5
        Ex(j,1) = cam2err; % depth error
    elseif b==6
        Ex(j,2) = cam2err; % depth error
    end
end

Figure 4. Using the gradient search method to drive the robot arm, the code compares the 6 different error values from the movements to pick the path of least error.

% sample code for choosing least error path
% now move in the direction of least Et
Emin(j)= min(Et(j,:)); % smallest xy error among moves 1 to 4
Extar(j)= min(Ex(j,:)); % smallest depth error among moves 5 and 6
if j < jlimit % only step while under the iteration limit
    % move in the x,y
    if flag1 == 1
        if Emin(j) >= tol
            if Et(j,1) == Emin(j)
                d3 = d3r+dstep;
            elseif Et(j,2) == Emin(j)
                d3 = d3r-dstep;
            elseif Et(j,3) == Emin(j)
                d2 = d2r+dstep;
            elseif Et(j,4) == Emin(j)
                d2 = d2r-dstep;
            end
        else
            flag1 = 0; % within tol, stop stepping in x and y
            fprintf('after %d iterations, inside of xy tol %g \n',j,tol)
        end
    end
end

Figure 5. The depth error is driven to the tolerance level.

References

Bibliography
1. Chaumette, François, and Seth Hutchinson. "Visual Servo Control Part I: Basic Approaches." IEEE Robotics and Automation Magazine, December 2006.
2. Spong, et al. Robot Modeling and Control.