Georeferencing and Orthophoto Mosaicking of UAV Images

Ben Wilkinson and Bon Dewitt
Geomatics Program, School of Forest Resources and Conservation

Introduction

Aerial digital video alone can be useful for estimating wildlife populations. However, because the flight of the UF UAV is generally less stable than that of larger fixed-wing aircraft, and because the flights are flown at low altitude (around 200 m), the camera frame tends to jerk continuously along the flight path. This, coupled with the disorientation caused by long-term viewing of the videos, makes analysis of the data difficult and often unpleasant. Georeferencing and mosaicking the video data can therefore serve as an important post-processing step preceding analysis of the collected data. Orthophoto mosaics, unlike video footage, also allow area to be determined simply, which is useful when estimating the spatial distribution of the target subject. An ideal algorithm would use the UF UAV’s navigation data and would not rely on ground control points, because ground control is difficult to define in dynamic rural areas such as the NBR. The method should be fast, repeatable, and adaptable to different terrain.

UAV imagery has become a popular research area for applications such as homeland security and emergency response. Because of the nature of such applications, the research emphasis has been on the speed of image processing and georeferencing and on the precision of the resulting images. For example, in their paper on UAV video georeferencing, Zhou, Li, and Cheng (2005) discuss developing a real-time videogrammetric georeferencing scheme for forest-fire mapping. Their methods rely on automatic identification of tie points and ground control points, which can be problematic in areas with few buildings and roads (leaving few distinct candidate point areas) and in areas with tall trees (which occlude potential ground patterns and increase relief displacement between images). Also, since their method uses digital ortho-quadrangles (DOQs) for registration of the UAV imagery, natural processes and land-use change over time may hinder the identification of suitable ground control because the UAV and DOQ images differ. In applications where the precision of the rectified image must be very high, such as calculating the area of a forest fire or its proximity to homes, ground control can help ensure correct image positions. However, when the images are used to count discrete features, such as the number of animals or trees, the precision of the rectified images is less important, and methods that do not employ ground control may be considered.

Figures: UAV images of Yawkee, SC; UAV mosaicked images of Yawkee, SC.

Overview of the Process

There are five main steps in the traditional process of creating a ground-adjusted, mosaicked and resampled image from video data: (1) interpolation of navigation data and decompilation of video into individual frames; (2) finding the interior orientation parameters for the camera system; (3) finding the relative orientation parameters of the frames to each other; (4) finding the absolute orientation of the frames to a ground coordinate system; and (5) resampling the images to a ground coordinate system.

In their description of appropriate sensors for intelligent video surveillance, Kumar et al. (2001) note that cameras using an interlaced format introduce aliasing that leads to poor image matching and resampling, and thus progressive-scan images should be used to ensure that decompiled images are continuous. Commercial software that decompiles video into discrete images is readily available. Since the UF UAV system does not have a direct time-stamping system for the image data, the images must be registered with the navigation data, specifically by capturing video of a synchronized clock directly before and after the flight. Because the navigation data are measured at a lower frequency than the video frame rate, the navigation data must be interpolated over the time of the flight; a cubic spline is adequate for this. Next, the images are assigned time and positional values based on the splined navigation data. As noted by Bäumker and Heimes (2001) in their calibration method for direct georeferencing using INS data, for conventional photogrammetric equations to be used, INS data defined by the ARINC standard (φ, θ, ψ) must be transformed into the photogrammetric orientation system (Ω, Φ, Κ). An arbitrary vertical ground system must also be defined.
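The interpolation step can be illustrated with a minimal sketch. It is not the project's code: it assumes a natural cubic spline (zero second derivative at both ends), navigation samples at a hypothetical 1 Hz rate, and a frame clock that starts at a known time obtained from the synchronized-clock footage. The function names `natural_cubic_spline` and `frame_times` are introduced here for illustration only.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. The "natural" end condition (second
    derivative zero at both ends) is an assumption of this sketch.
    """
    n = len(xs)
    if n < 3 or len(ys) != n:
        raise ValueError("need at least three samples of equal length")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]

    # Solve the tridiagonal system for the second derivatives m[1..n-2]
    # with the Thomas algorithm; m[0] and m[n-1] stay at zero.
    m = [0.0] * n
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    for i in range(1, n - 1):
        a = h[i - 1]
        b = 2.0 * (h[i - 1] + h[i])
        c = h[i]
        d = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        denom = b - a * c_prime[i - 1]
        c_prime[i] = c / denom
        d_prime[i] = (d - a * d_prime[i - 1]) / denom
    for i in range(n - 2, 0, -1):
        m[i] = d_prime[i] - c_prime[i] * m[i + 1]

    def evaluate(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the sampled interval")
        i = min(bisect_right(xs, x) - 1, n - 2)
        left, right = xs[i + 1] - x, x - xs[i]
        return (
            m[i] * left ** 3 / (6.0 * h[i])
            + m[i + 1] * right ** 3 / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * left
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * right
        )

    return evaluate


def frame_times(t_first_frame, frame_count, fps=30.0):
    """Time of each decompiled frame, assuming a constant frame rate."""
    return [t_first_frame + k / fps for k in range(frame_count)]


if __name__ == "__main__":
    # Hypothetical 1 Hz navigation samples: time (s) and easting (m).
    nav_t = [0.0, 1.0, 2.0, 3.0, 4.0]
    nav_e = [0.0, 19.5, 40.2, 60.1, 79.8]
    easting = natural_cubic_spline(nav_t, nav_e)
    for t in frame_times(1.0, 4):
        print(f"t = {t:.4f} s, easting = {easting(t):.3f} m")
```

One spline would be fitted per navigation channel (position components and attitude angles), and each frame would then be tagged with the values evaluated at its time. Attitude angles that wrap around (for example heading crossing 360°) would need unwrapping first, which this sketch does not handle.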

Finding image tie points is a fundamental step in the orientation process and will be the foundation for the proposed method. Area-based matching techniques may be used to search for image tie points automatically. Specifically, normalized cross-correlation followed by least-squares matching for refinement may be used to find tie points with high precision (Wolf and Dewitt 2000). The three orientation steps (steps 2 through 4) are solved in a single iterative least-squares solution called a bundle adjustment. However, since the observation equations derived from the collinearity condition are non-linear, initial approximations of the unknowns must be found in order to apply the iterative Taylor series solution. Thus, a priori solutions must be found for the absolute orientation of the images to ground. A solution to this prerequisite is included in the proposed method.
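As an illustration of the area-based matching idea, the following is a minimal sketch of normalized cross-correlation over a search area. It is not the proposed method's implementation: images are plain lists of grey-level rows, the search is exhaustive at whole-pixel steps, and the least-squares matching refinement is left out. The function names `ncc` and `best_match` are hypothetical.

```python
from math import sqrt


def ncc(template, window):
    """Normalized cross-correlation of two equally sized grey-level patches.

    Returns a value in [-1, 1]; 0.0 is returned when either patch has no
    contrast, since the coefficient is undefined in that case.
    """
    t = [v for row in template for v in row]
    w = [v for row in window for v in row]
    if len(t) != len(w):
        raise ValueError("patches must have the same size")
    mean_t = sum(t) / len(t)
    mean_w = sum(w) / len(w)
    num = sum((a - mean_t) * (b - mean_w) for a, b in zip(t, w))
    den = sqrt(sum((a - mean_t) ** 2 for a in t) * sum((b - mean_w) ** 2 for b in w))
    return num / den if den > 0 else 0.0


def best_match(template, search):
    """Slide the template over the search image at whole-pixel steps.

    Returns (score, row, col) for the top-left corner of the window with
    the highest correlation coefficient.
    """
    t_rows, t_cols = len(template), len(template[0])
    s_rows, s_cols = len(search), len(search[0])
    best = (-2.0, 0, 0)
    for r in range(s_rows - t_rows + 1):
        for c in range(s_cols - t_cols + 1):
            window = [row[c:c + t_cols] for row in search[r:r + t_rows]]
            score = ncc(template, window)
            if score > best[0]:
                best = (score, r, c)
    return best
```

In practice the search area would be limited to a small neighborhood around the predicted position, and a match would be accepted only above some correlation threshold; both choices are left to the reader here.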

Since, for convenience and speed, few or no ground control points should be used in adjusting the images, GPS and INS data must be used to solve for the ground location of image points. Although the precision of the GPS and INS data used for navigation is relatively poor, the high redundancy of image and point measurements may compensate for it in the final solution. At 30 frames per second, a flying height of 200 m, a flight speed of about 20 m/s, image dimensions of 480x720 pixels, and an estimated focal length of about 950 pixels, we can expect decompiled images to “move” by about 3 pixel rows per frame. Conceptually, this can result in more than 140 observations per tie point if one uses a 32x32 pixel template for matching. Normalized cross-correlation and least-squares matching have proved effective in automatically finding tie points in images and can certainly be applied to videogrammetric data. In other words, the unique advantage of video data is that the small change in image content between frames facilitates the design of tie-point search algorithms that need not rely on measured position data. Thus, despite low-precision position and attitude data, a high count of tie points may still be observed.
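The figures quoted above can be checked with a short calculation. The sketch assumes vertical imagery over flat terrain and that the 480-pixel dimension lies along the flight direction; neither assumption is stated in the text, and the function names are introduced here for illustration.

```python
def frame_shift_pixels(flying_height_m, speed_m_s, fps, focal_length_px):
    """Image motion between consecutive frames, in pixels."""
    ground_per_pixel = flying_height_m / focal_length_px  # metres per pixel
    ground_per_frame = speed_m_s / fps                    # metres flown per frame
    return ground_per_frame / ground_per_pixel


def observations_per_tie_point(image_rows, template_rows, shift_px):
    """Number of frames in which a template stays fully inside the image."""
    return int((image_rows - template_rows) / shift_px)


if __name__ == "__main__":
    shift = frame_shift_pixels(200.0, 20.0, 30.0, 950.0)
    count = observations_per_tie_point(480, 32, shift)
    print(f"shift per frame: {shift:.2f} pixel rows")
    print(f"observations per tie point: {count}")
```

With the stated values, the shift is 20/30 m per frame divided by 200/950 m per pixel, or about 3.17 pixel rows per frame, and a 32-row template can travel 448 rows, giving 141 frames.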

The final bundle adjustment solution will produce orientation parameters for all the photos included, which may then be used to construct an orthophoto mosaic. A useful feature of the bundle adjustment solution is that it determines ground point locations for all tie points. These 3D coordinates may then be used as the framework for creating a digital terrain model (DTM), a requirement for orthophoto, and therefore orthophoto mosaic, production (Wolf and Dewitt 2000).
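To show how the adjusted orientation parameters and DTM elevations feed the resampling step, the sketch below projects a ground point into an image with the collinearity equations, using the omega-phi-kappa rotation matrix in the form commonly given in photogrammetry texts. It is an illustration under assumptions, not the project's code: angles are in radians, the focal length is in pixels so image coordinates come out in pixels relative to the principal point, lens distortion is ignored, and the function names are hypothetical.

```python
from math import cos, sin


def rotation_matrix(omega, phi, kappa):
    """Sequential omega-phi-kappa rotation matrix (angles in radians)."""
    so, co = sin(omega), cos(omega)
    sp, cp = sin(phi), cos(phi)
    sk, ck = sin(kappa), cos(kappa)
    return [
        [cp * ck, so * sp * ck + co * sk, -co * sp * ck + so * sk],
        [-cp * sk, -so * sp * sk + co * ck, co * sp * sk + so * ck],
        [sp, -so * cp, co * cp],
    ]


def ground_to_image(ground, exposure_station, angles, focal_length,
                    principal_point=(0.0, 0.0)):
    """Project a ground point (X, Y, Z) into image coordinates (x, y).

    ground and exposure_station are (X, Y, Z) tuples in the same ground
    system; angles is (omega, phi, kappa).
    """
    m = rotation_matrix(*angles)
    d = [g - s for g, s in zip(ground, exposure_station)]
    r, s, q = (sum(m[i][j] * d[j] for j in range(3)) for i in range(3))
    if q == 0:
        raise ValueError("ground point lies in the plane of the exposure station")
    x0, y0 = principal_point
    return x0 - focal_length * r / q, y0 - focal_length * s / q


if __name__ == "__main__":
    # A point 10 m east of nadir, camera 200 m above it, truly vertical image.
    print(ground_to_image((10.0, 0.0, 0.0), (0.0, 0.0, 200.0),
                          (0.0, 0.0, 0.0), 950.0))
```

For orthophoto production, each cell of the output grid would take its elevation from the DTM, be projected into the source frame this way, and receive the grey level interpolated at the resulting image position.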

Current Challenges and Progress

References

Bäumker, M., and Heimes, F.J. 2001. New Calibration and Computing Method for Direct Georeferencing of Image and Scanner Data Using the Position and Angular Data of an Hybrid Inertial Navigation System. In: OEEPE Workshop, Integrated Sensor Orientation, Sept.

Kumar, R., Sawhney, H., Samarasekera, S., Hsu, S., Tao, H., Guo, Y., Hanna, K., Pope, A., Wildes, R., Hirvonen, D., Hansen, M., and Burt, P. 2001. Aerial Video Surveillance and Exploitation. In: Proceedings of the IEEE vol. 89, issue 10, pg(s): 1518-1539

Wolf, Paul R., and Dewitt, Bon A. 2000. Elements of Photogrammetry with Applications in GIS, Third Edition. McGraw-Hill Science/Engineering/Math.

Zhou, G., Li, C., and Cheng, P. 2005. Unmanned Aerial Vehicle (UAV) Real-time Video Registration for Forest Fire Monitoring. In: Proceedings of the International Geoscience and Remote Sensing Symposium vol. 3, pg(s): 1803-1806