Adapting the translation gesture to a 3D environment

In a mobile application we are developing, we display a 3D world through which users need to navigate quickly in order to perform tasks at different geographic locations. Our first camera implementation for this app was simple: translate one-finger drags and two-finger rotations into camera translations and rotations, in proportion to the amount of finger movement. This was a natural strategy, but it didn't feel as intuitive as the equivalent gestures in a 2D environment. We quickly realized why: in 2D, users can put their fingers on the scene itself, which reacts like a piece of paper would, by staying attached to the fingers as they move around.

This is relatively easy to achieve in 2D: moving or rotating the scene N units in one direction is equivalent to moving or rotating the camera N units in the opposite direction. In 3D, however, the relationship is no longer uniform, because how far an item moves on screen depends on its depth: if the camera moves a few units to the left, foreground items might move completely off-screen, while elements at the horizon might barely move. It's harder than in 2D, but it can be done. This series of articles explains the algorithms we used to achieve intuitive camera transformations, similar to the ones in Google Earth. Today's topic: one-finger translation.
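To put numbers on the difference between foreground and horizon, here is a small sketch assuming an idealized pinhole camera with focal length `f` (the function `apparent_shift` is ours, for illustration only). When the camera slides sideways by `dx` without rotating, a point at depth `z` appears to move by `f * dx / z` in the opposite direction, so the same camera move is a hundred times more visible at depth 2 than at depth 200.

```python
def apparent_shift(camera_dx: float, point_x: float, depth: float, f: float = 1.0) -> float:
    """Screen-space displacement of a point when the camera slides sideways by
    camera_dx while keeping its orientation (pinhole model: x_screen = f * x / z).

    point_x and depth are the point's camera-space coordinates before the move.
    """
    before = f * point_x / depth
    # Moving the camera by +camera_dx moves the point by -camera_dx in camera
    # space; its depth is unchanged because the camera moves parallel to the screen.
    after = f * (point_x - camera_dx) / depth
    return after - before


# The same 1-unit camera move, seen at three depths.
for depth in (2.0, 20.0, 200.0):
    print(f"depth {depth:>5}: shift {apparent_shift(1.0, 0.0, depth):+.3f}")
```

In 2D, every point has the same "depth", which is why a single scale factor between finger motion and camera motion works there.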

Camera truck/boom gesture, aka 2D panning

Our goal is to let the user put a finger on a landmark, drag the finger along the screen, and have the camera move in a way which keeps the landmark under the finger. One difficulty is that the problem is severely under-constrained: the camera could be anywhere in the 3D world (except on the landmark itself), and it would still be possible to orient the camera so that the landmark appears at the correct position on the screen. Even if we keep the camera orientation unchanged during translations, one variable remains unconstrained: the distance between the camera and the landmark. One solution is to keep the landmark's depth, its distance along the viewing direction, constant. Imagine a plane, parallel to the screen and passing through the camera. When you touch a landmark, there is a ray connecting the camera to the landmark, and this ray intersects the plane at the camera position. Move your finger across the screen, and there should be a ray between the as-yet-unknown new camera position and the landmark. Because the camera orientation doesn't change, the ray through a given screen point has the same direction from any camera position. So by throwing a ray from the current camera through the new finger position, we can compute the direction of the ray, but not the new camera position from which it should start. No matter: we know where the ray ends. Start from the landmark and throw a ray in the opposite direction. It intersects the plane at exactly one point, and this is where the new camera should be.
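Here is a minimal sketch of the screen-parallel plane construction, using plain tuples for vectors. The names (`pan_camera`, `ray_direction`, `intersect_plane`) are ours, not the app's. We assume a pinhole camera with orthonormal `forward`, `right`, and `up` vectors, and screen coordinates `(sx, sy)` chosen so that a camera-space point (x, y, z) appears at (f * x / z, f * y / z), with `sy` growing upward.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ray_direction(forward: Vec3, right: Vec3, up: Vec3,
                  sx: float, sy: float, f: float = 1.0) -> Vec3:
    """Direction of the ray from the camera through screen point (sx, sy).

    It depends only on the camera's orientation, not on its position, so it is
    also the direction from the unknown new camera (same orientation).
    """
    return add(add(scale(forward, f), scale(right, sx)), scale(up, sy))


def intersect_plane(origin: Vec3, direction: Vec3,
                    plane_point: Vec3, plane_normal: Vec3) -> Optional[float]:
    """Signed t such that origin + t * direction lies on the plane,
    or None if the ray is parallel to the plane."""
    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None
    return dot(sub(plane_point, origin), plane_normal) / denom


def pan_camera(camera: Vec3, forward: Vec3, right: Vec3, up: Vec3,
               landmark: Vec3, sx: float, sy: float) -> Vec3:
    """New camera position that keeps `landmark` under the finger at (sx, sy),
    with the camera confined to the screen-parallel plane through itself."""
    # Throw the ray backwards: from the landmark toward the new camera.
    back = scale(ray_direction(forward, right, up, sx, sy), -1.0)
    # back . forward == -f for orthonormal axes, so the ray is never parallel.
    t = intersect_plane(landmark, back, camera, forward)
    assert t is not None
    return add(landmark, scale(back, t))


# Camera at the origin looking down +z; landmark 10 units ahead, at screen
# center. Dragging it to sx = 0.1 moves the camera about 1 unit to the left.
new_camera = pan_camera((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0),
                        (0.0, 1.0, 0.0), (0.0, 0.0, 10.0), 0.1, 0.0)
```

Only the plane (here: through `camera`, with normal `forward`) is specific to this gesture; the rest is a generic ray/plane intersection.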

Camera hover gesture

This strategy works, but it results in a 2D-style pan, which is not a common operation in our app. Instead, we want dragging to be used for navigation: we want to pull the camera forward by dragging landmarks toward the bottom of the screen. Fortunately, the plane algorithm can be adapted to our purposes simply by choosing a different plane. If we wanted the camera to always stay at the same height above flat ground, for example, we could pick a plane parallel to the ground and passing through the camera at the desired height. This strategy also works, but this time there is an edge case to worry about: if we drag the landmark exactly onto the horizon line, we would have to move the camera infinitely far for the landmark to end up at that location on the screen. This corresponds to a ray which is parallel to the plane. And as the finger approaches the horizon, the nearly parallel ray might cause the camera to move much farther than the user intended. We solve both problems by ignoring rays which don't intersect the plane within a specified distance. That distance is measured along the ray and is signed, so that it takes the ray's direction into account: if the ray points away from the plane, the user is trying to move a landmark beyond the horizon, which isn't allowed.
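A sketch of the constant-height variant, reusing the same ray/plane intersection but with a horizontal plane and the rejection rules described above. Assumptions of ours: a y-up world (`WORLD_UP`), the same pinhole screen conventions as before, and a placeholder `max_distance` of 1000 world units; the actual threshold the app uses isn't given. The backward ray is normalized, so the signed parameter `t` is a distance in world units.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
WORLD_UP: Vec3 = (0.0, 1.0, 0.0)  # assumption: y-up world coordinates


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(a: Vec3) -> Vec3:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def ray_direction(forward: Vec3, right: Vec3, up: Vec3,
                  sx: float, sy: float, f: float = 1.0) -> Vec3:
    """Camera ray through screen point (sx, sy); depends only on orientation."""
    return add(add(scale(forward, f), scale(right, sx)), scale(up, sy))


def intersect_plane(origin: Vec3, direction: Vec3,
                    plane_point: Vec3, plane_normal: Vec3) -> Optional[float]:
    """Signed t with origin + t * direction on the plane; None if parallel."""
    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None
    return dot(sub(plane_point, origin), plane_normal) / denom


def hover_camera(camera: Vec3, forward: Vec3, right: Vec3, up: Vec3,
                 landmark: Vec3, sx: float, sy: float,
                 max_distance: float = 1000.0) -> Optional[Vec3]:
    """Keep the camera in the horizontal plane through its current position
    while keeping `landmark` under the finger at (sx, sy).

    Returns None when this drag event should be ignored.
    """
    # Unit-length backward ray, so t below is a distance in world units.
    back = normalize(scale(ray_direction(forward, right, up, sx, sy), -1.0))
    t = intersect_plane(landmark, back, camera, WORLD_UP)
    if t is None:
        return None  # ray parallel to the plane: finger exactly on the horizon
    if t < 0.0:
        return None  # ray points away from the plane: for a landmark below
                     # the camera, the finger is above the horizon
    if t > max_distance:
        return None  # nearly parallel ray: the camera would jump too far
    return add(landmark, scale(back, t))
```

For example, with the camera 10 units above flat ground looking horizontally and a landmark 20 units ahead (initially at sy = -0.5), dragging the landmark down to sy = -1.0 moves the camera 10 units forward, while dragging it up to sy = -0.25 moves it 20 units back. A finger on or above the horizon returns `None`, and the caller simply leaves the camera where it is.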

Camera dolly gesture

In our app, the terrain is often uneven, with mountains and valleys rising and falling significantly. For this reason, keeping the camera at a constant height wasn't a solution for us. One of our first ideas was to keep the camera at a constant vertical distance from the varying ground, but with a fixed camera orientation, this made it impossible to keep the landmark under the finger. Imagine a camera standing on a rocky mountain, looking towards the horizon. As the camera moves to the left, it has to bump up and down in order to follow the hard edges of the rocks below. Yet if we take a landmark at the horizon and drag it towards the right, causing the camera to move left, the camera's bumping up and down will cause the landmark to move up and down as well, detaching it from the user's finger. We don't want this. The solution, once again, is to choose a different plane. This time, the choice of plane depends on the position of the camera and the position of the landmark. We need three points to define a plane. The first is the camera position. In order to keep the camera at about the same distance from the ground, we place the second point above the landmark, at the same height above it as the camera is above the ground. Imagine the different planes which go through those two points: there is a vertical one, and all of its rotations around the line joining the two points. Since the camera will move along that plane, the vertical plane is a very bad choice: it would drive the camera through the ground. In order to pick the most horizontal plane, we place our third point one unit to the right of the camera.
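The three-point construction can be sketched the same way, again under assumptions of our own: a y-up world, a camera without roll (so its `right` vector is horizontal), and a hypothetical `camera_height` argument standing in for however the app measures the camera's height above the ground beneath it. We also assume the plane is built once, when the finger touches down, and reused for the whole drag; the text doesn't specify this. The names `terrain_plane` and `dolly_camera` are illustrative.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, Vec3]         # (point on the plane, unit normal)
WORLD_UP: Vec3 = (0.0, 1.0, 0.0)  # assumption: y-up world coordinates


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def ray_direction(forward: Vec3, right: Vec3, up: Vec3,
                  sx: float, sy: float, f: float = 1.0) -> Vec3:
    """Camera ray through screen point (sx, sy); depends only on orientation."""
    return add(add(scale(forward, f), scale(right, sx)), scale(up, sy))


def intersect_plane(origin: Vec3, direction: Vec3,
                    plane_point: Vec3, plane_normal: Vec3) -> Optional[float]:
    """Signed t with origin + t * direction on the plane; None if parallel."""
    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None
    return dot(sub(plane_point, origin), plane_normal) / denom


def terrain_plane(camera: Vec3, right: Vec3, landmark: Vec3,
                  camera_height: float) -> Plane:
    """Plane through the camera, a point camera_height above the landmark,
    and a point one unit to the camera's right (assumed non-collinear)."""
    p1 = add(landmark, scale(WORLD_UP, camera_height))
    p2 = add(camera, right)
    normal = normalize(cross(sub(p1, camera), sub(p2, camera)))
    return camera, normal


def dolly_camera(plane: Plane, forward: Vec3, right: Vec3, up: Vec3,
                 landmark: Vec3, sx: float, sy: float,
                 max_distance: float = 1000.0) -> Optional[Vec3]:
    """Same backward-ray algorithm as before, constrained to `plane`.
    Returns None for rays that are parallel, point away, or reach too far."""
    plane_point, plane_normal = plane
    back = normalize(scale(ray_direction(forward, right, up, sx, sy), -1.0))
    t = intersect_plane(landmark, back, plane_point, plane_normal)
    if t is None or t < 0.0 or t > max_distance:
        return None
    return add(landmark, scale(back, t))
```

On flat ground this plane is horizontal and the gesture reduces to the constant-height version; when the landmark sits on a hill, the plane tilts upward toward it, so pulling the camera forward also lifts it. Using `right` for the third point gives exactly the most horizontal plane when the landmark is horizontally centered on screen; for off-center landmarks, replacing `right` with `cross(sub(p1, camera), WORLD_UP)` is a variant worth considering, since that vector is both horizontal and perpendicular to the line through the first two points.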


Adapting the one-finger translation gesture to 3D isn't too hard, as long as you constrain the camera's position to a plane. Moreover, since the algorithm is the same regardless of where that plane is, you can easily pick one which suits your needs: simply think about the plane along which you would like the camera to move. Our next article in the series will cover a slightly more difficult problem: rotating the camera around a point. This time, instead of keeping landmarks under one or more fingers, the difficulty will be to keep the point around which we rotate at the same place on the screen.