Web-AR: Technical challenges and a glimpse into the near future

The topics of virtual, mixed and augmented reality have gained a lot of popularity in recent months. In particular, the “Pokémon Go” app has contributed to Augmented Reality (AR) becoming known to the general public.
21.09.2017
[Image: A Cap3 team member presenting something on a smartphone.]

A good example of the successful use of AR is the “1600” app from the White House Historical Association, which brings a $1 note to life.

To “augment” print media in particular with this technology, it would be a great advantage if no native app (Android/iOS) were required. In many cases, the target group will shy away from the hassle of installing an app that they do not otherwise need. As a result, augmented reality in a marketing or promotional context is probably not yet reaching its full potential. For the jump from print to digital to succeed, the AR content should be accessible via the browser, ideally without having to enter a URL manually. Below, I would like to take a look at the state of development of “AR in the browser”.

The challenge of positioning

The basic problem of AR can be reduced to a single factor: the position (and orientation) of the camera in relation to the environment. To place an object over the (live) camera image so that it looks as if it were actually there, the AR software needs to know at what size and rotation the object has to be drawn. In most cases, this is achieved using so-called markers.
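To illustrate why the camera pose is the decisive factor, here is a minimal sketch of a pinhole projection. It is not taken from any AR framework. The function name `project_point`, the restriction to a rotation about the vertical axis, and the focal length in pixels are all assumptions made for this example.

```python
import math


def project_point(point_world, camera_position, camera_yaw_rad, focal_px, image_size):
    """Project a 3D world point to pixel coordinates with a simple pinhole model.

    Assumptions for this sketch: the camera looks along +z at yaw 0, only
    rotates about the vertical (y) axis, and has no lens distortion.
    Returns None if the point lies behind the camera.
    """
    # Express the point relative to the camera position.
    dx = point_world[0] - camera_position[0]
    dy = point_world[1] - camera_position[1]
    dz = point_world[2] - camera_position[2]

    # Undo the camera rotation (inverse rotation about the y axis).
    cos_y, sin_y = math.cos(camera_yaw_rad), math.sin(camera_yaw_rad)
    x_cam = cos_y * dx - sin_y * dz
    y_cam = dy
    z_cam = sin_y * dx + cos_y * dz

    if z_cam <= 0:
        return None  # the point is behind the camera and cannot be drawn

    # Perspective division: objects farther away move toward the image center.
    width, height = image_size
    u = width / 2 + focal_px * x_cam / z_cam
    v = height / 2 - focal_px * y_cam / z_cam  # image y axis points down
    return (u, v)


if __name__ == "__main__":
    # A point two units straight ahead lands in the center of a 640x480 image.
    print(project_point((0, 0, 2), (0, 0, 0), 0.0, 500, (640, 480)))
```

If the estimated camera position or yaw is wrong, every projected pixel shifts accordingly, which is why the virtual object appears to slide or float.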

A marker can be a pattern, an image, or an object. The only important thing is that it contains enough unique features. The more complex the marker, the higher the computing power required to recognize it in the video stream. A browser application usually has less processing power available than a native application. That is the reason why AR has so far been implemented almost exclusively natively.

2D barcode or matrix markers

The simplest type of marker is the group of so-called 2D barcode markers. Thanks to their fixed appearance, they can be recognized very quickly and with little computational effort. What all of these markers have in common is the wide border that is necessary for recognition. The number of possible markers depends on the dimensions of the matrix and the error detection used. As a rule, 3x3, 4x4, 5x5 and 6x6 matrices are used. A 3x3 matrix without error detection, for example, allows 64 different markers to be defined, each recognizable regardless of its rotation. The same matrix with Hamming error correction offers only 8 different markers, but the probability of incorrect detection is lower. To further increase stability, it is possible to use combined markers. In this case, the spatial relationship of the markers to one another is also used. These markers are ideal for use in the browser.
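The marker counts can be reproduced with a little arithmetic. The sketch below assumes a layout in which three cells of the matrix are reserved to make the rotation unambiguous and the remaining cells carry the payload. This layout is an assumption of the example, not something the figures above prove. The function name `barcode_marker_count` is hypothetical.

```python
def barcode_marker_count(matrix_size, orientation_cells=3, data_bits=None):
    """Return how many distinct markers a square barcode matrix can encode.

    Assumption for this sketch: `orientation_cells` cells are reserved to
    fix the rotation; the rest are payload bits. With an error-correcting
    code, only `data_bits` of the payload carry the marker ID.
    """
    payload_bits = matrix_size * matrix_size - orientation_cells
    if data_bits is None:
        data_bits = payload_bits  # no error detection: every payload bit is data
    if not 0 <= data_bits <= payload_bits:
        raise ValueError("data_bits must fit into the available payload bits")
    return 2 ** data_bits


if __name__ == "__main__":
    # 3x3 without error detection: 9 - 3 = 6 data bits.
    print(barcode_marker_count(3))               # 64
    # 3x3 with a Hamming code that keeps 3 data bits out of 6 payload bits.
    print(barcode_marker_count(3, data_bits=3))  # 8
```

The trade-off is visible in the numbers: every bit spent on redundancy halves the number of usable marker IDs.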

[Image: Comparison of the different marker resolutions.]

Template Square Marker

Template square markers, or simply “square markers”, are structured in a similar way to 2D barcode markers. They too need the border around the marker-specific area. The inner part of the marker can be designed in almost any way, and colors are also possible. The important point is that the marker is not rotationally symmetrical. These markers are often stored as a matrix with 8x8 or 16x16 elements. The smaller the matrix, the easier it is to recognize. Positioning with these markers works relatively quickly and robustly in browsers (including on mobile devices).
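Why rotational symmetry is a problem can be shown with a short check: if a marker looks identical after a 90, 180 or 270 degree turn, the tracker cannot tell which way it is oriented. The following sketch tests a marker that has already been reduced to a small matrix. The function names and the list-of-lists representation are assumptions of this example.

```python
def rotate_90(grid):
    """Return a copy of the square grid rotated 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def is_rotationally_symmetric(grid):
    """Return True if the grid equals itself after a 90, 180 or 270 degree turn."""
    original = [list(row) for row in grid]
    rotated = original
    for _ in range(3):
        rotated = rotate_90(rotated)
        if rotated == original:
            return True  # orientation would be ambiguous for a tracker
    return False


if __name__ == "__main__":
    diagonal = [[1, 0],
                [0, 1]]  # looks the same after a 180 degree turn
    corner = [[1, 0],
              [0, 0]]    # every rotation looks different
    print(is_rotationally_symmetric(diagonal))  # True
    print(is_rotationally_symmetric(corner))    # False
```

A marker design that returns `True` here should be changed before it is used for tracking.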

[Image: Two example square markers.]

Natural Feature Tracking

In many cases, barcode or square markers are not sufficient to achieve the desired effect. Especially in the marketing or promotional context mentioned above, markers should be less noticeable and blend visually into the environment to be augmented.

Natural Feature Tracking (NFT) is suitable for this area of application. It stores distinctive points of an image as features. These features can then be identified in the video stream.

[Image: Sailing ships on the horizon.]
[Image: The same picture with the generated feature markers overlaid.]

Unfortunately, this technique has not yet arrived in the browser. There are already experimental implementations, but none of them has made it into any of the available AR frameworks so far. Rumor has it that the first implementations are planned for this year.

Markerless tracking

Native AR frameworks are already a step further. In particular, the recently released ARKit from Apple and Google's competitor ARCore impressively show what is possible on mobile devices.

With ARKit and ARCore, it is possible to place objects in the environment without markers. This works because the software knows both the nature of the environment and its own position within it at all times. This process is called Simultaneous Localization and Mapping (SLAM). It involves scanning the environment in real time and building a model of it, which is updated and adjusted with every movement. In addition, every time the model is updated, the likely position of the smartphone within this model is calculated.

SLAM has been used in robotics for years. There, several sensors are usually used to record the environment (e.g. lasers, cameras, sonar, IMU). The special thing about the Apple and Google implementations is that they use only a single camera (monocular SLAM), supported by the smartphone's IMU (inertial measurement unit).

The present and future of WebAR

The present state of WebAR lags significantly behind current native implementations. Nevertheless, simple things can already be implemented on the web today.

There are basically two approaches to advancing the web as an AR platform. Either the required algorithms are implemented in JavaScript, or the browsers offer a corresponding API. The former has the disadvantage that the performance of these algorithms usually lags far behind native implementations. As mentioned above, this is one of the reasons why development on this track is progressing very slowly. The second approach does not have this disadvantage because the algorithms behind the browser API would be implemented natively. The ARCore project already offers experimental browsers for download, which can be used to test these functions. However, it will probably be a long time before this API is specified as a standard and implemented in all common browsers.

Projects such as AR.js, Chromium weave and ARCore clearly show that the (AR) web is gaining momentum. Once the first working NFT implementations arrive, at the latest, the browser will become a serious AR medium.

By the way, we are currently working on a WebAR implementation. I hope to be able to report on this soon as well.
