Cartier Looking Glass

Into The Looking Glass: using science fiction to solve a difficult problem.

Augmented Reality allows us to merge the digital and the physical. But AR is hard. Getting AR to Cartier standards, as we discovered after months and months of research, was nearly impossible.

At the Brooklyn Retail Lab, one of our central questions was: how do we evoke products that are not present? From that question came Looking GlassOne, an AR try-on for rings. The goal was to show clients, on their own hands, rings that were not currently in stock in the store.

From the outside, AR technology can seem pretty good, even usable. But when we delved deeper, it became clear that many technical problems had not yet been solved. Rendering realistic diamonds in real time is extremely hard. Lighting, drift, occlusion and many other issues leave the experience in the uncanny valley: something feels off, not quite real.

Everyone was approaching the problem in the same way: use a camera to determine where an object should be placed, render a 3D model into the scene, then use lighting and shadows to trick the mind into believing it is real. After months of research, we found no AR technology or vendor that could meet the standard we wanted for Cartier clients. AR felt stuck.

So we asked a different question. What if we used Machine Learning to turn Augmented Reality on its head? Not just to better guess the position of a hand, but to actually generate the ring as well. This had never been done before. We were not sure it was possible. But if it worked, it would be a giant leap forward, not just for Cartier, but for the entire AR industry.

The project drew on two major machine learning breakthroughs: generative adversarial networks (GANs) and CycleGANs. A GAN trains a generator against a discriminator until the generator produces synthetic images that can pass for real data. CycleGANs build on that idea to translate images from one visual domain into another without needing paired examples: for example, turning a horse into a zebra. We wanted to explore whether a machine could translate a hand wearing a marker ring into a hand wearing a Cartier engagement ring.
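
To make the CycleGAN idea concrete, here is a minimal sketch of the cycle-consistency term that ties the two translation directions together. It is an illustration only, not the project's training code: the two generator functions are hypothetical stand-ins that simply shift pixel values, images are flat lists of floats, and the adversarial part of the objective is left out.

```python
# Toy illustration of cycle consistency. Images are flat lists of floats in [0, 1].
# Both "generators" are hypothetical stand-ins, not trained networks.

def g_marker_to_ring(image):
    """Stand-in for generator G: marker-ring domain -> Cartier-ring domain."""
    return [min(1.0, p + 0.2) for p in image]


def f_ring_to_marker(image):
    """Stand-in for generator F: Cartier-ring domain -> marker-ring domain."""
    return [max(0.0, p - 0.2) for p in image]


def l1_distance(a, b):
    """Mean absolute difference between two images of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(marker_image, ring_image):
    """Penalise round trips that fail to return to the starting image."""
    # marker -> fake ring -> reconstructed marker
    reconstructed_marker = f_ring_to_marker(g_marker_to_ring(marker_image))
    # ring -> fake marker -> reconstructed ring
    reconstructed_ring = g_marker_to_ring(f_ring_to_marker(ring_image))
    return (l1_distance(marker_image, reconstructed_marker)
            + l1_distance(ring_image, reconstructed_ring))
```

In a real CycleGAN this term is added to the adversarial losses of the two discriminators, so each generator is pushed both to look convincing in its target domain and to remain reversible.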

The Brooklyn Lab partnered with Jolibrain in France, an A-team of machine learning specialists whose clients included NASA, Airbus and The Tate. Jolibrain only takes on machine learning problems that have not been solved, and this certainly fit the bill.

With the help of Cartier sales associates, we spent a week at the Mansion capturing thousands of HD videos and images of hands, with and without Cartier engagement rings. We also captured people wearing a simple generic black marker ring that we had 3D printed. The marker ring gave the client the feel of a real ring while creating a visual anchor for the machine learning process.

With the data collected, we spent months testing the hypothesis, starting by generating new images of hands with diamond rings from footage of the marker ring. The early results were far from perfect, but they showed the approach was sound. We were not yet judging image quality. We were asking whether CycleGAN could work in this use case.

Because the desired output was real-time video, rendering the whole hand would be extremely compute-intensive. The next experimental task was to develop a patch around the captured rings and fuse that patch back onto hands containing the marker rings, testing different modifications to balance placement, rendering and fusion with the rest of the hand.
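
The crop-and-fuse idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the pipeline the team built: frames are small grayscale grids (lists of lists), the ring location is assumed to be already known, `generate_ring_patch` is a hypothetical stand-in for the trained generator, and the linear feathered mask is just one plausible way to blend a patch into the surrounding hand.

```python
def crop_patch(frame, top, left, size):
    """Cut a square patch out of the frame around the marker ring."""
    return [row[left:left + size] for row in frame[top:top + size]]


def generate_ring_patch(patch):
    """Hypothetical stand-in for the generator: returns an all-white patch."""
    return [[1.0 for _ in row] for row in patch]


def feather_mask(size, border):
    """Blend weights: 1.0 in the centre, ramping down linearly toward the patch edge."""
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            edge_distance = min(x, y, size - 1 - x, size - 1 - y)
            row.append(min(1.0, (edge_distance + 1) / (border + 1)))
        mask.append(row)
    return mask


def fuse_patch(frame, generated_patch, top, left, border=2):
    """Blend the generated patch back into a copy of the frame."""
    size = len(generated_patch)
    mask = feather_mask(size, border)
    fused = [row[:] for row in frame]  # leave the input frame untouched
    for y in range(size):
        for x in range(size):
            w = mask[y][x]
            fused[top + y][left + x] = (w * generated_patch[y][x]
                                        + (1.0 - w) * frame[top + y][left + x])
    return fused


def render_frame(frame, top, left, size):
    """Crop around the ring, generate only that patch, then fuse it back."""
    patch = crop_patch(frame, top, left, size)
    return fuse_patch(frame, generate_ring_patch(patch), top, left)
```

Only the patch passes through the generator, which is what keeps the per-frame cost down. The feathered border is there so the seam between generated and original pixels is softened rather than cut hard.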

The patch approach helped focus the generator on ring detail and let us make the dataset appear larger to the model by creating offset crops from a single image. It was a lengthy process. The results were encouraging, but we encountered new problems, including skin inversion and inconsistent skin tone. Accurate ring placement was possible, but context and rendering quality made it clear that the dataset was still too small.
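
As a rough illustration of how offset crops can stretch a small dataset, the sketch below cuts several shifted patches around one ring position. The function name, the grid representation of an image and the choice of offsets are all assumptions made for the example, not details from the project.

```python
def offset_crops(image, center_y, center_x, size, offsets):
    """Return one square crop per (dy, dx) offset around the ring position.

    Crops that would extend past the image border are skipped.
    """
    height, width = len(image), len(image[0])
    crops = []
    for dy, dx in offsets:
        top = center_y - size // 2 + dy
        left = center_x - size // 2 + dx
        if top < 0 or left < 0 or top + size > height or left + size > width:
            continue  # this shifted window does not fit inside the image
        crops.append([row[left:left + size] for row in image[top:top + size]])
    return crops


# One captured image, several training patches with the ring at slightly
# different positions inside the crop.
example_image = [[y * 10 + x for x in range(10)] for y in range(10)]
example_crops = offset_crops(example_image, 5, 5, 4,
                             offsets=[(0, 0), (1, 0), (0, -1), (-1, 1)])
```

Each crop shows the same ring in a slightly different place within the patch, so the model sees more variation than the raw image count suggests. It does not add new hands, skin tones or lighting, which is consistent with the dataset still turning out to be too small.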

Then Covid-19 hit, first with shutdowns in France and then in the US. While the team continued running improvements on eight dedicated GPUs around the clock, we needed a way to improve the dataset. My wife owned a Cartier Destinee engagement ring, so I enlisted her help and set up a makeshift rig in my study to mimic our initial shoot. Over three weeks I captured thousands more HD videos and photos, including close-ups of the diamond so we could test rendering quality.

One of our major concerns was whether we could generate good-looking diamonds without the muddy pixelated artifacts often seen in machine learning. The team began a new phase of close-up ring construction, another long and compute-heavy training process. The results were strong, but also revealed a new issue we called melted-ring syndrome.

The melting occurred partly because a perfect circle is difficult for a generator to achieve in pixel space. The melting of the inner diamond texture was caused by dropout at training time, which led to unwanted averaging at inference time. The team worked through this and improved diamond rendering quality significantly.
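
The link between dropout and averaging can be shown with a toy example. This is a general illustration of how dropout behaves, not a reconstruction of the team's model or of their fix: the layer, its weights and the dropout rate are made up for the example. During training each pass randomly silences some features, so the output varies from pass to pass. With dropout switched off at inference, the same layer returns a single value close to the average of those varied outputs, and that kind of averaging is what can smooth away fine texture.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training:
        return list(values)  # inference: deterministic pass-through
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def toy_layer(features, weights, rate, training, rng):
    """A single linear unit with dropout applied to its inputs."""
    dropped = dropout(features, rate, training, rng)
    return sum(f * w for f, w in zip(dropped, weights))


def mean_training_output(features, weights, rate, samples, seed=0):
    """Average the stochastic training-mode output over many random masks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += toy_layer(features, weights, rate, True, rng)
    return total / samples
```

Calling `toy_layer(..., training=False, ...)` gives one fixed value, while `mean_training_output` converges toward that same value as the number of samples grows.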

One unexpected result appeared during the CycleGAN loop that created a fake ring image and then attempted to recreate the original. It left a ghost imprint of the ring on the finger, a strange artifact from the model's internal understanding of the transformation.

After months of work, the results clearly demonstrated that the approach was workable: this new way of thinking about AR was not merely science fiction. Instead of placing a rendered 3D object into a camera feed, the system explored generating the jewelry directly into the image itself.

Role: Senior Creative Technologist
Context: Cartier Brooklyn Innovation Lab, New York
Machine learning partner: Jolibrain