Ori Feldstein | 24 Sep 2024

Why we need to change the way we think about visuals, images and video

This article was originally published on Medium

The old cliche says that a photograph is a moment frozen in time. There’s something comforting about that: the idea that you can capture an event, an emotion, a memory, and hold onto it forever.

But this cliche no longer represents reality. Technology moves fast and principles that were true for 150 years can suddenly be turned on their head. While we want to hang on to accurate representations of our special moments, there are times when we may want to change a photograph to better tell a story.

Today, advanced AI gives us the power to modify any image or video according to our creative needs. We can even generate the entire image or video from scratch, with no camera needed, and get a super realistic result.

A screenshot from whichfaceisreal.com

whichfaceisreal.com challenges us to guess which of two photos is a synthetic person generated by AI.

How? With generative AI. This new evolution of AI is revolutionizing how we need to think about data in general and visuals in particular. It’s a powerful technology that comes with incredible potential to improve our lives. And, as with any paradigm change, it also raises questions and challenges.

We need to start a conversation now, while we’re still at the beginning of the journey, about how synthetic media will impact our lives, so that we can meet the challenges head-on and maximize its potential for good while mitigating the risk of abuse.

Until now, AI could only say something about data that already exists. It could give you a recommendation on Netflix, predict when you might get stuck in traffic, or recognize an object or person in a picture. But that was the limit of what it could do.

Generative AI goes several steps further. It’s able to create new data from scratch, aka synthetic data, that’s of the same quality as something created by humans. This can be done with a variety of media — text and speech, for example. At Bria, we’re using this technology to generate high-quality images and video.

Changes to an image using Bria's technology including new models and backgrounds

A photo is no longer frozen in time — Bria’s generative AI can rapidly adapt visuals for new audiences by creating new facial expressions, an entirely new model and new scenery.

Already at Bria we’ve harnessed the power of generative AI to empower our users to perform many modifications to existing visuals. They can change the models in a photograph — their expressions, age, appearance. Users can even replace the model with a generated person that resonates with their target audience, and then bring them to life in a realistic video.

Bria’s platform empowers users to communicate visually with no need for a camera or Photoshop.

One of my favorite scenes in Mary Poppins is when she leads the children to jump inside Bert’s street sketch. Now you can do that with AI, taking any 2D photograph and generating a realistic 3D world that allows you to figuratively jump inside, although without the singing penguins (so far).

A still image of a street being turned into a video that makes it look like you’re walking down the street.

Bria lets you transport yourself into a still image, turning it into a video in seconds.

You can also add or remove objects — historically this was always a challenge because of the holes you leave behind in the picture. But generative AI can fill in those holes to make it look like nothing is missing.

We also enable people to generate visuals that match their brand guidelines, including mood, coloring and logo. You can set these values once and apply them instantly across any number of images.

This is just the beginning. We’re getting closer and closer to the day when someone will just ask a platform to generate a visual and the AI will deliver a perfect result instantly. With no camera, no Photoshop needed. From thoughts to visuals in a few seconds. It sounds like science fiction, but it’s already here.

The potential of this technological evolution is truly incredible. Storytelling is one of the most innate human habits — we have evidence of it going back millennia. Today, professional visual storytelling is restricted to a small group of people who have the skills to realize a vision — either in traditional art forms like painting, sculpture and the like; or digitally with software like Photoshop.

If you’re planning a traditional photo or video shoot (in the movie or music industries, or for creating marketing visuals, for example), you need money and time not just for the filming or photography, but also for editing and post-production. And did anyone mention the endless feedback loops?

Generative AI changes that. It’ll empower everyone with creative independence: there’ll no longer be a need to practice for hours to refine your artistic skills. All you’ll need is a vision. Input that vision into the system, and the AI will generate it for you. Rosebud AI’s technology can already generate realistic scenery, for example.

Examples of synthetic scenery generated by Rosebud AI.

This is nothing short of revolutionary. Today, the people at the top of the creative professions have a combination of a creative mind and the skillset to realize their vision. So there’s a massive untapped resource of creative people who can’t succeed because they lack the executional skills or budget to do so.

But in the future, there’ll no longer be a barrier to creativity. The most creative minds will be at the top of the creative professions, because all they will need is a powerful imagination, and the machines will realize their vision. For example, Hour One can take text and turn it into a talking presenter.

AI-generated presenters developed by Hour One.

When an image no longer represents a moment in time, there are other implications that we and everyone in our industry have to take into consideration. It means that we can’t necessarily rely on an image as a reliable witness. It will no longer necessarily be accurate to say “seeing is believing”.

Of course, people have been modifying images for centuries to make them fit their narrative. But as it becomes easier and more accessible to do so, we need to adapt ourselves to a world where we can’t necessarily believe our eyes.

Video of an image of four hot air balloons with one being removed by generative AI technology.

If anyone can remove objects from an image in an instant, can we still “believe our eyes”?

Companies developing this technology need to consider how to put safeguards in place to prevent its abuse, particularly as its productization will lead to the democratization of generative AI. We have a responsibility to lead the discussion about how it can be used without compromising ethics and values.

One thing’s for sure: this is an exciting time for the world of creativity and, arguably, humanity as a whole. I wonder if people realize how close we are to creating a metaverse that actually looks real, one that you can experience just like the holodeck from Star Trek. I look forward to sharing more thoughts about it in the weeks, months and years to come.

I’d love to know what you think about the potential of synthetic media to unleash a new world of creativity. Please share your thoughts in the comments.

Yair Adato is the Co-founder and CEO of Bria.