"Multimodal AI" Rearranged My Living Room
December 2023
Yet again I'm astonished by new AI offerings. ChatGPT Plus
can now apply its seemingly endless intelligence to images.
When I first downloaded the latest version to my iPad, eager
to see what it could do, I uploaded a photo of a bird taken by my trail camera.
It correctly identified it as a song sparrow.
Then I used the camera feature within the ChatGPT app to
take a photo of my aloe plant and asked why my plant looked unhealthy. It
identified it as an aloe vera and gave me detailed instructions for proper
care.
Then I took a photo of my living room and asked it, "Do you
have any suggestions for rearranging my living room?" In its typically friendly
manner, it replied, "Of course! Based on the image you provided, here are some
suggestions to consider when rearranging your living room."
It quickly proceeded to give me a list of great suggestions,
such as adjusting the location of the couch and chair, adding decorative items
within my bookshelf, hanging the wall art at eye level, changing the location
of the rug, adding a coffee table, and more.
It's one thing to correctly identify a bird, and quite
another to identify the objects in my living room and then help rearrange it.
This latest development is called multimodal AI. It combines
text generation, image recognition and image generation, voice recognition, and
speech synthesis. Or in the words of OpenAI's promotion, "ChatGPT can now see,
hear, and speak."
If you have ChatGPT Plus on your Android device, iPhone, or
iPad, you can carry on a spoken conversation: you speak your queries, and it talks
back to you.
Also, ChatGPT Plus can now generate images via OpenAI's new DALL-E
3, and it's impressive. You may remember that when I wrote about image generators in
the past, each failed to create the image I requested: an elephant on
horseback. I have no idea why that image came to mind, but it certainly foiled
them.
So of course when I tried the new image-generation
capability in ChatGPT, I asked it for an elephant on horseback. Success!
I write about ChatGPT Plus because it's clearly leading the
way at this point, with these amazing new features. But the disadvantage is
that it costs $20/month.
However, I was amazed to find that all these features are
available for free in Microsoft's Bing Chat, which incorporates GPT-4. I
gave it the same query about rearranging my living room. It took much longer
and listed five suggestions, but only the first dealt specifically with what
was in the photo, and even that one wasn't really apt. The other suggestions were generic.
Still, it recognized that there was a couch, chair, and coffee table in the
photo.
I also asked it to generate a photo-realistic image of an
elephant riding on horseback, using the same prompt as with ChatGPT. It failed.
Instead, it gave me really nice images of a person riding on an elephant with a
horse also in the image.
Given that Bing Chat integrates GPT-4 and DALL-E 3, it's
not clear why its performance doesn't seem as good. But I'm certainly impressed
with its range of features. As I write this, the desktop version of Bing Chat lets
you speak your prompts and can talk back to you, a feature not yet available in
the desktop version of ChatGPT Plus.
To use Microsoft's Bing Chat, you need to create an account on
Microsoft's website and go to www.bing.com using their free Edge web browser.
Google's free Bard (bard.google.com) is close behind ChatGPT
and Bing Chat, and I keep reading that Google has a forthcoming AI model that will be
even more powerful than ChatGPT. As I write this, Bard's multimodal features
are limited to responding to photos you upload. So of course, I had to ask it
to help rearrange my living room, giving it the same photo and prompt as the
previous instances. It was a bit more specific than Bing Chat, but not nearly
as good as ChatGPT-4. Bard also lets you speak your prompts.
All three, then, can interact with images. (Note, though,
that to safeguard privacy, these image recognition features won't identify faces.)
I initially enjoyed ChatGPT's suggestions for rearranging my
living room, but frankly, I didn't take them seriously. Then, darn it, over the
following days as I walked through my living room, I was increasingly aware that
my plush armchair did indeed feel out of place.
Finally, I couldn't stand it and did just as ChatGPT
suggested: I moved the couch and chair out farther from the adjoining walls and
moved the chair closer to the couch. I also placed the rug where ChatGPT suggested.
It really did change the character of my living room,
making it feel more balanced.
Thank you, ChatGPT.
© 2023 by Jim Karpen, Ph.D.