Images power so much of how we interact with, seek out and share knowledge online today. Whether it’s a photo, an illustration, a meme, a GIF, or an infographic, visuals inform, engage, educate and entertain, because the old saying holds true: a picture really is worth a thousand words!
So if visuals are that powerful, why should our search queries be limited to text?
Google has now officially rolled out multisearch: a new way to search with images, and ask questions about images, with the help of Google Lens.
During the Google Search On event in September 2021, the team explained in detail their MUM (Multitask Unified Model), an algorithm created to search the internet across different languages, text, images and media to help users find answers to detailed questions. (You can get a detailed breakdown of the MUM update here.)
And combining text + multimedia search is a key aspect of MUM. So, here’s a closer look at multisearch.
Why Multisearch?
It’s pretty simple: Google has known for a while that sometimes it’s just not possible to describe exactly what you are looking for in words. You could be looking for a floral pattern in clothing, a tutorial for nail art, or instructions on a certain cooking technique, and simply not know how to explain it.
Multisearch gives you the tools to conduct a search using text and images together. Maybe you are looking for something similar to what you’ve seen (like a green version of the blue peacoat you saw at a clothing store), or you want to learn more about something you came across during your day (like that strange-looking fruit you photographed at the farmer’s market).
How Multisearch Works
The aim of multisearch is to make your search results more helpful and closer to what you are actually looking for. According to Google, “you can ask a question about an object in front of you or refine your search by color, brand or a visual attribute.”
So with multisearch, you get to define your search better (using images) and refine it further (with detailed visual prompts).
Here’s how it works in action.
Open the Google app on iOS or Android.
Tap the Lens camera icon.
Search using a screenshot or photo you’ve taken.
Swipe up and tap the “+ Add to your search” button to add text.
To put it simply: it can work wonders to narrow and refine your search and give you answers to exactly what you are looking for.
Shop with multisearch: A simple example from Google: you come across a dress you like. It’s orange, but you’d rather buy it in green. Take a screenshot of the dress, add the text query “green” to the search, and Google will return results for green dresses.
Learn more and ask questions with multisearch: And it’s not limited to shopping – you can take pictures of things in the real world and learn more about them. For example, combine an image of suede shoes with the text query “care instructions” and you can find out how best to maintain them.
Discover options with multisearch: Take a picture of your new sofa and add the words “coffee table” to find something that matches it. The possibilities are never-ending!
The Wrap
Google’s aim, along with organising the world’s information in the best way possible, is to make finding and discovering that information easier and more convenient. And combining text with images is a step to giving users a better way to search — and better answers.
From a user’s perspective, image and text queries can be easily combined for better results, replacing the earlier format of searching only by text or by image. It also taps into how we as humans experience the world, and how we search and enquire about products when we actually shop.
For the time being, multisearch is available in the United States on both Android and iOS devices (in the English language) but will roll out in other regions soon – so we should be able to experience the power of multisearch sometime this year if all goes well!
For Curious Minds
Google Multisearch represents a major evolution in search technology, allowing you to combine images and text in a single query for more accurate and relevant results. This capability directly addresses the common challenge of describing complex visual concepts with words alone, making search feel more natural.
The technological core of this feature is Google's Multitask Unified Model (MUM), a powerful AI algorithm designed to understand information across different formats and languages. Unlike previous models, MUM can process text, images, and other media simultaneously. This unified understanding is what enables Multisearch to connect a picture of a floral pattern with your text query for “summer dress,” delivering results that align precisely with your visual and contextual intent. It closes the gap between seeing something in the real world and being able to ask specific questions about it online. For a closer look at the technical aspects of the MUM update, consider reading the full report.
Multisearch directly solves the problem of descriptive limitation by removing the need to translate a visual idea into a perfect text query. It allows the image to provide the primary context, which you can then refine with simple text modifiers for more granular control.
This is particularly effective when you can't articulate exactly what you're looking for. For instance, you might struggle to describe a specific nail art design or a cooking technique. With Multisearch, you can simply use a picture as your starting point. This approach is superior because it grounds the search in a concrete visual reality, which you can then guide with additional context. For example:
Find similar items: Take a photo of a blue peacoat and add the text “green” to find it in a different color.
Learn about objects: Snap a picture of a strange fruit and add “how to eat” to get instructions.
Get maintenance tips: Use an image of suede shoes with the query “care instructions.”
This method reduces ambiguity and search frustration, leading to faster, more satisfying results. To understand how these principles are changing e-commerce, see the complete guide on multisearch for shopping.
To find a product that matches an item you already own, you can use Google Multisearch to combine a photo with a text query for a highly targeted result. The process is designed to be straightforward and taps into the visual search capabilities of Google Lens.
Here is a simple, four-step plan to execute this search on your mobile device:
Open the Google app: Launch the main Google application on your iOS or Android phone.
Activate Google Lens: Tap the camera icon located in the search bar. This will open the Lens interface.
Select your image: You can either take a new photo of your sofa or select an existing picture of it from your phone's gallery.
Refine your search: Once the image is loaded, swipe up on the results panel and tap the “+ Add to your search” button. In the text box, type “coffee table.”
This combined query instructs Google to find coffee tables that visually harmonize with the style, color, and texture of your specific sofa. This practical application of multimodal search makes interior design and shopping significantly easier. You can explore more advanced shopping techniques in the full article.
These examples effectively demonstrate how Multisearch bridges the gap between the physical world and the digital information you need. They showcase its ability to handle nuanced, multi-step queries that were previously difficult for search engines to process, moving beyond simple identification.
The real value is shown in how an image and text work together. The image provides the 'what,' while the text provides the 'intent.' This combination unlocks highly specific answers. Consider these proven applications highlighted by Google:
Shopping with visual modifiers: The orange dress example shows how you can lock in a style you like from an image and then use text (“green”) to search for variations. This is a powerful tool for finding products that meet precise aesthetic criteria.
Contextual information retrieval: Combining a picture of suede shoes with the query “care instructions” goes beyond identifying the object. It seeks actionable advice related to the object, a far more complex and helpful task.
Complementary product discovery: Searching for a “coffee table” to match a photo of your sofa illustrates how Multisearch can help with creative and design-oriented tasks.
These cases confirm that the feature is not just a novelty but a functional tool for solving common consumer problems. For a deeper analysis of how this technology is reshaping online retail, review the complete post.
The primary difference lies in specificity and context, where Multisearch offers a more refined and targeted experience than its predecessors. While text-only and image-only search are powerful, they often fall short when a query requires both visual and descriptive elements.
Choosing the right method depends on the information you have and the complexity of your search. Here is a comparison:
Text-only Search: This method is ideal when you can describe exactly what you want with precise keywords (e.g., “women's green v-neck peacoat size 8”). Its weakness is in articulating visual nuances like patterns or styles.
Image-only Search: This is excellent for identifying an object or finding visually identical items. However, it lacks the ability to specify modifications, like a different color or brand.
Multisearch (Image + Text): This method excels where the others fail by combining the strengths of both. You use an image to define the core visual attributes and text to add specific constraints or ask related questions.
For complex shopping or learning tasks, Google Multisearch is superior. It aligns better with how we naturally think, by starting with a visual reference and then narrowing down the details. For more examples comparing these search approaches, explore the full analysis.
Google's introduction of Multisearch is a clear indicator of a strategic shift towards a more intuitive and human-like search experience. It suggests a future where search engines understand intent not just from keywords, but from a combination of inputs that mirror natural communication.
The Multitask Unified Model (MUM) is the engine driving this change. By processing information across text and images, it enables Google to understand the relationship between what you see and what you want to know. This moves search from a transactional keyword exchange to a contextual dialogue. Over time, this could lead to search engines that can handle multi-turn conversations about a subject, using an image as a persistent reference point. This evolution will likely make information discovery feel more like getting advice from a knowledgeable expert than querying a database. Discover more about Google's long-term vision in the full article.
The '+ Add to your search' feature is the key component that elevates Multisearch beyond simple image matching. It directly solves the problem of ambiguity in visual search by allowing you to inject precise, text-based instructions to guide the algorithm.
Without this feature, an image search for a dining room table would only return other tables that look the same. The text refinement function, however, transforms the query into a powerful research tool. It empowers you to:
Specify Attributes: Use an image of a table and add “round” or “oak finish” to filter for specific physical characteristics.
Introduce Concepts: Search with a photo of your dining room and add “table that matches” to get stylistically appropriate suggestions.
Ask Questions: Use a picture of a water-damaged table and ask “how to repair” for relevant tutorials.
This function for adding text makes the search an interactive process of refinement rather than a single, static query. By combining the visual context from Google Lens with your explicit written intent, you get results that are dramatically more relevant. To see this feature in action with more examples, review the complete post.
The rise of multimodal search demands a strategic shift away from a purely keyword-centric approach to a more holistic, visually-driven SEO strategy. Brands must now optimize the visual context of their products just as rigorously as they optimize their text descriptions.
To adapt, marketers should focus on ensuring their visual and textual content are strongly aligned. The goal is to provide clear signals to algorithms like MUM that your product image and its associated data are a perfect match for combined user queries. Key actions include:
High-Quality, Diverse Imagery: Use clear photos from multiple angles, including in-context lifestyle shots that show the product in use.
Descriptive Alt Text and Filenames: Ensure your image metadata accurately describes the object, including its color, style, and material.
Structured Data Markup: Implement schema markup for products to explicitly define attributes like brand, color, and price for search engines (a sample markup sketch follows this list).
Integrated Content: Create content that naturally pairs images with relevant text, such as blog posts on “how to style a blue peacoat.”
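To make the structured data point concrete, here is a minimal sketch of schema.org Product markup for something like the blue peacoat example. Every value below (product name, brand, price, image URL, filename) is a placeholder chosen for illustration, not real product data.

```html
<!-- Minimal, illustrative Product markup; all values are placeholders -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Women's Wool Peacoat",
  "image": "https://www.example.com/images/navy-blue-wool-peacoat-front.jpg",
  "description": "Double-breasted wool peacoat in navy blue with a notched collar.",
  "brand": { "@type": "Brand", "name": "ExampleBrand" },
  "color": "Navy Blue",
  "material": "Wool",
  "offers": {
    "@type": "Offer",
    "priceCurrency": "USD",
    "price": "129.00",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```

Markup along these lines spells out the attributes (color, material, brand, price) that a text refinement such as “green” or “wool” could be matched against, alongside the visual signal from the product image.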
Brands that successfully integrate their visual and textual SEO will gain a significant advantage in this new search paradigm. For a deeper exploration of future SEO trends, the full article offers additional insights.
Google's Multitask Unified Model (MUM) is the foundational technology that makes Multisearch possible. It was specifically engineered to overcome the limitations of older AI models that could only process one form of information at a time, providing the breakthrough needed for a true multimodal search experience.
Showcased in detail at the Search On 2021 event, MUM was designed with a core capability: to achieve a deeper, more holistic understanding of information by processing it across different formats and languages simultaneously. This is critical for Multisearch because a combined image-text query is not two separate searches, but one unified question. MUM can understand the concept in the photo (e.g., the style of a dress) and the nuance in the text (e.g., the desire for a different color) as a single, cohesive intent. This unified model is what allows you to find what you're looking for without knowing how to describe it perfectly in words. To learn more about the technical details presented at the event, explore the full MUM update breakdown.
Google Multisearch is an ideal tool for this exact scenario, turning your smartphone into a powerful discovery engine for objects in the real world. It allows you to move seamlessly from visual identification to actionable information in a single, fluid process.
This real-world application highlights the feature's core strength: bridging curiosity with knowledge. Using the Google app, the process is straightforward:
Capture the Image: Open Google Lens and take a clear picture of the strange-looking fruit.
Initiate the Search: Lens will likely provide initial identification results for the fruit.
Ask a Deeper Question: Swipe up and use the “+ Add to your search” button to add a text query like “how to eat” or “recipes.”
This transforms a simple identification search into a practical learning experience. The image provides the subject, and your text provides the specific intent, prompting Google to return relevant articles, videos, and recipe pages. This example proves Multisearch isn't just for shopping, but for learning and discovery. You can find more unique use cases by reading the complete article.
To capitalize on Google Multisearch, e-commerce businesses must focus on creating rich, descriptive, and visually clear product listings. The goal is to provide Google's AI with unambiguous signals connecting your product images to relevant search intent.
Here is a four-step implementation plan for optimizing your product pages:
Audit and Upgrade Product Photography: Replace low-quality images. Provide multiple high-resolution photos from various angles, on a clean background, and in a lifestyle context. Ensure file names are descriptive (e.g., 'green-floral-summer-dress.jpg').
Write Comprehensive Alt Text: For every product image, write detailed alt text that describes the item as if to a person who cannot see it. Include style, color, pattern, and material (see the example snippet after this list).
Enrich Product Descriptions: Go beyond basic specs. Use descriptive language that mirrors how a customer might describe the item. Mention visual attributes that a search query might include, such as “a coffee table to match a mid-century sofa.”
Implement Product Schema Markup: Use structured data to explicitly tell Google the attributes of your product, like its name, brand, color, and GTIN. This makes it easier for the algorithm to match your product to a query.
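As a quick sketch of steps 1 and 2, a single product image might be published like this; the filename, dimensions, and alt text are hypothetical examples rather than prescribed values.

```html
<!-- Hypothetical example: descriptive filename plus alt text covering color, style, pattern and material -->
<img
  src="/images/green-floral-summer-dress-front.jpg"
  alt="Green sleeveless summer dress with a small white floral print, knee-length, lightweight cotton"
  width="1200"
  height="1600"
  loading="lazy"
/>
```

Echoing the same attributes in the visible product description and the Product schema keeps the textual and visual signals consistent, which is exactly what a combined image-and-text query rewards.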
This systematic approach ensures your products are understood contextually and visually by search engines. To discover more advanced strategies, read the full guide.
This query demonstrates the sophisticated contextual understanding of Multisearch, which goes far beyond simple color or shape matching. The system interprets the image of the sofa as the primary style guide and the text 'coffee table' as the desired object category.
The underlying Multitask Unified Model (MUM) is key to this interpretation. It analyzes the visual cues from the sofa's image—such as its design era (e.g., mid-century modern), material (e.g., leather, fabric), color palette, and even textures—to build a stylistic profile. The algorithm then searches for coffee tables that are frequently associated with these stylistic attributes in its vast index of web content. It is essentially performing a design-compatibility analysis, not just a visual search. The results are therefore not random coffee tables, but options that Google predicts will harmonize with your sofa, making it a powerful tool for design and decor. For more examples of this predictive capability, review the full analysis.
Amol has helped catalyse business growth with his strategic & data-driven methodologies. With a decade of experience in the field of marketing, he has donned multiple hats, from channel optimization, data analytics and creative brand positioning to growth engineering and sales.