A look at Google Gemini.

Google is apparently preparing to launch Google Gemini, a revolutionary AI product. Here we take a look at what we know so far and what is still up in the air.

What does Gemini stand for?

One expansion of the name is circulating, though Google has not confirmed it:
Generative Enhanced Multimodal Intelligent Network Interface.

The word “Gemini” comes from Latin and means “twins.”
Some possible meanings in the context of Google’s AI system:

Gemini combines two components: Text and image processing. It is, in a sense, a “twin system.”

Gemini could refer to the “twins” Sergey Brin and Larry Page, the founders of Google.

Astrology assigns communication strength and flexibility to the zodiac sign Gemini. Gemini as an AI assistant aims to adapt linguistically and situationally.

The name suggests a dual strength or ability. Gemini aims to unite Google’s text and image AI to outperform the competition.

As a twin system, Gemini combines different perspectives and approaches, similar to different human characters. So the name is both an allusion to the system’s integrative capabilities and a promising indication of Google’s ambitions with this AI product.

Why is Google superior?

To answer that, you have to understand what a treasure trove of data Google is actually sitting on. Here are a few facts:

Google, through its various services such as Google Search, YouTube and others, has an enormous amount of data that is very useful for developing AI systems.

On YouTube alone, over 500 hours of video are uploaded every minute, according to Statista. That works out to 30,000 hours uploaded every hour and 720,000 hours uploaded every day. The total video database is reported at over 30 million hours of video. The subtitles and transcripts of these videos give Google a gigantic text dataset for training language models.
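For anyone who wants to check the upload arithmetic, here is a minimal sketch in Python. It takes the 500 hours per minute figure at face value; the function name `upload_hours` is made up for this illustration.

```python
# Back-of-the-envelope scaling of the YouTube upload rate quoted above.
HOURS_UPLOADED_PER_MINUTE = 500  # figure cited in the article (Statista)


def upload_hours(per_minute: int) -> dict:
    """Scale a per-minute upload rate to per-hour and per-day totals."""
    per_hour = per_minute * 60   # 60 minutes in an hour
    per_day = per_hour * 24      # 24 hours in a day
    return {"per_hour": per_hour, "per_day": per_day}


rates = upload_hours(HOURS_UPLOADED_PER_MINUTE)
print(f"{rates['per_hour']:,} hours of video per hour")  # 30,000
print(f"{rates['per_day']:,} hours of video per day")    # 720,000
```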

According to a report by ARK Invest, Google owns over 130 exabytes of data. For comparison, 1 exabyte is equal to 1 billion gigabytes, or 10^18 bytes. This means that the entire data set comprises more than 130,000,000,000,000,000,000 bytes of information.
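The unit conversion is easy to get wrong by a few zeros, so here is a small sketch that spells it out. It assumes decimal (SI) units, where 1 exabyte is 10^18 bytes, and simply takes the 130 exabyte figure as given; `exabytes_to_bytes` is a helper invented for this example.

```python
# Convert the quoted 130 exabytes into bytes and gigabytes (decimal SI units).
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_EXABYTE = 10**18  # 1 exabyte = 1 billion gigabytes


def exabytes_to_bytes(exabytes: int) -> int:
    """Return the number of bytes in the given number of exabytes."""
    return exabytes * BYTES_PER_EXABYTE


total_bytes = exabytes_to_bytes(130)
print(f"{total_bytes:,} bytes")
# 130,000,000,000,000,000,000 bytes
print(f"{total_bytes // BYTES_PER_GIGABYTE:,} gigabytes")
# 130,000,000,000 gigabytes
```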

Google Search accounts for a large part of this data. Google says it processes over 40,000 search queries per second. That’s roughly 3.5 billion search queries per day, or about 1.26 trillion per year. From these queries and the clicked results, Google gains further insights.
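Taking the 40,000 queries per second figure at face value, the daily and yearly totals can be worked out as follows. This is only a rough sketch: it assumes a constant rate and a 365-day year, and the function names are made up for illustration.

```python
# Scale a constant queries-per-second rate to daily and yearly totals.
QUERIES_PER_SECOND = 40_000       # figure cited in the article
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400


def queries_per_day(qps: int) -> int:
    """Total queries in one day at a constant per-second rate."""
    return qps * SECONDS_PER_DAY


def queries_per_year(qps: int, days: int = 365) -> int:
    """Total queries in one year (365 days assumed)."""
    return queries_per_day(qps) * days


print(f"{queries_per_day(QUERIES_PER_SECOND):,} per day")
# 3,456,000,000 per day (about 3.5 billion)
print(f"{queries_per_year(QUERIES_PER_SECOND):,} per year")
# 1,261,440,000,000 per year (about 1.26 trillion)
```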

Overall, this shows that Google has virtually inexhaustible data resources for AI research. Both the breadth of different types of data and the sheer volume should give Google a significant edge in the AI field.

Google – The Research Giant

In 2020, Google published over 1,300 artificial intelligence research papers, according to the Papers with Code database. In 2021, Google increased the number of publications significantly again to over 2,000 papers on AI and machine learning.

Topics included:

Computer Vision (image recognition)
Natural Language Processing (NLP)
Speech Recognition
Reinforcement Learning
Robotics
Multimodal AI
Recommender Systems
Applications in medicine

With over 3,300 AI publications in 2020 and 2021 combined, Google has greatly expanded its research output in artificial intelligence. The company is one of the most active players in this research field. This intensive work over the past few years is now being incorporated into the development of Gemini.

According to the AI publication database Papers with Code, Google published more than 1,500 artificial intelligence research papers in 2022 alone. That’s far more than other tech corporations like Meta or Microsoft.

The following is a partial selection of Google’s most groundbreaking developments in AI in recent years. The list shows the enormous range of research from machine learning and computer vision to robotics and autonomous systems.

AlphaGo: Go game AI that defeated world champion Lee Sedol in 2016.
BERT (Bidirectional Encoder Representations from Transformers): breakthrough language model for NLP from 2018.
PaLM (Pathways Language Model): enormous language model with 540 billion parameters from 2022
PaLM-SayCan: combines PaLM with robot skills so that robots can plan and carry out natural-language instructions
Imagen: image generation AI for realistic and creative images
MusicLM: AI for music composition and production
RLHF (Reinforcement Learning from Human Feedback): training models using human preference feedback as the reward signal
Model Based RL: reinforcement learning with explicit models of the environment
RobustFit: Robust neural network against data noise
T5 (Text-to-Text Transfer Transformer): a single text-to-text model for various NLP tasks
ViT (Vision Transformer): Image recognition with Transformer architecture
Waymo: autonomous driving and robotaxi service
AlphaFold: protein structure prediction with deep learning
FLOOD: AI for flood prediction and prevention
SLIDE: pixel-level image segmentation
Switch Transformers: efficient architecture for very large transformers
MuZero: reinforcement learning that masters games without being given the rules, by learning its own model of the environment
Meena: conversational AI from 2020

The sheer amount of data Google has collected over the years is enough to make you dizzy. Over 500 hours of video footage are uploaded to YouTube every minute. The total video database is reported at over 30 million hours. Add to that countless search queries, texts, images and conversations. It’s an almost unimaginable amount of data.

Coupled with intensive research activity in the AI field, it adds up to enormous potential. In recent years, Google has produced groundbreaking innovations such as the BERT language model, the AlphaGo Go AI, and the Imagen image generator. When you put all these puzzle pieces together, things take on almost frightening proportions.

Project: Google Gemini

With the new Gemini AI system, Google now seems to have bundled the essence of these years of data aggregation and research. If the company succeeds in combining all of its AI developments and its treasure trove of data in this system, it would be a demonstration of sheer innovative power. It will be interesting to see whether Gemini can deliver on this promise. In any case, the expectations are huge. Here is what we know and what the rumors say:

Facts Google Gemini

There are already some facts from the Google Blog:

Gemini is supposed to be released this fall
Gemini combines text and image generation
Can create contextual images based on text generation
Has been trained with YouTube transcripts
Google lawyers are monitoring the training to avoid copyright issues
Gemini is said to have multiple modalities, e.g., text, image, audio, video
Sergey Brin is involved in development

Rumors

From Reddit and countless other sources on the web, there could be other features as well:

Gemini is said to be capable of AI image understanding and modification
Is said to combine text capabilities like GPT-4 with image generation
Has been developed from the ground up as a multimodal model
Could handle audio, video, 3D renderings, graphics, etc.
Is said to learn from user interactions and thereby effectively become an AGI
Architecture could enable lifelong learning
There are concerns about privacy and information leaks between users

Google Gemini and the (then new) AI market:

The AI market situation is likely to change significantly with the introduction of Google Gemini:

For OpenAI:

Strong new competitor for ChatGPT and DALL-E.
Google has significantly more resources and data
OpenAI could lose market share and come under pressure

For Anthropic:

Claude must stand up to Google Assistant with Gemini
Advantage due to focus on safety and control
Risk of falling behind

For Microsoft:

Partnership with OpenAI important to compete with Google
Microsoft must further develop Azure AI services
Advantage due to strong cloud infrastructure

For others:

Startups could have a very hard time against Google
Consolidation in the market possible
Significantly higher innovation speed

Overall, competitive pressure in the AI market will increase sharply. With its resources, Google is in a very good starting position to take a leading role with Gemini. It will be more difficult for other providers to keep pace with Google. It remains to be seen whether the high expectations for Gemini are justified.

Google Gemini Conclusion

Google Gemini seems to be a very ambitious AI project that should give the company a competitive edge. The combination of different modalities in one model is new and could improve AI capabilities tremendously. However, there are still many unanswered questions regarding the specific capabilities and data security. The release this fall will show whether Google can deliver on its promise to outperform the competition. Much is still speculation, but expectations are high.

Google Gemini: Facts and rumors

By Oliver Welling

#ai #ki #google #gemini #text #image #multimodal
