Are we ready for the next steps of artificial intelligence?
Disclaimer: This is a translation of an article originally published at TAB UOL.
Google was the first company to release a massive solution using artificial intelligence (AI) to generate images: Deep Dream. Back in the day, the technology looked very exciting, probably more for its potential than for what it was actually doing at the time, which was producing scary psychedelic creations filled with random dog faces.
More recently, mobile apps like Zao began to automate deepfakes, letting people replace the faces of superheroes and Hollywood celebrities with their own. Wombo.ai also went viral on social media precisely for its ability to turn any portrait into a singing video.
Now, the same company has released Dream.ai, an AI driven by text prompts. You just need to write a word or a sentence and the neural network will generate an image in a previously chosen style (psychedelic, fantasy, realism, expressionism, etc.).
This is also the mechanism behind the platform Pollinations.AI, created by the computer scientist Thomas Haferlach. After studying Computer Science & Artificial Intelligence at Edinburgh University, Thomas spent nine years in São Paulo building an art collective, while always keeping one foot in the technology sector. Pollinations.AI is one of his most recent ventures.
I asked Thomas how the technology works. He explained to me that these programs use deep learning, a machine learning technique in which a computer is trained to perform certain tasks. As this technology developed, generative models for images also became popular, but their applications were still quite limited.
In Thomas’ opinion, “the magic starts to appear when using these generative models in combination with models from other modalities, e.g. models that can understand how texts and images relate to each other.”
This was the premise behind the CLIP model, released ten months ago by OpenAI, which is capable of judging how well a piece of text corresponds to a given image. “This model learned from a huge dataset of millions of images with associated text captions,” explains the developer.
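For readers who want to see what that judging looks like in practice, here is a minimal sketch in Python. It assumes the openly released CLIP weights exposed through the Hugging Face transformers library; the image path and the captions are just placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the publicly available CLIP model and its preprocessing helper.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg").convert("RGB")  # placeholder: any local image
captions = ["a cactus in the desert at night", "a portrait of a cat"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds one similarity score per caption; softmax turns them
# into relative probabilities of which caption best describes the image.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```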
Surprisingly quickly, a community of hackers, researchers and artists managed to connect the CLIP model to other models that can generate images. Simply wiring these models together allows a human user to write a short sentence and have the machine learning model draw a creative interpretation of those words.
Thomas explains the process through an analogy involving a fictional king who is very keen to have any art he desires, but does not have the skills or the time to make it himself. To fulfill his wish, the king has access to a deaf painter, a virtuoso who can draw any kind of image but cannot understand text, and an art critic who has seen and read about countless artworks and images but has absolutely no talent for painting. Still, the art critic can communicate with the painter and confirm whether his work matches the king’s request.
“At first, the king will tell the art critic his desired painting, for instance ‘Draw me a painting of Glowing cacti and peyote flowers in the Arizona desert dream night. Painted by Shaun Tan, Ivan Bilibin and Studio Ghibli’,” explains Thomas. “This is the prompt that will make the painter start by drawing a random image. In the meantime, the art critic looks at the image and gives the painter a thumbs-up sign if he managed to get close to the prompt of the king or a thumbs-down sign if he is far away.”
Together, the two allegorical characters modify the image and repeat this process over and over, until the drawing comes closer to what the king actually requested. “Once the art critic is happy, they are done and the king can enjoy his unique painting.”
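Translated into code, the analogy looks roughly like the sketch below. It is a simplified illustration, not the actual pipeline behind Pollinations.AI: CLIP plays the art critic, and instead of a real painter (a generator such as VQGAN or a diffusion model) it optimizes raw pixels directly, skipping CLIP’s usual input normalization for brevity.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# The "art critic": CLIP scores how well the current image matches the prompt.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
for p in clip.parameters():
    p.requires_grad_(False)  # the critic only judges; it is never modified

prompt = "Glowing cacti and peyote flowers in the Arizona desert at night"
text_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = clip.get_text_features(**text_inputs)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# The "painter": here just a random pixel tensor; real systems plug in a
# proper generator model at this point.
image = torch.rand(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(300):
    image_features = clip.get_image_features(pixel_values=image.clamp(0, 1))
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    similarity = (image_features * text_features).sum()  # the critic's verdict
    loss = -similarity          # "thumbs down" becomes a gradient signal
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()            # the painter nudges the image toward the prompt
```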
It is during this processing time that the generative AI works on building the image; in some cases, such as Pollinations.AI, it is possible to watch it happen live. The program works in two phases: training and inference. “For training, researchers scrape large datasets of images, texts or other media from the internet which are then repeatedly fed into a large neural network, which is a simplified mathematical abstraction of a biological brain,” explains Thomas.
In this first phase, training, the neural network learns to represent all of this data internally, which means it does not need access to the internet or to the original data to recreate the content. “Since it cannot simply memorize all the images, the process forces the network to compress this information and find more abstract ways of storing it internally,” he continues. “Once a model is trained, it can be downloaded and used to generate new content or combined with other models as in the text to image use-case.”
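As a toy illustration of these two phases (not the architecture Thomas actually uses), a tiny autoencoder already shows the idea: during training it is forced to squeeze images through a small bottleneck rather than memorize them, and at inference time the trained decoder can produce new content without touching the original data. All names and sizes below are illustrative.

```python
import torch
from torch import nn

# A tiny autoencoder: 28x28 images must pass through a 32-number bottleneck.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32))
decoder = nn.Sequential(nn.Linear(32, 28 * 28), nn.Sigmoid())
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

def training_step(batch):
    """Training phase: images are fed in repeatedly and reconstructed."""
    code = encoder(batch)                          # compressed internal representation
    reconstruction = decoder(code).view_as(batch)
    loss = nn.functional.mse_loss(reconstruction, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Inference phase: the trained decoder generates new content on its own,
# with no access to the internet or to the original training data.
with torch.no_grad():
    new_image = decoder(torch.randn(1, 32)).view(28, 28)
```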
But why should the model rely only on this abstract internal representation instead of connecting directly to a database? Thomas mentions an article published on a Stanford University blog in which researchers discuss the benefits of allowing these models to access external data. However, this is still only a hypothesis, and a very recent one; there is no certainty that it will really work.
In the meantime, artists can use the abstract creations made by platforms like Dream.ai or Pollinations.AI in their own creative process. For Thomas, despite still being a niche, a considerable number of people are already experimenting with it. “In my opinion, it is only a matter of time until it becomes mainstream. Even my colleague’s dentist has started using Pollinations.ai to generate art.”
Until December 18th, the König Gallery in Berlin was showing the exhibition “MACHINE HALLUCINATIONS: NATURE DREAMS”, created by the Turkish-American designer and artist Refik Anadol. With smaller screens and a large room dominated by a gigantic LCD monitor, Anadol’s work is the result of research into the intersections between human consciousness, data about nature and machine intelligence.
As the gallery describes on its website, the exhibition features “a giant data sculpture displaying machine-generated, dynamic pigments of nature,” as well as a public art projection on the tower of ST. AGNES, created from real-time environmental data collected in Berlin.
To create these generative images, Anadol and his team collected data from public and private digital databases, then processed millions of photographic memories with machine learning classification models. These images were then organized into thematic groups so that the semantics of the data could be better understood and then expanded.
As the world chess champion Garry Kasparov proposed after being defeated by IBM’s Deep Blue, neither a human nor a machine alone has as much potential as the two working together. As we follow these developments in technology, the question that arises is whether we will ever reach a point at which AI surpasses our capacities: what Ray Kurzweil calls the Singularity, which could also be described as the achievement of artificial general intelligence, or AGI.
Thomas thinks his opinion might be a little more radical than most, but his argument is based on all these advances in language modeling and multimodal learning, that is, the ability to connect different data formats (video, audio, text, etc.). “For me, we are on a clear and unstoppable path towards AGI,” he says, giving as an example the language model GPT-3, the same one used by Allado-McDowell to write their book “Pharmako-AI”.
The developer explains that the model is quite simple: you only need to give it some words and the AI will predict the next word in the sentence. “By doing this repeatedly, GPT-3 learns to write and continue text very well. In order to write convincing text and be indistinguishable from a human it needs to imitate how a human would write as closely as possible,” explains Thomas. “For this it needs to develop a form of self-awareness or consciousness. Just to meet the training objective of predicting the next word in a sentence.”
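The loop Thomas describes is easy to sketch. Since GPT-3 itself is only reachable through OpenAI’s API, the example below uses the openly downloadable GPT-2 as a stand-in, and the prompt is made up: predict the most likely next token, append it, and repeat.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

tokens = tokenizer.encode("We are on a clear path towards", return_tensors="pt")
for _ in range(30):
    with torch.no_grad():
        logits = model(tokens).logits        # scores for every possible next token
    next_token = logits[0, -1].argmax()      # pick the single most likely one
    tokens = torch.cat([tokens, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(tokens[0]))
```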
Talking about machine consciousness, therefore, is no longer something absurd or relegated to science fiction. In fact, researchers such as the philosopher and cognitive scientist David Chalmers are already discussing the subject, specifically in the case of the GPT-3 model:
“What fascinates me about GPT-3 is that it suggests a potential mindless path to artificial general intelligence (or AGI). GPT-3’s training is mindless. It is just analyzing the statistics of language. But to do this really well, some capacities of general intelligence are needed, and GPT-3 develops glimmers of them. It has many limitations and its work is full of glitches and mistakes. But the point is not so much GPT-3 but where it is going. Given the progress from GPT-2 to GPT-3, who knows what we can expect from GPT-4 and beyond?”
In futures studies, Amara’s Law says that, in the short run, we tend to think a technology will do much more than it is actually capable of, while in the long run we tend to underestimate its effects. This is the great riddle of exponential technologies: they develop at an exponential rate, not at the linear pace our brains are used to. With that in mind, Thomas’ opinion makes even more sense, which is something that might make us anxious, whether out of fear or out of anticipation.
Did you like the post? How about you Buy me a Coffee? :)