DALL-E 2 Prompt: Masked killer holding a butcher knife, wearing a mask that resembles a robot head, walking out of the lake. Award-winning scenic design for “Halloween horror movie”, inspired by Friday the 13th, high-quality photo from theatrical press release. Ray traced lighting, trending on ArtStation.
Bear with me as we enter borderline conspiracy theory territory. The machines we carry around in our pockets make us underestimate the vast amount of knowledge our forebears had access to. By extension, we underestimate our own capabilities. If we entertain the idea that the ancient writings of the Emerald Tablets, attributed to the Atlantean Hermes Trismegistus, are true, then humans used to be far more advanced than we are now. We could communicate telepathically and use sound to levitate heavy objects, which could be a possible explanation for the Pyramids. Even if it’s just a myth, the story of Hermes Trismegistus’s journey from Atlantis to Ancient Egypt to Ancient Greece and then around the world, in an attempt to reconnect humanity to the universal consciousness with a network of precisely placed pyramids throughout the world… is kind of interesting. We think ancient people were stupid, but they would think that we’re the stupid ones. I imagine them mocking us: “They can’t even communicate telepathically. They only use 10% of their brain power. They pay for energy.” This realization alone forces me to redefine what a technological advancement means. Maybe social media sites aren’t technological advancements, but distractions put in place to limit our innate ability to tap into the full potential of the human mind. Perhaps instead of judging each new technological achievement as an advancement, we could define the scope of its potential impact, then weigh whether it is more likely to have a positive impact on human development or a negative one. If we continue to replace human capabilities with technology, will we be able to differentiate ourselves from the machine? It wouldn’t be a Medallion XLN article if I didn’t provide thought-provoking pseudoscience to frame modern scientific breakthroughs.
Same prompt generated using Stable Diffusion. Get it? AI is a Killer App…
I’m terrified of our potential machine overlords as much as the next guy, but boy, can AI generate beautiful art. This use case is taking the world by storm (for those of us who have access to it). Generative AI Art allows for rapid prototyping and has the potential to bring an end to graphic design as an entire industry. Popular AI generative models that have emerged within the last few months include OpenAI’s DALL-E 2, the Mid-Journey Discord server, and the open-source Stable Diffusion library, for those technically adept enough to run the model locally on their machines. When this technology goes mainstream, it has the potential to shake up every aspect of entertainment. AI can generate art for book covers, costumes, and scenery for movies. AI can also render characters for video games that can be converted into 3D models. AI has always been useful, but similar to the way Instagram made a killer app for the smartphone camera, artificially generated images are the killer app for AI.
Mid-Journey: Generates vintage photos of retro-futuristic, steampunk avatars ( not generated by me)
This got me thinking, “How does it all work?” This article will attempt to deep dive into how AI generative art works. All AI-generated art begins with a text prompt written by the user. The same prompt can produce vastly different results depending on which generative model was used to create the art. There are four kinds of AI models used by products and services to generate art. Generative Adversarial Networks (GANs) generate art from random noise or, like the other models, from an image input. The generator’s first attempt is passed to a discriminator, which judges how realistic the image looks; the two networks train against each other, and the generator keeps refining its output until the discriminator can no longer tell it apart from real images that match the text prompt. After that phase is complete, the GAN outputs an image of what it thinks you mean by the text prompt.
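For readers who like to see the moving parts, here is a minimal, hypothetical sketch of that generator-versus-discriminator loop in PyTorch. It is a toy example (tiny fully connected networks, no text conditioning), not the architecture behind any of the products mentioned above.

```python
import torch
import torch.nn as nn

# Toy GAN sketch: the generator maps random noise to an image,
# the discriminator scores how "real" that image looks.
latent_dim, img_dim = 100, 28 * 28

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),          # pixels in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),             # probability the image is real
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    batch = real_images.size(0)
    noise = torch.randn(batch, latent_dim)
    fake_images = generator(noise)

    # Discriminator: learn to label real images 1 and generated images 0.
    d_loss = loss_fn(discriminator(real_images), torch.ones(batch, 1)) + \
             loss_fn(discriminator(fake_images.detach()), torch.zeros(batch, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: learn to produce images the discriminator labels as real.
    g_loss = loss_fn(discriminator(fake_images), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```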
DALL-E 2: Generate intimate photo portraits of 'real' people. These are faces of people and emotions we can all relate to, not airbrushed supermodels. Close-up, personal and taken at reflective, emotional moments. ( not generated by me )
Variational Autoencoders (VAEs) encode the input into a latent space of lower dimension, then decode it back, minimizing the distance between the input and its reconstruction. The VAE is free to reconstruct the input however it wants, as long as the latent codes follow a predefined Gaussian distribution, which lets it implicitly learn the data distribution. Flow-based generative models learn the data distribution explicitly: the encoder applies a sequence of invertible transformations to the input, and the decoder simply runs those same transformations in reverse.
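Here is a similarly stripped-down sketch of a VAE in PyTorch, showing the encode-to-a-small-latent-space, sample-from-a-Gaussian, decode-and-compare idea described above. The layer sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Toy VAE sketch: encode to a low-dimensional latent space, sample from a Gaussian,
# decode back, and penalize both reconstruction error and distance from the prior.
img_dim, latent_dim = 28 * 28, 16

encoder = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())
to_mu, to_logvar = nn.Linear(256, latent_dim), nn.Linear(256, latent_dim)
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                        nn.Linear(256, img_dim), nn.Sigmoid())

def vae_loss(x):
    h = encoder(x)
    mu, logvar = to_mu(h), to_logvar(h)

    # Reparameterization trick: sample z from N(mu, sigma^2) in a differentiable way.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    recon = decoder(z)

    # Reconstruction term + KL divergence pulling the latent code toward N(0, I).
    recon_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
```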
Art generated in Night Cafe AI by Leslie Biro ( not generated by me )
Lastly, the most popular one is the Diffusion Model, which gradually adds Gaussian noise to an image and then learns to reverse the process. The diffusion process takes an area of high concentration and spreads it across the entire canvas. Diffusion models use a Markov chain, meaning each step depends only on the previous state. The input has a little Gaussian noise added at each link in the chain until it is pure noise, then the model works backward to remove the noise while preserving the data’s dimensionality with a UNet. A UNet is a convolution-based neural network that downsamples an image to a lower-resolution representation, applies global attention at the lower-resolution layers, then upsamples it back to output a high-quality image. Diffusion models stay faithful to the image and do not deviate too far.
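To make the Markov chain idea concrete, here is a rough sketch of the forward (noising) half of a diffusion model. The noise schedule values are typical defaults from the research literature, not anything specific to DALL-E 2 or Stable Diffusion.

```python
import torch

# Forward (noising) half of a diffusion model: a Markov chain that mixes a little
# Gaussian noise into the image at every step until only noise remains.
# A trained UNet learns the reverse direction, predicting the noise to subtract at each step.
T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def noise_image(x0, t):
    """Jump straight to step t of the chain: q(x_t | x_0)."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return xt, noise                       # the model is trained to predict `noise` from `xt`

x0 = torch.rand(1, 3, 64, 64) * 2 - 1      # a toy image scaled to [-1, 1]
xt, target_noise = noise_image(x0, t=500)  # halfway through the chain
```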
Mid-Journey: Wolf in suit ( not generated by me )
DALL-E 1 used a Generative Pre-trained Transformer (GPT) model that auto-regressively generated the image from a piece of text. OpenAI then implemented GLIDE, which incorporates the text prompt once the diffusion model begins its backward (denoising) pass, so the neural network generates an image with less and less noise while getting more and more guidance from the text prompt about which direction to take. A CLIP-guided diffusion model was also added to GLIDE, because CLIP is trained to predict a similarity score between an image and a piece of text: GLIDE reduces the noise of an image with the diffusion process, CLIP scores how closely the image matches the text, and based on that score the process repeats. With classifier-free guidance, GLIDE generates two different predictions, one guided by the text prompt and one without any text, then computes the difference between them; that difference tells the model which direction to move. The GLIDE model is more resource-intensive than a GAN.
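Classifier-free guidance is easier to see in code than in prose. The sketch below assumes a hypothetical trained `unet` and precomputed text embeddings (none of these names come from OpenAI’s code); the only real logic is the last line, which extrapolates from the unconditional prediction toward the text-conditioned one.

```python
import torch

# Sketch of classifier-free guidance: at each denoising step the model predicts the noise
# twice, once with the text prompt and once without, then extrapolates along the difference.
# `unet`, `text_emb`, and `empty_emb` are placeholders for a real trained model and embeddings.
guidance_scale = 7.5  # how strongly to push toward the text-conditioned prediction

def guided_noise_prediction(unet, xt, t, text_emb, empty_emb):
    noise_cond = unet(xt, t, text_emb)      # prediction guided by the prompt
    noise_uncond = unet(xt, t, empty_emb)   # prediction with no text conditioning

    # Move from the unconditional prediction in the direction the prompt suggests.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```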
Night Cafe AI: generated by https://creator.nightcafe.studio/u/hamdried
To the people who are not interested in how the sausage is made, pat yourselves on the back for making it this far; the rest of this article is for you. In The Wealth of Nations, Adam Smith observed that capitalism leads to innovation and more efficient tools. While Generative AI Art has the potential to completely replace the graphic design industry, a new role is emerging: the Text Prompt Engineer. Prompt Base is a website where text prompts are bought and sold, so similar works of art can be generated by switching around a few variables.

Here are some tips for generating the most beautiful AI art. OpenAI has an e-book called “dall-e gallery” that anyone can download, which guides you toward creating consistently amazing art. The book explains how to structure text prompts to get the exact style you want, and how to set up photography prompts to get the best results for proximity, camera angle, and lighting, indoors or outdoors. It covers best practices for generating illustrations, whether 3D models, 2D anime, digital media, or characters for a story. Game developers can find best practices for texture materials, ceramics, and textiles. Finally, the book goes into how users can take advantage of DALL-E features to fix details, replace backgrounds, combine images, and more.

Things to consider when composing an image are who is there, where it takes place, what art style it uses, whether there is an emotional vibe associated with the photo, etc. It is important to define the depth of field, the lighting, and any references that can further shape the generation. Something I’ve noticed in my own generations is that the first color mentioned becomes the dominant color, and any secondary color is dispersed throughout the image. Adding commas to the prompt also makes a huge difference: from my experience, the first comma-separated section of the prompt dominates the image, and the subsequent comma-separated sections add additional details. To define a complete idea, end the prompt with a period.
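As a rough illustration of those prompting tips in practice, here is how a comma-separated prompt could be assembled and fed to Stable Diffusion through the open-source diffusers library. The checkpoint name, guidance scale, and step count are assumptions for the sketch; swap in whatever model you actually have access to.

```python
import torch
from diffusers import StableDiffusionPipeline

# Build the prompt the way described above: a dominant first section, then comma-separated
# sections that layer in style, lighting, and camera details, ending with a period.
sections = [
    "Full body portrait of a warrior woman in purple gladiator armor",
    "dystopian futurism, science fiction",
    "reflective highlights, dramatic lighting, shallow depth of field",
    "award-winning film still",
]
prompt = ", ".join(sections) + "."

# Checkpoint and settings are illustrative defaults, not a recommendation.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("warrior.png")
```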
Kyary Pamyu Pamyu is, in my opinion, one of the greatest musicians of all time. Her ideas are literally insane and I love them so much…
For the remainder of the article, I would like to do an experiment. When I have free time, I like to draw. A piece I wanted to compare against AI is one I drew when I was inspired by a music video from Japanese pop star Kyary Pamyu Pamyu (200 IQ). I wanted to capture the essence of the message of her hit song, “Invader, Invader.” My concept was a rogue AI that starts wreaking havoc on a military base, and only a brave low-tech heroine with advanced weaponry can defeat the robot.
Invader, Invader: loosely inspired by Kyary Pamyu Pamyu’s song “Invader, Invader,” hand-drawn and colored by me.
The experiment will try to replicate the elements of the image as closely as possible in DALL-E 2 and Stable Diffusion.
This took a while to generate and I had to do some graphic design manipulation, but I was able to generate this cool image with the following prompt.
Warrior Amazon Vs. AI
“Full body Warrior Amazonian woman with bronze skin, curly blonde hair braided into a puffy mohawk, wears futuristic cybernetic purple gladiator armor with gold trimming, silver armor plates adorn shoulders, elbows and knees, prepares to shoot glowing silver bow and arrow at a 20-foot-tall robot with exposed wires and gears, menacing yellow eyes and gold teeth, wreaking havoc in a war-torn region around destroyed and collapsing buildings, while a group of 3 military men shoot assault rifles and throw grenades at the robot menace. Dystopian futurism. In the genre of science fiction. Reflective highlights, scene out of an award-winning war film.”
The prompt was too long, so I had to break it up: the woman as one render and the giant killer robot as another. I got the best results using Stable Diffusion. Then I used DALL-E 2’s inpainting to expand the girl into a full-body render. The second one that I like is the following.
The girl and the robots were rendered separately, then combined in Photoshop. DALL-E 2 was then used to fill in the missing details to make the image more square. I didn’t like the generations I was getting from DALL-E 2 because the face was always messed up, as demonstrated below, but the faces in the Stable Diffusion renders were always amazing.
Some DALL-E 2 rejects
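If you want to try a similar split-render-and-fill workflow locally instead of relying on DALL-E 2’s inpainting, the diffusers library ships an inpainting pipeline. This is only a rough sketch: the checkpoint name and file paths are placeholders, not the exact assets I used.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Illustrative checkpoint; use whichever inpainting model you have access to.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("warrior_crop.png").convert("RGB")   # the partial render
mask_image = Image.open("warrior_mask.png").convert("RGB")   # white where new pixels should go

result = pipe(
    prompt="full body warrior woman in purple cybernetic gladiator armor",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("warrior_full_body.png")
```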
A quick update on the Medallion XLN project: we are currently restructuring. We were making decisions based on false pretenses; the data we were working from about our community turned out to be falsified. I thought our community was big enough to launch, but that turned out to be false, so it is back to the drawing board. It also didn’t help that my smart contract developer, who goes by the name Tom Gerard, pulled a fast one and stole the contract we paid him $12,000 for. It is so hard to trust people. New levels come with new devils, I guess. That being said, Medallion XLN is still here. We have to rework our launch strategy, but when we launch, it will be at the right time, with the right team and an engaged community.
Minor setback, but when our time comes, we will be undeniable.
Subscribe to Medallion XLN as we are building the next generation of technology using XR, Blockchain, AI, and the power of decentralization to reclaim our digital sovereignty.
Which AI Generation algorithm is your favorite to use?
currently using midjourney