Introduction
This text is an account of the development of a pseudo-rotoscoping experiment using a mixture of techniques. The experiment seeks to generate an expressive, organic-looking material that resembles the technique of drawing or painting on film, frame by frame. It is not a process of measuring the functioning of a system, but rather a free experiment within an open system. The term pseudo-rotoscoping is used because “rotoscoping” denotes manual work on each frame of the film, whereas here we propose working generatively, with the help of artificial intelligence (AI). These tools, in general, generate or alter images from text “prompts”, i.e. descriptive lines that access a network trained on billions of data points and images to form a new image from a seed.
The traditional process of artistic rotoscoping has beauty in its imperfections, making the moving image expressive and rich in detail, but it tends to create problems with deadlines and budgets. It is also worth remembering that the object produced is a film, not thousands of paintings.
In this experiment, the mix of techniques consisted of filming a dance performance, training an AI model with authorial images, and generating images that blend the characteristics of each image source and of the AI model used.
New tools are emerging from the spread of generative image models. Adobe Photoshop, with its implementation of AI to generate and complete images, brings unimagined possibilities to people who work with images, while also easing and changing the workflow of that renowned software; few computer programs for working with images reach so many professionals. We can also mention Midjourney, generating increasingly realistic images, and Stable Diffusion, expanding in derivatives and possibilities of use. The latter, because it is distributed as open source, has encouraged developers to create several derivatives and to make its functions usable locally and for free, including the possibility of working with images in sequence and of creating new models specialized in styles through training with new images.
The fact that AI models such as Stable Diffusion produce images in series, using both training and external references, and can estimate what a redrawing would look like, makes it possible to revisit the rotoscoping process: what it would be like to paint frame by frame over a cinematographic reference. This does not exclude painting as a creative process; rather, painting is used to create a style and, from it, the frames of the film are produced generatively.
Other similar processes have been used to generate animations with a well-defined style and coherence. It is important to highlight that there are other possibilities for working with animation in Stable Diffusion itself, including the use of references and tools such as ControlNet, Deforum and Warpfusion. In this case, however, the desired coherence is in the style, not between the frames: the filmic movement serves only as a guide, and the noisy, contrasting passage between frames is deliberately sought. Still other processes explore the generative possibility of transforming images into other figures, exploring the overall silhouette and movement.
It is also necessary to look at the work of artists like Refik Anadol, who uses generative systems to create dynamic works in constant movement and generation, or the experiments of Remi Mollete, using dance and AI.
Working with an open system is a choice made so that, by observing the process, it is possible to extract possibilities. Two earlier experiments influenced the development of this one: the first, called Move, also used dance footage from a performance by Taianne Oliveira, with the prompt “Colorful Geometric shapes by Kandinsky”; the second, which I called Mimeograph, introduced 20 illustrations in a single style for training through DreamBooth. The illustrations came from the project “CAPOEIRA NO RIO DE JANEIRO 1948-1982” and ended up influencing not only the graphic result of the static images produced but also the semantics of representation, since they gave rise to the representation of black men and women in images resulting from neutral prompts.
From these experiences we can take additional training as a way of opening the black box, of tampering with the mechanism. The Mimeograph experiment was initially just training to produce drawings in a specific style, but it ended up preferentially producing black people for neutral prompts (where it is usually the other way around, as AI models often reproduce the racist and sexist biases of society mirrored on the internet). It is therefore possible to say that we can change not only the plastic qualities of images but even their meaning, as the previous experience with the Mimeograph model shows us. And the generative capacity across many paintings, contrasted with the differences of each specific form, yields movement interspersed with many aesthetically rich forms.
The motivation for this work comes from previous rotoscoping works and their difficulties, from the expressive strength of the body in danced movement, like a living model for painting, and from the prospect that AI technology can be used, with human participation, in graphically generous ways.
Development
The film’s image guide is the performance of dancer Taianne Oliveira, filmed with a DSLR camera. Some treatments were tested and the black background was chosen.
Seventy drawings and watercolors were selected for training, including old sketches and paintings. This material and its selection were intended to formulate a style to be replicated in the creation of the animation frames. It is important to emphasize that the process did not seek purity of results, but coherence of style.
The technical aspects of the process were structured as follows: the AI system used was Stable Diffusion (SD). SD is a deep learning model freely distributed as open source, created by the company Stability AI and trained on a large database of images and texts called LAION-5B (originally assembled for research), which produces and manipulates images through text. This open technology enabled a series of additional programs, such as DreamBooth, which was used in the training here, distributed free of charge by Google and later adapted to work with SD, and Automatic 1111, which makes it possible to work with SD in a user-friendly way through an interface on desktop computers.
The additional training of the AI model using DreamBooth was fed with authorial drawings, especially watercolors; the Stable Diffusion model customized with the resulting style (called balevato) was then mixed with the footage, image by image, as shown in the graphic below.
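As a hedged illustration of the training setup only, the sketch below shows one way the authorial images could be prepared for DreamBooth fine-tuning: gathering them into an instance folder and normalizing their resolution. The folder names, file format and 512-pixel training resolution are assumptions, not details given in the text; the fine-tuning itself was run with DreamBooth on a hosted platform.

```python
# Minimal sketch: preparing instance images for DreamBooth fine-tuning.
# Paths and the 512 px resolution are illustrative assumptions.
import glob
import os

from PIL import Image

os.makedirs("instance_images", exist_ok=True)
for i, path in enumerate(sorted(glob.glob("watercolors/*.jpg"))):
    img = Image.open(path).convert("RGB")
    img = img.resize((512, 512), Image.LANCZOS)  # common DreamBooth training size
    img.save(f"instance_images/balevato_{i:03d}.png")
# during fine-tuning these images are associated with an instance prompt
# such as "balevato style"
```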
After the training, some tests were carried out to observe its results, using generic prompts that specified the technique. The results seemed, in some cases, consistent with the training, sometimes approaching a classical drawing and perhaps saturating the colors, but they seemed to indicate that the training had made a difference, although somewhat contaminated by the model’s other references.
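These tests can be pictured as plain text-to-image calls against the fine-tuned model. The sketch below uses the Hugging Face diffusers library rather than the platform actually used, and the checkpoint path and prompt wording are assumptions for illustration.

```python
# Minimal sketch: a post-training test with a generic prompt naming the technique.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./balevato-model",  # hypothetical local path to the DreamBooth-trained checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="balevato style watercolor of a dancer",  # generic prompt specifying the technique
    num_inference_steps=45,
    guidance_scale=9,
).images[0]
image.save("training_test.png")
```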
The treatment of the video images was chosen on the basis of tests with static film frames, to save time. The treatment chosen was colorful, with the background cut out and filled with black. In a way, the errors in the cropping strengthened the organic result.
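The text does not say which tool produced the cut-out; as one possible reconstruction, the sketch below isolates the subject with the rembg library and composites it onto a black background. Treat it as an assumption-laden stand-in for whatever matting process was actually used.

```python
# Hypothetical sketch: cutting out the subject and filling the background with black.
from PIL import Image
from rembg import remove

frame = Image.open("frame_0001.png").convert("RGBA")
subject = remove(frame)  # subject isolated on a transparent background
black = Image.new("RGBA", subject.size, (0, 0, 0, 255))
Image.alpha_composite(black, subject).convert("RGB").save("frame_0001_black.png")
# imperfect mask edges are tolerable here: the text notes that cropping errors
# strengthened the organic result
```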
From the earlier experiments, it could be assumed that the result would be contaminated by the base model, both in terms of what a “watercolor” is and of its characteristics. But the desired watery look seemed to be working, even if randomly.
The images were generated with the following settings: Prompt: “balevato style” watercolor colorful; Size: 1024 × 1024; Guidance scale: 9; Strength: 0.66; Steps: 45; Seed: 2082233225; Negative prompt: disfigured, cartoon, blurry, photo; Diffusion sampler: DDIM; Model: balevato.
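For readers who prefer code, these settings map directly onto an img2img call. The sketch below uses the diffusers library as a stand-in for the Automatic 1111 interface actually used; the local checkpoint path and frame file name are assumptions.

```python
# Sketch: one frame stylized with the exact settings listed above.
import torch
from diffusers import DDIMScheduler, StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "./balevato-model",  # hypothetical path to the custom "balevato" model
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # DDIM sampler

frame = Image.open("frame_0001.png").convert("RGB").resize((1024, 1024))

result = pipe(
    prompt='"balevato style" watercolor colorful',
    negative_prompt="disfigured, cartoon, blurry, photo",
    image=frame,
    strength=0.66,
    guidance_scale=9,
    num_inference_steps=45,
    generator=torch.Generator("cuda").manual_seed(2082233225),
).images[0]
result.save("frame_0001_out.png")
```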
These images were generated on the same platform where the training was carried out; the model was then downloaded to a local computer, and through the Automatic 1111 interface a sequence of frames was generated using the img2img function, also adding a negative prompt for faces.
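The surrounding video workflow can be sketched as follows, assuming ffmpeg is available on the local machine; file names and the 24 fps rate are assumptions, and step 2 stands for the img2img pass shown above.

```python
# Sketch: split the performance video into frames, stylize them, reassemble the film.
import os
import subprocess

os.makedirs("frames", exist_ok=True)

# 1. extract frames from the filmed performance
subprocess.run(["ffmpeg", "-i", "performance.mp4", "frames/%04d.png"], check=True)

# 2. run the img2img generation from the previous sketch over every file in
#    frames/, writing the stylized results to out/

# 3. reassemble the stylized frames into the final film
subprocess.run(
    ["ffmpeg", "-framerate", "24", "-i", "out/%04d.png",
     "-c:v", "libx264", "-pix_fmt", "yuv420p", "film.mp4"],
    check=True,
)
```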
A small sample of the images produced:

The film produced using the generated frames can be seen below.
At this point it is still worth noting some “notable errors” that open up new observations, experiments within the experiment that serve the search for new directions. In particular, the geometric synthesis of the gesture and the mix between the geometric and the organic may be of interest for graphic construction.
Discussion
Although this was an experiment with a specific objective, it is possible to observe some characteristics of working with generative tools. The generation of images tends towards infinity, which seems good for generating films, but it makes us think that soon everyone will be “drowned” in images and that, at that point, images may lose their meaning.
Furthermore, the experiment suggests that the ability to plan in a generative way, and the possibility of making changes based on choices, seem paramount at this moment. That is: if we are going to design generative matrices, we must be able to manufacture them, and to alter and measure processes, in order to enrich them. This reflection could indicate that the artistic language of AI is not the ready-made image, but rather generativity and dialogue with humans.
The experiment produced a film with an organic and expressive appearance, which shows an alternative to a procedural process, making the beginning of the process the generating source. This does not invalidate the procedural method; the two can even collaborate, for example by training the model on a portion of the animation made frame by frame.
Perhaps it is a mistake to want AI models to be accurate, given that they are instruments that assess probability in ways that depend on other data. In this experiment, having some type of control is more important than accuracy. For this control, it is interesting to observe the parameters and to be able to influence the data. It is not about extracting a pure result, but about using its contamination, just as the water of a watercolor is contaminated by the pigments it touches.
When using AI, it is clear that the dimension of the source code and its creation opens infinite possibilities, which is why it is necessary to refine each step of the process and to accept a result that is not purely the product of the training, but mixed with the base model. At this point of evaluation, the work must be not only human but attentive, since these are generative processes at scale, where decisions are often seminal. This applies perhaps not only to image models. Recommender systems use artificial intelligence to suggest content, and their problem is that they are closed, contaminated by recommendations driven by consumption at all costs and by the commercial use of the resource, when they could be a good tool for assessing the probability and suitability of content.
Therefore, it is also necessary to think about the appropriation of these technologies by capital; in this sense, inferring new images in production can be a movement of freedom. From this experience, it is not difficult to visualize a scenario in which generative image systems are created as works in themselves: the generating model becomes the work, like an assembly instruction manual, in this case capable of generating infinite variations.