Well, it started with conversations with friends about the future of AI-generated images. Will they replace existing actors? Why would people even think they would? And in the middle of that I thought, 'What about the opposite? How can I generate beautiful people without generating existing famous people?'
I'm going to pause here to tell you that I sometimes get a little long-winded, and just so you know, there's nothing below that you need to know to play with my little site, so read on only if you're interested in the process. OK, back to the story.
The reality is that these models have all been trained on giant curated image libraries (as I understand it). And what that means to me is 'famous people and stock imagery.' Now, there's no way I will know if I've used the face of someone in stock imagery, but famous people, that's an interesting topic. So, I started making a list of famous people and seeing if I could generate a reasonable facsimile (without loading a LoRA that was specifically trained on a specific person, I mean). I picked mostly women, because, well, why would I want to look at other men for hours on end when I could look at beautiful women (it's not like they are any more unavailable than the real famous women they are based on)? I picked people that I knew (who were famous to me) and did a lot of googling to find other famous people that I didn't know about. I ended up with a fairly long list, and I tried to throw some interesting faces into the mix. Because of the requirement that they have a lot of public photos, there are a lot of models in there.
The task at this point was a little daunting, since it takes a while to edit the prompt, generate a few pictures, find a photo of the person, and compare them. So, I did some more googling and found out that ComfyUI has a REST API that I can call to run workflows, and then it was trivial to write a program that takes a saved workflow and swaps in names from a list. The results were very interesting. You could easily tell how famous a person was by comparing the generated images to their real photos. Some people who I think are very interesting simply don't have a lot of photos out there to train on, and their generated images all look about the same.
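To give you an idea of what that little program looks like, here's a minimal sketch, assuming ComfyUI is running locally on its default port (8188) and that the workflow was exported with 'Save (API Format)'. The file name, the node ID, and the name list are placeholders for whatever your workflow actually uses.

```python
import json
import requests

COMFY_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI endpoint
NAMES = ["Famous Person A", "Famous Person B", "Famous Person C"]  # placeholder list

# Workflow previously exported from ComfyUI with "Save (API Format)"
with open("workflow_api.json") as f:
    base_workflow = json.load(f)

PROMPT_NODE_ID = "6"  # hypothetical: ID of the positive CLIP Text Encode node

for name in NAMES:
    workflow = json.loads(json.dumps(base_workflow))  # cheap deep copy
    # Drop the name into the text of the positive-prompt node
    workflow[PROMPT_NODE_ID]["inputs"]["text"] = (
        f"professional portrait photo of a woman who looks like {name}, studio lighting"
    )
    # Queue the job; ComfyUI renders it asynchronously and saves to its output folder
    requests.post(COMFY_URL, json={"prompt": workflow}).raise_for_status()
```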
OK, so now that I'd gotten my feet wet, I went back to the original question. I had noticed that if I used the name of a famous person in my prompt, I was much more likely to get a better face on the people in the images. But I obviously also got people who looked a lot like the famous person. Then I discovered that if I used verbiage and parentheses to combine famous people, I still got better faces, but they no longer looked as much like the original people. I also noticed that the order in which I used the names mattered. A lot. Well, you probably already noticed that, since you probably saw it before you got to this page.
I started combining famous people, usually in pairs, though I got a little extra from triads. I got some really cool stuff (well, you tell me, you're reading about it here). I started using the technique to build better images, with no doubt that the faces often looked different enough to work. Then, of course, I thought, why don't I just generate all the combinations and see if I can find some really gorgeous faces (because let's face it, it's generally more fun to look at pretty people than ugly ones) that I could use.
One thing that surprised me is that it's often the way the person is looking, standing, etc. that reflects the 'secondary' person in the mix, rather than overt changes to the face. I guess that makes sense; some of those people have mannerisms that they carry through multiple roles, and I've obviously picked up on them. So, you tell me, what surprised you looking through the photos?
Some have asked me what prompt I used, so here it is: '(Gorgeous Photo:1.3), professional model portrait, three-quarter body, (Ultra detailed:1.3), (Ultra detailed face:1.7), (perfect hands:1.4), (whole body:1.9), In a studio with a light-gray backdrop. In the center of the shot is a woman who looks like ({xxxx} combined with {yyyy}) wearing fashionable, but sexy, clothes. Dramatic lighting.'
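If you want to do the same thing, here's roughly how that template gets filled in code; this is a minimal sketch, where {xxxx} and {yyyy} become the Primary and Secondary names, and the name list is a placeholder rather than my actual list.

```python
from itertools import permutations

PROMPT_TEMPLATE = (
    "(Gorgeous Photo:1.3), professional model portrait, three-quarter body, "
    "(Ultra detailed:1.3), (Ultra detailed face:1.7), (perfect hands:1.4), "
    "(whole body:1.9), In a studio with a light-gray backdrop. In the center of "
    "the shot is a woman who looks like ({primary} combined with {secondary}) "
    "wearing fashionable, but sexy, clothes. Dramatic lighting."
)

NAMES = ["Name A", "Name B", "Name C"]  # placeholder list of famous names

# Order matters, so (A, B) and (B, A) are both worth generating
for primary, secondary in permutations(NAMES, 2):
    prompt = PROMPT_TEMPLATE.format(primary=primary, secondary=secondary)
    print(prompt)  # or hand it to the ComfyUI queuing script shown earlier
```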
My interpretation of how SDXL works is that it gives a relative priority to each term in the prompt as it processes it, with the first terms (or tokens) getting the highest priority. So, for example, if you say "mountain" very early in the prompt and later in the prompt you say "desert", then it will prioritize the first term over the second and you'll get more mountains than deserts. This priority (weighting) is what you are tweaking when you add something like (desert:1.1); you are raising the weighting of that word. This would imply that if the word is later in the prompt, the added weighting has less effect and so must be higher to make a difference, but from my experience I don't think the token-order weighting is very large (my guess would be something like 0.1 or even 0.01). If anyone knows more about this I'd love some feedback on my admittedly anecdotally-biased understanding.
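To make the weighting syntax itself concrete, here's a tiny sketch of how the common (term:weight) notation breaks down into term/weight pairs. This only illustrates the syntax you type into the prompt, as ComfyUI and similar front ends interpret it; it says nothing about whatever positional weighting the model applies internally, which is the part I'm only guessing at above.

```python
import re

def parse_weighted_terms(prompt: str):
    """Split a comma-separated prompt into (term, weight) pairs.

    "(desert:1.1)" -> ("desert", 1.1); bare terms default to weight 1.0.
    Illustrates the prompt syntax only, not SDXL's internal tokenization.
    """
    pairs = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        m = re.fullmatch(r"\((.+):([\d.]+)\)", chunk)
        if m:
            pairs.append((m.group(1), float(m.group(2))))
        elif chunk:
            pairs.append((chunk, 1.0))
    return pairs

print(parse_weighted_terms("mountain, (desert:1.1), (Ultra detailed face:1.7)"))
# [('mountain', 1.0), ('desert', 1.1), ('Ultra detailed face', 1.7)]
```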
In this case I am giving it the names of the two models in a specific order and connecting them with a linking phrase (I tried several to see which worked best, and "combined with" seemed the most consistent). If I say "Cindy" and then after that I say "Samantha", then it will prioritize what it thinks of as a Cindy more than what it thinks of as a Samantha. Since these are all famous people whom it has likely trained on photos tagged with their names, it will also prioritize them based on the strength of its "knowledge" (or memory?) of each person. So if Samantha has 1,000 pictures that were used to train the model and Cindy only has 200, then no matter what order you use in the prompt, the final image is likely to look like Samantha. If you put Samantha first, then between the weight of being first and the weight of 1,000 training pictures, the image will probably look like Samantha and have almost no Cindy in it.
However, since I have no idea what pictures the model was actually trained with, I can only go by the order, so I call the first model the Primary and the second the Secondary. Having looked at all these pictures while generating them, I've noticed a clear trend: for models that are "equally famous" (if that term is even measurable), the base features of the resulting "baby" come from the Primary (skin color, hair color, eye placement, and probably 100 things that I'm not equipped to notice), and other things come from the Secondary (face shape seems to be the most common thing you can trace back to the Secondary).
One of the things that caused me the most trouble was that something in either ComfyUI or SDXL occasionally crashes my GPU. Not the computer, mind you; it usually stays running just fine, and I can remote desktop into the system after a crash and try to resurrect the video card. So far nothing I've tried has been able to do that. I initially tried to treat it as a driver problem, since re-running the NVIDIA driver install almost always clears up even the most heinous problems. However, after one of these crashes the video card is no longer listening to the bus; it just appears to be completely offline. It's the strangest thing. Rebooting is the only option (either by hitting the panic button or remoting in and forcing a reboot). It happens so often that I have a hard reboot button on my Stream Deck to trigger a full reboot (I've given up on a remote-desktop-based solution).
I have spent considerable time trying to figure out why it crashes sometimes and not others. For example, if I lock the computer, that raises the likelihood of a crash significantly. Strangely, if I watch video on the computer, that seems to lessen the likelihood of a crash. Why? No clue. My longest session was 21 hours running continuously (or nearly continuously, I did interrupt the process once or twice for a few minutes). During that session I generated 3,812 images, the most in one night ever. It turned out that if I generated the images in batches of 80, restarted ComfyUI, then paused for 2 minutes, I could lower the likelihood of crashing quite a bit. It still crashed in hour 21. What could have been different about that 3,813th image that put it over the top? I feel like I'm dealing with the problem from The Expanse, and my ships are going Dutchman for some very esoteric reason I'm just not seeing.
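For the curious, the mitigation loop is roughly this; a minimal sketch, where queue_batch() and restart_comfyui() are hypothetical stand-ins for the queuing script shown earlier and for however you stop and relaunch the ComfyUI server.

```python
import time

BATCH_SIZE = 80        # images per batch before restarting ComfyUI
PAUSE_SECONDS = 120    # cool-down between batches

def queue_batch(pairs):
    """Hypothetical helper: queues one image per (primary, secondary) pair
    via the ComfyUI /prompt endpoint and waits for the batch to finish."""
    ...

def restart_comfyui():
    """Hypothetical helper: stops and relaunches the ComfyUI server process."""
    ...

def run_all(pairs):
    for start in range(0, len(pairs), BATCH_SIZE):
        queue_batch(pairs[start:start + BATCH_SIZE])
        restart_comfyui()          # a fresh process seems to reduce GPU lockups
        time.sleep(PAUSE_SECONDS)  # give the card a couple of minutes to settle
```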
I plan to keep working on it. Who knows, by the time you read this last paragraph maybe I'll have figured it out (I'm sure I'll have posted what I found to the GitHub issues for either ComfyUI or SD, so go look there). Thanks for reading, you three people who got this far in the story. Since you got this far, if you have any suggestions for this, questions about this, or ideas for a similar project, please do e-mail me at SDXL_projects@pingbot.com.
Return to the project home