I've mentioned this before, but I think it's so revealing and important to understand that I want to convey it again:
Suppose you have two images of different people and you want Nano Banana to take the clothing, pose, and orientation of the first image but give the result the face of the second person, so that it's perfectly recognizable as them.
The obvious way to do this, and the conventional wisdom for a long time, was to write some big, detailed prompt that specifies exactly what you want to happen and even includes a bunch of things to watch out for to prevent known failure modes.
You might have some phrases about making sure that the generated image looks "just like" the person in the second image, or that the "facial likeness must be instantly recognizable," or some other such formulation.
Or conversely, you might specify that the pose and clothing and orientation of the generated image must match that of the first image.
And perhaps early testing taught you that there are some failure modes you have to watch out for. As an example, you might include in your prompt that if the person in the first image has a beard but the person in the second image doesn't, the generated image should definitely not have a beard.
All these things sound reasonable, do they not? And here's the weird thing: the more stuff like that you include in the prompt, the worse it will work! Now, in this example, it might "work" insofar as you'll get a picture of the person dressed as the other person, but it will look comically bad, like one of those "face-in-hole" apps from 2010. Why?
What's even stranger is that giving a very short, schematic prompt asking for what you want, like "make the person in the second pic so they're dressed like the person in the first pic," might result in a much more pleasing and realistic image, even if you need to generate it a couple of times to get it just right. Again, why?
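To make the comparison concrete, here's a minimal sketch of how you might actually run the short-prompt version, assuming the google-genai Python SDK and a Nano Banana model id along the lines of gemini-2.5-flash-image (check the current model name in the docs); the file names and the prompt string are just placeholders. The code is identical for the long, detailed prompt, which is the point: the only variable is how much micromanagement you stuff into the prompt string.

```python
from google import genai
from PIL import Image

# Assumes GEMINI_API_KEY (or GOOGLE_API_KEY) is set in the environment.
client = genai.Client()

# First image: source of clothing, pose, and orientation.
# Second image: source of the face/identity.
outfit_ref = Image.open("person_in_outfit.jpg")
face_ref = Image.open("person_to_swap_in.jpg")

# The short, schematic prompt that (counterintuitively) tends to work better
# than a long list of constraints and failure-mode warnings.
prompt = "Make the person in the second pic so they're dressed like the person in the first pic."

response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # hypothetical/current Nano Banana model id
    contents=[outfit_ref, face_ref, prompt],
)

# Save the first image part the model returns.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("result.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```

Since the call itself doesn't change, you can rerun it a few times cheaply and just pick the best output, which is usually faster than trying to pre-specify every failure mode in the prompt.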
