Using Nano Banana Pro multimodality
I wanted to test Nano Banana Pro not just as an image generator, but as a multimodal system that can combine instructions, reference images, and structured visual data.
What I did:
I wrote the prompt as a set of instructions for an LLM rather than as a classic token-based image description. (The prompt itself contains none of the text that should appear on the cards.)
I attached the avatar images plus a separate hint image containing all the information that has to be matched to those avatars (rarity, nickname, number of followers).
Inside the prompt, I gave explicit instructions to:
read all textual information only from that reference image
bind specific avatars to specific cards
visually separate Rare, Epic, and Legendary cards by design
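To make the structure concrete, here is a minimal sketch of how such an instruction-style prompt could be assembled. It is not the prompt I used, and it makes no call to any image model. `CardSlot`, `build_instruction_prompt`, and labels like "image 1" are hypothetical names chosen for illustration. The point is that the prompt only says where to read the text and which avatar goes on which card, never the text itself.

```python
from dataclasses import dataclass


@dataclass
class CardSlot:
    """One card to generate. Both fields are hypothetical labels, not real data."""
    card_number: int
    avatar_image: str  # label of the attached avatar image, e.g. "image 1"


def build_instruction_prompt(slots, hint_image, tiers=("Rare", "Epic", "Legendary")):
    """Assemble an instruction-style prompt that contains no card text itself."""
    lines = [
        "Create a promo-style image of collectible cards.",
        # All text comes from the hint image, never from the prompt.
        f"Read all textual information (rarity, nickname, number of followers) only from {hint_image}.",
        "Do not invent or alter any text.",
    ]
    # Bind each avatar to one specific card.
    for slot in slots:
        lines.append(f"Card {slot.card_number}: use the avatar from {slot.avatar_image}.")
    # Ask for a distinct visual design per rarity tier.
    lines.append(
        "Visually separate " + ", ".join(tiers[:-1]) + f", and {tiers[-1]} cards by design."
    )
    return "\n".join(lines)


# Assumed attachment order: three avatars first, then the hint image.
slots = [CardSlot(1, "image 1"), CardSlot(2, "image 2"), CardSlot(3, "image 3")]
prompt = build_instruction_prompt(slots, hint_image="image 4 (the hint image)")
print(prompt)
```

The resulting string would be sent together with the avatar images and the hint image. How the images are attached and ordered depends on the tool you use, so treat the "image N" labels as an assumption.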
As a result, the model generated a promo-style image of collectible cards, where the layout, text, and hierarchy were derived from multiple input sources instead of being hardcoded into the prompt.
Prompt used:
If, for some monstrous reason, you don't know the authors on these cards (@Cilia, @Mrs_Hyde, @TheLordofnothingness), run and enjoy their incredible creations!
If you want to try something similar, I'm running a collectible cards challenge: you can make cards on any theme you like.
