How can we objectively quantify an AI model’s true abilities? Can we develop objective tests to evaluate and grade AI image and video generation models such as Sora on aspects like the aesthetics of the images or video, temporal coherence, how well the model seems to have understood and rendered the details of the prompt, and especially how well it can render multiple subjects “interacting” with each other? A favorite test prompt would be:
Two people dancing the TANGO. Music generation would be an added bonus.
For now, we’d expect such prompts to make for good comedy.
When we see AI video of people convincingly wrestling or dancing the tango, we’ll know we’ve reached escape velocity. The future will be limitless, and it will be a beautiful thing.
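To make the grading idea a bit more concrete, here is a minimal sketch of how per-aspect scores could be rolled up into one number. The four axes are the ones listed above. Everything else is an assumption for illustration: the weights, the names `WEIGHTS` and `rubric_score`, and the idea that each axis has already been scored from 0 to 1 by human raters or some automated metric.

```python
# Hypothetical rubric: axis names follow the aspects discussed above;
# the weights are illustrative assumptions, not an established benchmark.
WEIGHTS = {
    "aesthetics": 0.20,
    "temporal_coherence": 0.25,
    "prompt_adherence": 0.25,
    "subject_interaction": 0.30,  # weighted highest, since this is the hard part
}


def rubric_score(scores, weights=WEIGHTS):
    """Return the weighted mean of per-axis scores, each expected in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for axis in weights:
        if not 0.0 <= scores[axis] <= 1.0:
            raise ValueError(f"score for {axis!r} must be in [0, 1]")
    total_weight = sum(weights.values())
    # Normalize by the total weight so the result stays in [0, 1]
    # even if the weights do not sum exactly to 1.
    return sum(weights[a] * scores[a] for a in weights) / total_weight


# Example: a made-up tango clip that looks good but fumbles the interaction.
tango_clip = {
    "aesthetics": 0.8,
    "temporal_coherence": 0.6,
    "prompt_adherence": 0.7,
    "subject_interaction": 0.2,
}
print(round(rubric_score(tango_clip), 3))
```

A single number hides a lot, so in practice it would probably be worth reporting the per-axis scores alongside it. The hard part is still producing those per-axis scores objectively in the first place.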