I Tested Anthropic's Official Skills on All Three Frontier Models
Anthropic publishes skills for Claude Code: structured task definitions that tell the model what to build and how. They're designed for Claude. I ran them on GPT-5.4 and Gemini 3.1 Pro too. GPT-5.4 won. Not by a little. GPT-5.4 averaged 0.876 across four