Llama 4 for students
Welcome back to Ctrl + Alt + Learn, where we ask the questions, discuss the demos, and learn from experts about innovative applications of technology in education.
It’s open-source. It’s multimodal. It’s been caught in Twitter drama.
Llama 4 is the latest family of AI models from Meta’s research lab, and it’s already shrouded in controversy. Before spilling the tea on Meta’s leaderboard hacking, let’s review the major changes announced last month as well as the potential benefits for students.
Llama 4 comes in three flavors, Behemoth, Maverick, and Scout, in order of decreasing capability. Only the latter two are available as of this writing, but all are natively multimodal and, for the first time in the Llama family, use a Mixture of Experts (MoE) architecture!
For the curious nerds, I recommend reading the full announcement, which details the engineering feats the team achieved. For instance, the smallest model, Scout, was trained on a 256K context window, yet the team developed techniques that extend its usable context to 10M tokens, validated with “needle in a haystack” retrieval tests. The team was also able to discover favorable initial hyperparameters that transfer across various LLMs.
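Meta’s announcement doesn’t ship reference code for these internals, but a toy sketch helps explain what Mixture of Experts means in practice: a small learned router sends each token to only a few “expert” sub-networks, so total parameters can grow huge while compute per token stays modest. Everything below (class name, sizes, top-k of 2) is illustrative, not Meta’s implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture of Experts layer (illustrative, not Meta's code)."""

    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        gate = F.softmax(self.router(x), dim=-1)        # (tokens, num_experts)
        weights, picks = gate.topk(self.top_k, dim=-1)  # keep only the best k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Only 2 of the 8 expert MLPs run for any given token, which is how an MoE model can carry far more parameters than it spends compute on.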
All in all, Llama is now multimodal-first but still open source, which has major implications for students, teachers, and anyone supporting learners. You can try it out today in Meta products like WhatsApp and Instagram.
Multimodality
To understand why multimodality is such a big win, we have to review the learning science principle of dual coding.
Dual coding is the practice of presenting the same information in two formats at once.
These formats could be voice and text, text and image, and so on. The strategy gives the brain more than one avenue for recalling and reconstructing information. It is also different from ‘adapting to learning styles’, which research has thoroughly debunked for decades now.
Adapting to learning styles suggests, for instance, that while some students may benefit more from a primarily visual learning experience, others may benefit more from a primarily auditory one. Dual coding, by contrast, insists on providing both formats (simultaneously) to all students. The former is a myth; the latter is research-based fact.
And this fact is what makes multimodality exciting! Not just because Meta is embracing it, but because all the major models are leaning into multimodality too!
Google was the first company after ChatGPT’s text-based viral moment to announce a natively multimodal experience, in March 2023. GPT-4 introduced images alongside text, and GPT-4o extended that to audio and video. Anthropic’s Claude, DeepSeek’s Janus-Pro 7B, and xAI’s Grok-1.5V have all followed suit and are multimodal today.
Though these models still have factuality concerns, students can take pictures of homework, study notes, or real-world objects and get personalized responses to accompany the visual aid. Teams like the one behind NotebookLM can generate a podcast grounded in the text and images you share. The future of education will need administrators and teachers alike who can foster an environment where an AI tutor embraces the student’s context through the sounds and sights around them.
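To make the homework-photo flow concrete, here is a sketch of what it looks like in code. It assumes an OpenAI-compatible endpoint serving a multimodal Llama 4 model; the base URL and model id below are placeholders, not a real provider:

```python
# Sketch: ask a multimodal model about a photo of homework.
# Assumes an OpenAI-compatible endpoint serving a Llama 4 model;
# the base_url and model id are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="https://your-llama-provider.example/v1", api_key="...")

with open("homework_page.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="llama-4-scout",  # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Don't give me the answer. Help me see where my working goes wrong."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

The same pattern works for study notes or photos of real-world objects; only the image and the text change.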
Open source
When models are open-sourced, the benefits of the technology can reach students immediately. Not only can they improve learning outcomes (when used well), but students can also begin adapting their learning journeys and habits to the available technology. Builders, too, can dream up and implement ideas on top of the latest models, accelerating us toward successful EdTech products.
When DeepSeek open-sourced R1, which rivaled OpenAI’s state-of-the-art model, the ChatGPT maker began exploring more ways to live up to the ‘Open’ in its name. It is a well-known tenet of the open source community that openness leads to more collaboration, innovation, and more secure outcomes.
The fact that Meta, Mistral, Google (Gemma), and now OpenAI are all contributing to this space removes a major barrier to distributing the fruits of genAI more equitably to students around the world. But there’s a catch.
While some of these companies are building small (on-device), capable, and free models, we must not forget that there are more significant barriers to true accessibility for students, from technical ones like reliable power to societal ones like unstable homes.
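That said, where the infrastructure exists, the technical barrier really is low. Here is a minimal sketch of running a small open-weights model entirely on local hardware with Hugging Face transformers; Gemma is just one example (its license must be accepted on Hugging Face first), and any small open checkpoint works the same way:

```python
# Sketch: run a small open-weights model locally.
# google/gemma-2-2b-it is one example of a small open model (gated:
# accept its license on Hugging Face first); any open checkpoint
# your hardware supports works the same way.
from transformers import pipeline

chat = pipeline("text-generation", model="google/gemma-2-2b-it")

messages = [{"role": "user",
             "content": "Explain photosynthesis in two sentences, then quiz me."}]
reply = chat(messages, max_new_tokens=200)
print(reply[0]["generated_text"][-1]["content"])  # the assistant's turn
```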
The Drama
In the last few years, sentiment about an LLM’s capability has primarily been driven by results on one platform: LMSYS, now Chatbot Arena. It is a benchmark determined by human voters in head-to-head comparisons of model responses. When I discovered it, I was excited about the reliability of the process.
This excitement has quickly evaporated since the Llama 4 release, for two reasons. The first is that Meta casually mentioned that an experimental chat version scored an Elo of 1417 on LMArena. This was a very surprising result for Maverick (it would be more believable for the bigger Behemoth).
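For context, Elo is the rating scheme borrowed from chess: every head-to-head vote nudges the winner’s rating up and the loser’s down, in proportion to how surprising the result was. Chatbot Arena’s actual leaderboard uses a more elaborate statistical fit, but the classic online update captures the idea:

```python
def elo_update(r_winner, r_loser, k=32):
    """One Elo update after a single head-to-head vote.

    The winner's expected score follows a logistic curve; an upset
    (a win the ratings didn't predict) moves both ratings further.
    """
    expected = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected)
    return r_winner + delta, r_loser - delta

# A 1417-rated model beating a 1350-rated one shifts both ratings by ~13 points.
print(elo_update(1417, 1350))  # ≈ (1429.95, 1337.05)
```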
Digging deeper, the community discovered that Meta had submitted a model heavily optimized for this benchmark. Needless to say, benchmark hacking is a big no-no. This is not the first time Meta has been accused of including benchmark datasets in its training data, which leads to the second reason.
After Chatbot Arena addressed the confusion, they shared the head-to-head comparisons, which would make one question the goals of the human raters (and of Meta’s research lab).
For instance, many of Llama’s responses were extremely verbose, and it was hard to find the information you needed in the output. That is not a great experience for most users, in my opinion. But it is demonstrably worse for students: it wastes valuable study time hunting for answers in the response and risks cognitive overload from the sheer volume of information.
We need models optimized not for gaming benchmarks but for helping students. Not all of the frontier AI companies have lost the plot in that sense. Google and Anthropic have invested in fundamental pedagogical improvements to their models.
This means models that don’t just give you the answers. They engage in Socratic dialogue. They leverage dual coding. They provide concise explanations. They elaborate with concrete examples.
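None of this requires waiting for a special education model; a system prompt can already push a general chat model toward these behaviors. A sketch, with prompt wording that is mine rather than any lab’s official tutoring prompt (endpoint and model id are placeholders again):

```python
# Sketch: steering a general chat model toward tutor-like behavior
# with a system prompt. The wording is illustrative, not an official
# pedagogy prompt from any lab; endpoint and model id are placeholders.
from openai import OpenAI

TUTOR_PROMPT = """You are a patient tutor.
- Never give the final answer outright; ask one guiding question at a time.
- Keep explanations under three sentences.
- Pair every abstract idea with a concrete example (dual coding in text form).
"""

client = OpenAI(base_url="https://your-provider.example/v1", api_key="...")
response = client.chat.completions.create(
    model="llama-4-scout",  # placeholder
    messages=[
        {"role": "system", "content": TUTOR_PROMPT},
        {"role": "user", "content": "Why does ice float on water?"},
    ],
)
print(response.choices[0].message.content)
```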
So as we follow news of the latest models, let’s ask ourselves: how does this development impact students? Is this just a cool demo, or does it also foster learning?