Powering the Gen AI transformation

For most of us in the tech industry, the transformative impact of LLMs and generative AI is quite apparent. What is less visible to many, however, is the physical infrastructure powering this LLM-based transformation. Living in Ashburn, Virginia, I am right at the heart of one of the largest data center hubs in the US. Although many residents in the area are unhappy with the proliferation of data centers, demand for them is growing significantly, and LLMs are a major driver of that growth.

According to recent figures, data centers account for approximately 3% of total power consumption in the US, and that number is expected to double within the next couple of years. The next generation of data center designs is expected to be larger and more power-hungry than existing facilities. Interestingly, electrical grid connectivity often cannot be extended to a new data center as quickly as the data center itself can come online. As a result, companies are planning either for independent power supplies or for building in areas closer to power plants. One example mentioned by Morgan Stanley is that data center operators are planning new hubs close to existing nuclear power plants in order to get access to power more easily.

The big tech companies have all made public commitments to reducing their carbon footprints and building more efficient data centers, for example Google's and Apple's 2030 plans. The need for geographical separation of data, often dictated by local laws, coupled with the need for redundancy and low latency, means that data center construction is increasing across the world. What remains to be seen is whether renewable energy sources and other power-efficiency techniques can keep up with the ever-growing volume of text, audio, and video that needs to be constantly indexed and fed into the models. The next time you chat with your favorite LLM, remember it might be powered by a data center next to a nuclear plant.


“System 1-System 2” thinking and AGI

My post on Moravec's paradox felt a bit incomplete. I wanted to expand a bit more on the probable causes behind the paradox, as well as add some thoughts on Yann LeCun's theory of what it would take to bridge the gap between the current state of LLMs and true AGI (Artificial General Intelligence).

The most accepted explanation of Moravec's paradox is that evolution determines what we humans consider easy versus hard. The things we consider easy, all the sensory-motor stuff, are actually extremely complex, but humans and all our ancestors have had millions of years to fine-tune them and make them seem simple. The things we consider hard, all the intellectual work that takes effort, are evolutionarily speaking very recent, only a few hundred thousand years old. Evolution has not yet perfected these abilities, so they take a lot of effort for us. Hence the paradox.

So, what does this have to do with AGI? Since the beginning of AI research, people have tried to figure out how to mimic the human brain in silicon. A lot of the early symbolic reasoning efforts were based on how our brains think about the world. The problem is that we do not have one overarching theory or framework of how our brain works, because we do not understand much of it. Researchers in this field have done plenty of work, and we have great insights into many parts of the brain, but on the whole, the brain/mind remains a black box.

One great insight into how our brain/mind works was explained by Daniel Kahneman in his book "Thinking, Fast and Slow". Full disclosure: I have not read the book. However, the key insight is that the brain has two modes of operation. "System 1" thinking happens near instantaneously, driven by instinct. "System 2" thinking takes a lot of conscious effort. Yann LeCun's efforts toward AGI involve figuring out how to incorporate "System 1/System 2" behavior into AI systems. The hypothesis is that if we figure out how to do that, then AI can truly reason about problems.

One common theme in AI since the beginning is that AI is always 20 years away. No matter how much progress we make, there will always be more to do. Will "System 1/System 2" thinking and encoding physical reality into AI models give us true AGI? Time will tell.


Moravec’s Paradox: AI for intellectual tasks and AI for physical tasks

I recently learned about Moravec’s Paradox. First framed by Hans Moravec, it is an observation that AI (or computers in general) is good at tasks that we humans consider complex but is bad at tasks that we humans consider very simple.

For example, the more intellectual tasks, such as writing, mathematics, and creating art, images, or music, are hard for humans. AI has made the most progress on these tasks, and many of them, such as doing complex arithmetic, are trivially easy for computers.

Now consider tasks such as walking, doing the dishes, or folding laundry. These are tasks that humans can do without even thinking, yet there are no AI products on the market that can do them for us. I, personally, would much prefer an AI that can do the dishes and clean the house to an AI that can write reports or create videos.

This difference in AI's progress on tasks that we humans consider difficult versus those we consider easy is the paradox. I'd recommend the Wikipedia page and this detailed Reddit post for a more in-depth overview of Moravec's paradox, including some of the evolution-based explanations for it.

So, why has progress on AI products for physical tasks been so slow?

There are many LLMs on the market, and all of them claim to perform very well on these intellectual tasks. Most of us are familiar with the names: OpenAI's GPT models, Gemini, Claude, Mistral, etc. However, the only company I can think of that makes robots that can move around the natural world like humans is Boston Dynamics, of robot dancing video fame.

One of the arguments that Yann LeCun has been making is that to get to true Artificial General Intelligence (AGI), we need machines that have an innate understanding of the physical world. The animal world has evolved intelligence by interacting with the physical world, and our brains have an intuitive grasp of how the world works. For example, most toddlers figure out that if they throw a ball, it is going to come back down; they have an intuitive understanding of gravity and physics. The theory is that the better we get at making AI understand the physical world, the better it will get at interacting with the natural world. It will be interesting to see the next generation of AI technology and how it tackles these problems.


AI History: The Dartmouth Summer AI Research Project of 1955

As fun as it is to learn about the latest updates in the world of AI, I also find it very interesting to learn about the history of this field. One fascinating project was the “Dartmouth Summer Research Project on Artificial Intelligence”.

The proposal for this project was written in 1955 by four luminaries of the field: John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester. Reading the proposal, two things stand out to me:

  1. The research project's scope was visionary in its outlook. Covering programming languages, neural networks, machine learning, algorithms, and computational complexity, it set the stage for the direction of computer science and artificial intelligence for the next seven decades.
  2. The sheer audacity of their plan: "We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer." It took nearly 70 years to realize the fruits of neural networks.

The proposal is a short paper, and I would definitely recommend reading it to get an insight into the minds of the pioneers at the time of the field's infancy.


Improving reasoning in LLMs using prompt engineering

Getting machines to perform reasoning tasks has long been a cherished goal of AI. Examples include word problems in mathematics and the kind of analytical, commonsense reasoning you typically see in standardized tests such as the SAT or GRE. Today's large language models (LLMs) can perform many simple reasoning tasks out of the box, and there is a growing field of research into how we might improve LLM reasoning on complex tasks.

One of the more involved ways of improving reasoning in LLMs is reinforcement learning, motivated by the large performance improvements we have seen from Reinforcement Learning from Human Feedback (RLHF). These model enhancements require considerable effort: collecting large amounts of training data and choosing the right reinforcement learning algorithms. There is a good discussion of this topic in a paper by Alex Havrilla and team: "Teaching Large Language Models to Reason with Reinforcement Learning".

I'll dig into the details of Havrilla's paper at a later time, but today I'd like to focus on an easier technique that many people can try out directly. Jason Wei and a team at Google describe a method that uses prompt engineering to get LLMs to perform better reasoning: "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models".

The intuition behind the paper is pretty straightforward. We humans use step-by-step thinking when trying to solve complex reasoning tasks. Maybe the models will also get better at reasoning if they are shown this "step-by-step thinking" process. The team behind the paper experimented with giving LLMs examples of this process, using few-shot prompting to get the LLMs to perform better.

A simple example of a prompt used as a few-shot exemplar is:

Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?
A: There are 15 trees originally. Then there were 21 trees after some more were planted. So there must have been 21 – 15 = 6. The answer is 6.

In the example above, the explanation before the final answer ("There are 15 trees originally. Then there were 21 trees after some more were planted. So there must have been 21 - 15 = 6.") is the chain-of-thought reasoning that shows the model how to arrive at the correct answer.
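To make this concrete, here is a minimal sketch of how you might assemble a few-shot chain-of-thought prompt in code. The exemplar is the one from the paper quoted above; the `call_llm` helper is a hypothetical stand-in for whichever LLM client your application uses.

```python
# Few-shot chain-of-thought prompting: each exemplar shows the step-by-step
# reasoning before the final answer, so the model imitates that structure
# when it answers the new question.

COT_EXEMPLARS = [
    {
        "question": (
            "There are 15 trees in the grove. Grove workers will plant trees in "
            "the grove today. After they are done, there will be 21 trees. "
            "How many trees did the grove workers plant today?"
        ),
        "answer": (
            "There are 15 trees originally. Then there were 21 trees after some "
            "more were planted. So there must have been 21 - 15 = 6. "
            "The answer is 6."
        ),
    },
    # In practice you would include several such exemplars.
]


def build_cot_prompt(new_question: str) -> str:
    """Concatenate the CoT exemplars, then append the question we want answered."""
    parts = [f"Q: {ex['question']}\nA: {ex['answer']}" for ex in COT_EXEMPLARS]
    parts.append(f"Q: {new_question}\nA:")
    return "\n\n".join(parts)


prompt = build_cot_prompt(
    "A parking lot has 3 rows with 4 cars in each row. How many cars are there?"
)
# `call_llm` is a hypothetical helper standing in for your LLM client of choice:
# answer = call_llm(prompt)
```

The same scaffolding works for other reasoning tasks; only the exemplars need to change.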

This type of prompting could be useful in many complex natural language reasoning tasks. If you have an application where such reasoning could help, this simple change to your prompting strategy could improve your application's performance.


Examining claims of biosecurity risks from Open Foundation Models

One of the primary drivers of regulatory efforts around generative AI and foundation models is the fear of societal harm from the models. There are many claims, including from some highly respected AI experts, that this technology has the power to cause catastrophic harm to human civilization. When such serious claims are made, it behooves us to understand the perspective from which they are made so that we can analyze them rationally. Moreover, when such claims are being used to guide governmental regulations, it becomes even more important to be informed and active participants in the process.

In this post, I want to dig into the claims about biosecurity risks. A really interesting read on this topic is from Anjali Gopal et al. at MIT: "Will releasing the weights of future large language models grant widespread access to pandemic agents?". The team ran a novel test, having two groups separately use two versions of a model to try to figure out all the information needed to recreate the 1918 pandemic virus. The first group used a standard Llama 2 "base model". The second group used a "spicy model", essentially a fine-tuned version whose weights had been modified to bypass all safety guardrails. For example, when prompted to share dangerous information, the base model would politely decline, but the spicy model would not hesitate to share it.

The main critique of this test is that, although the "spicy model" group managed to get most of the needed information, they were not doing anything that isn't already possible without LLMs. As discussed in detail by Sayash et al., the "marginal risk" of this technology is close to zero. The marginal risk concept is very useful for keeping us grounded when discussing the risks of LLMs.

It would have been more interesting if, instead of a two-cohort test ("base model" group / "spicy model" group), the team had run a three-cohort test with a third group trying to get the same information using plain Google search: essentially a "base model" vs. "spicy model" vs. "Google search" test. I suspect the Google search group would also have been able to find all the information it needed.

What is even more interesting is that the team suggests an insurance-based process for regulating open foundation models to prevent these types of harms. The liability insurance idea is borrowed from existing liability laws for nuclear plants: the owners of a nuclear plant are held liable for any damage resulting from their plant, irrespective of who caused the damage. Translated to the AI world, the people releasing an open foundation model would be liable for any downstream harm caused by using their model, irrespective of who fine-tuned or modified it.

How would such regulation be enforced? If some rogue terror group fine-tuned an open model and used it to mount a bioterror attack, would they even tell anyone which model they used and how they got their information? I'll need to dig into more resources to understand this process.


Examining Benefits & Risks of Open Foundation Models

As new technologies enter the market, government regulations tend to follow, for a few reasons:

  1. To ensure the benefits of the technology are widely distributed.
  2. To ensure the risks of the technology are properly managed, reducing the impact on society.
  3. And, the cynical reason that is also true in many cases: regulations are pushed by incumbents for regulatory capture, i.e. to make it harder for new companies to challenge them.

The wave of recent AI regulations is largely being positioned as addressing the first two reasons, although many commentators claim it is also an attempt at regulatory capture by the leading AI companies. For the rest of this post, I'll stick to open foundation models, as the regulations and the academic research around this topic are focused on them.

Open models provide many benefits over closed models. Giving the larger technology community access to model weights, and in some cases the model source code and documentation, spurs innovation in adapting these models to unique applications, and it lets a larger pool of people use the models at low cost.

Since the release of ChatGPT, there has been a host of concerns about generative AI, including disinformation, manipulation of political campaigns, cybersecurity risks, biosecurity risks, automated warfare, and more. Amidst all these claims, the key question is: how do we evaluate these threats? Sayash et al. provide a simple framework, largely reusing existing security and risk frameworks, to evaluate whether open foundation models pose any additional risk (marginal risk) beyond what is already present.

A few takeaways:

  • On societal benefits, open foundation models offer largely the same advantages as other new open technologies: increased access for more people, greater competition and innovation, and reduced concentration of value.
  • On risks, the encouraging sign is that we already have very good frameworks from the general study of security and risk. Specifically with respect to computers, the field of cybersecurity has much to offer in how we manage these risks. Instead of readily succumbing to FUD scenarios, we can rationally assess the marginal risk of these models using those risk assessment frameworks. As the biosecurity example shows, many of these risks have been around for a long time, and open LLMs are not increasing the risk of attacks.


Open Models: The focus of AI Governance

Every new technology ushers in both excitement and concern. The latest innovation in AI, the introduction of large language models, is no exception. The quantum leap in progress demonstrated by these models caught most people, even those working in the field of AI, by surprise. Many tasks requiring human mental skills, previously thought to be very difficult for machines, are now easily done by these generative AI models. Well-publicized examples include ChatGPT passing the bar exam and the MCAT.

Governments across the globe have rushed in to figure out the impacts of this new Generative AI technology on society. The US and European Union have already issued regulatory frameworks that seek to control the impact of this technology, with the hope of maximizing benefits and minimizing harm to society.

At the heart of these regulations is a particular focus on "open foundation models". It is important to note that "open models" are not the same as "open source models". The technology community is by now very familiar with open source software, where an application's entire code base is made public, along with documentation and datasets where applicable. With respect to GenAI and LLMs, it is open models, rather than open source models, that matter most to the community.

A recent academic paper, "On the Societal Impact of Open Foundation Models", does a great job of laying the groundwork for defining the term "open models". It lays out five criteria that make a model "open" (as opposed to "open source"):

  1. The weights of the model should be made public.
  2. The model source code and the data used to train the model need not be made public.
  3. The model should be widely available.
  4. The model need not be released in stages.
  5. The model may have use restrictions.

Open models are considered to pose more of a threat to society because a closed model can, in theory, block access for malicious users and limit the potential harm. Once models are released in the open, no one has any control over them; there is no central authority that can control what an individual user does with a model once they have access to it. Hence, the regulatory frameworks focus largely on open models.


New Data on AI Policy and Governance

Stanford University's Human-Centered Artificial Intelligence (HAI) center has released its 2024 AI Index. It collects and shares data highly relevant to the broad field of AI, covering topics ranging from the latest investments in AI research and development, to AI in fields such as science, medicine, and education, to the latest regulatory developments throughout the world.

Two topics that I am particularly interested in are "Responsible AI" and "Policy and Governance"; I intend to dive deep into them and share my learnings over the next few months. To begin with, here is some interesting data regarding "Policy and Governance" that the report highlights:

  • In 2023, the US had 25 AI-related regulations.
  • 21 regulatory agencies issued AI-related regulations, including the Department of Transportation and the Occupational Safety and Health Administration.

Apart from these regulations, two big regulatory framework announcements captured a lot of attention over the last few months: the US Executive Order on AI and the European Union's AI Act.

Beyond the US and Europe, there has been plenty of interest in AI regulation across the globe. This is a nascent field, and I expect a lot more activity in the near future.

I intend to dig into the AI policy and regulation space a lot more in the coming months. It is fascinating because it overlaps with so many complex and nebulous topics: regulation vs. the free market, objective evaluation of AI models, frameworks for measuring fairness and responsibility, and more.


A simple taxonomy to guide your LLM prompt engineering

Large language models (LLMs) have truly democratized AI development. Software engineers can now build many AI applications without dedicated model development. These LLMs are a one-stop shop for performing a variety of natural language processing (NLP) tasks. Tasks such as key information extraction or named entity recognition, which in the past would have required dedicated NLP models, can now be requested in natural language, and the model responds with the results. These queries, known as "prompts", have given rise to a whole host of "prompt engineering" best practices.
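As a quick illustration (a minimal sketch of my own, not from any particular paper), here is what wrapping a classic NLP task like named entity recognition in a natural-language prompt might look like. The prompt wording and the `call_llm` helper are illustrative assumptions.

```python
# Named entity recognition expressed as a plain-language prompt instead of a
# dedicated NLP model. The prompt asks for JSON so the reply is easy to parse.

def build_ner_prompt(text: str) -> str:
    return (
        "Extract all person, organization, and location names from the text below. "
        "Return a JSON object with the keys 'persons', 'organizations', and "
        "'locations'.\n\n"
        f"Text: {text}"
    )


prompt = build_ner_prompt("Marie Curie carried out her research at the University of Paris.")
# `call_llm` is a hypothetical stand-in for whatever LLM client you use:
# entities = call_llm(prompt)  # then parse the JSON reply into a structured result
```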

One intriguing aspect of LLM behavior is that a model's output can vary based on how the input prompt is structured. Variations in input prompts lead to variations in output, and this fluctuation can strongly impact an application's performance metrics. To achieve optimal performance, engineers need to fine-tune the prompts given to the LLM.

I recently came across a very useful taxonomy of prompts that can serve as a simple guide for engineers building generative AI applications with LLMs, in a paper titled "TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks".

While the paper's motivation was to give benchmarking test suites a standard way of evaluating LLMs, the framework it offers is helpful to engineers as well. The seven-level taxonomy ranges from giving the LLM very little direction to specifying in great detail what the user expects in the output.
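As a rough illustration of the idea, here is the same summarization task expressed with increasing amounts of detail in the prompt. The exact level definitions are in the paper; the wording below is my own simplified sketch.

```python
# Illustrative prompts for one summarization task at increasing levels of
# detail, loosely inspired by the TELeR taxonomy. These are simplified
# examples, not the paper's exact level definitions.

ARTICLE = "..."  # placeholder for the text you want summarized

prompts_by_detail = {
    # Almost no directive: just hand the model the text.
    "minimal": ARTICLE,
    # A simple one-line directive.
    "basic": f"Summarize the following article.\n\n{ARTICLE}",
    # A detailed directive spelling out the expected output.
    "detailed": (
        "Summarize the following article in exactly three bullet points. "
        "Each bullet should be one sentence, mention any named organizations, "
        "and avoid direct quotations.\n\n"
        f"{ARTICLE}"
    ),
    # The detailed directive plus a request to justify the output.
    "detailed_with_justification": (
        "Summarize the following article in exactly three bullet points. "
        "Each bullet should be one sentence, mention any named organizations, "
        "and avoid direct quotations. After the bullets, briefly explain why "
        "you chose those three points.\n\n"
        f"{ARTICLE}"
    ),
}

# Running each variant against your evaluation set shows how strongly the level
# of detail in the prompt affects output quality for your application.
```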

Check out the paper and use the taxonomy for your next project.
