Friendly Fire: How Balancer’s Openness Led to a Double Breach

5 October, 2023
Case Study

Summary

Attack dates: Aug 27, 2023 | Sep 20, 2023

Issue type:

  • Inability to pause old-versioned pools and pools whose pause window had expired
  • Vulnerable rounding logic in calculations
  • Pool-draining transactions using flash loans
  • Data mining (analysis of the published list of vulnerable pools)

Exploited pools: full list

Affected networks: Ethereum, Polygon, Arbitrum, Optimism, Avalanche, Gnosis, Fantom, zkEVM

Estimated loss: ~$893,978

📝Introduction

In this article, we discuss how Balancer, one of the largest AMM protocols, was hacked twice within a month. We analyze the team’s responses to identify the mistakes made and draw lessons that can help prevent such events from recurring.

To begin with, let’s briefly review the basic facts and understand what exactly went wrong with Balancer on August 27.

It all started with the publication of a very suspicious and utterly non-explanatory announcement on the Governance forum, followed by a post on Balancer’s official Twitter account on August 22. The users were warned that their funds were at risk.

Surprisingly, everything had been planned in advance. The protocol team was already well aware of what had happened to their pools. Publishing a list of vulnerable pools, which at first glance appeared suicidal, was also a premeditated step in a long series of actions intended to protect user funds. Understandably, this sparked panic within the community.

Events did not reach any logical conclusion until August 27, when several hack-spotting services detected a suspicious transaction indicating a protocol breach. This was a significant issue since, by that time, not all the pools listed on the forum had been drained. One such pool was specifically compromised by several flash-loan transactions.

To understand this intriguing detective story, let’s go back to June 23 and trace how the situation escalated within the protocol and what logic lay behind such peculiar actions! 📽️🍿🎬

Sequence of Events

  1. On June 23, the team received the first white hat report via Immunefi, indicating the presence of a LOW-level vulnerability in the precision and rounding calculation mechanism.
  2. Within a couple of weeks, it was fully mitigated; the team was ready to release a post-mortem and report to the community.
  3. However, just before the post-mortem publication, the team received a second white hat report via Immunefi, which escalated the same vulnerability to a CRITICAL level.
  4. After studying the vulnerability, the team was horrified: the problem was so fundamental that 20% of the Total Value Locked (TVL) was at risk at that time.
  5. Realizing the gravity of the situation, the team, in close collaboration with white hats and under strict confidentiality, began planning an operation to save user funds.
  6. Thanks to the built-in pool management mechanisms (and, spoiler, pure coincidence), over 90% of the vulnerable ~$242M was salvaged. However, some pools were beyond saving.
  7. The team decided they had done everything within their power and simply released a list of vulnerable pools, leaving it to the users to save the remaining funds. They hoped that users would withdraw their money faster than hackers could exploit the vulnerability (spoiler, this was very naive).
  8. The hack on August 27 made it clear that the vulnerability described in the second white hat report had not been fully explored, and that the pools were still susceptible to manipulation through both primary methods of interacting with them, “GivenOut” and “GivenIn,” rather than just one.
  9. For a month after the hack, the developers’ focus remained on finding ways to fix the vulnerability, but on September 20, a second hack occurred, this time through the UI.

Consequences of the Exploit

As a result of the primary on-chain hack, at least ~$893,978 was stolen from 8 networks. The second hack resulted in the balances of several dozen users being nullified.

The most alarming aspect of these hacks is that, for the first time in the project’s history, funds belonging to the project’s direct users were stolen, and this happened twice within a month.

We now understand that Balancer was breached and that end users’ funds were stolen from the contract for the first time (Balancer had been hacked before, but either the investments of regular users were not at risk or they were successfully recovered). Let’s now explore how Balancer, one of the largest AMM protocols in the industry and one that had undergone numerous audits, proved to be vulnerable.

🤹‍♂️Why Was Balancer Vulnerable?

Before we delve into the essence of the hack, let’s briefly understand how Balancer’s pools operate, omitting all the details not necessary for understanding the hack.

Balancer Pools Overview

Balancer has a plethora of pool types, but in the story we are examining, mainly 2 types were involved:

  • Linear Pool: Like other AMM platforms, Balancer creates liquidity pools, in this case allowing the exchange of base (unwrapped) tokens for wrapped tokens. The base tokens include various stablecoins and others, and the wrapped versions have several important characteristics, which will be discussed later. Balancer’s Vault doesn’t support unwrapped project tokens, so they are first wrapped via the lending platform before being used in trading. Wrapping tokens on every trade incurs additional costs, so the cost of each operation can grow significantly as transaction volume rises. Linear Pools have no minimum balance amounts and no swap fees; they serve as building blocks for constructing more complex types of pools.
  • Composable Stable Pool: Since using wrapped tokens is not always convenient, and wrapping through the lending platform carries an additional fee, Balancer created pools that have their own tokens. With these tokens, different tokens from the internal Linear Pools can be swapped directly during a Batch Swap. To illustrate how this type of pool operates, consider the diagram below, demonstrating what would occur if we wish to exchange USDT for DAI.
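To make the nesting concrete, here is a minimal Python sketch of how such a USDT to DAI Batch Swap could be routed. It assumes the commonly described three-hop path (USDT Linear Pool, then the Composable Stable Pool, then the DAI Linear Pool). The pool labels, the `bpt-*` token names, and the `Hop` type are hypothetical placeholders for illustration, not real pool IDs or Balancer API types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    pool: str       # hypothetical pool label, not a real pool ID
    token_in: str
    token_out: str


# Hypothetical route for a USDT -> DAI batch swap through nested pools.
ROUTE = [
    Hop("linear-USDT", "USDT", "bpt-USDT"),            # USDT -> Linear Pool token
    Hop("composable-stable", "bpt-USDT", "bpt-DAI"),   # swap between Linear Pool tokens
    Hop("linear-DAI", "bpt-DAI", "DAI"),               # Linear Pool token -> DAI
]


def is_connected(route):
    """True if each hop's output token is the next hop's input token."""
    return all(a.token_out == b.token_in for a, b in zip(route, route[1:]))


def endpoints(route):
    """Return the (first input token, last output token) of the route."""
    return route[0].token_in, route[-1].token_out
```

The user only sees the endpoints of the route (USDT in, DAI out); the intermediate pool tokens exist only inside the batch.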

Pools Pausability

Balancer has a feature that, in unforeseen situations, lets governance halt a liquidity pool by pausing it during the Pause Window, so that all users can safely withdraw their funds from the pool. The pause can be enabled no later than three months after the contract is deployed. The pause can also be undone, so governance does not need to deliberate long before using it: in case of a false alarm, it can always be lifted, and the pool will resume normal operation. To prevent censorship, the Pause Window has its own “expiry date,” ensuring that no single agent can block any given pool forever.

One of the problems was that this security system was originally developed for basic pools, and for Composable pools, the pause function only appeared in V5, which also plays a significant role in understanding how the entire hack occurred.
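The Pause Window rule described above can be sketched as a small Python model. This is a simplified illustration rather than Balancer's contract code: the class name `ToyPausablePool` is hypothetical, and "three months" is assumed to be 90 days.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds


@dataclass
class ToyPausablePool:
    deployed_at: int                 # Unix timestamp of deployment
    pause_window: int = 90 * DAY     # assumption: "three months" taken as 90 days
    paused: bool = False

    def can_pause(self, now: int) -> bool:
        """Pausing is only allowed while the Pause Window is still open."""
        return now <= self.deployed_at + self.pause_window

    def pause(self, now: int) -> None:
        if not self.can_pause(now):
            raise PermissionError("pause window has expired")
        self.paused = True

    def unpause(self) -> None:
        """A pause is reversible, e.g. after a false alarm."""
        self.paused = False
```

A pool deployed long ago fails the `can_pause` check, which is the situation the older pools were in when the vulnerability surfaced.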

Swap, Precision and Rounding Calculation

Essentially, there were two bugs in the protocol, but the team only learned about one of them. Both involved artificial manipulation of the exchange rate through a vulnerability in rounding mathematics during swaps. The difference between them was only in which direction the rate was being manipulated.

To calculate the rate at which the swap occurs, it is necessary to interpret the ratio of wrapped tokens relative to unwrapped tokens. A special Linear Math is used for this purpose.

By design, Balancer’s Linear Math was meant to prevent unforeseen changes in the pool’s state caused by manipulating the rounding operation, while still leaving room for fair arbitrage, which enhances the pool’s efficiency.

So, now that we have all the necessary knowledge to understand the hack itself, let’s take a closer look at what actually happened.

👾First hack landscape

In this section, we will examine the vulnerability exploited by the malefactors and try to understand its essence. This section is very important for understanding the whole picture, so please read carefully!🔍

Although Linear Math was designed to resist rounding manipulation, it turned out that its rounding was vulnerable, and the protocol could be hacked following a certain scenario:

  1. “Borrow” BPT (flash swap) at a rate > 1, and trade it for main and wrapped to reduce token balances to near zero.
  2. Craft a trade that exploits the rounding error on GivenOut swaps to make the total balance equal the virtual supply, which resets the rate to 1 (since rate = balance/supply).
  3. Repay the flash-swapped BPT at the new lower rate for a profit.
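The effect of step 2 can be illustrated with a toy Python model. This is a sketch under strong simplifying assumptions and not Balancer's actual Linear Math: it keeps only the relation rate = balance/supply from the scenario above, uses whole-number token units, and the `ToyLinearPool` class and its `round_up` switch are hypothetical. It shows why rounding the BPT owed on a GivenOut swap in the user's favour matters once balances are near zero.

```python
from fractions import Fraction


class ToyLinearPool:
    """Toy pool where rate = balance / virtual_supply (illustration only)."""

    def __init__(self, balance: int, virtual_supply: int):
        self.balance = balance
        self.virtual_supply = virtual_supply

    def rate(self) -> Fraction:
        return Fraction(self.balance, self.virtual_supply)

    def swap_given_out(self, main_out: int, round_up: bool) -> int:
        """User asks for `main_out` tokens; returns the BPT they must pay."""
        quotient, remainder = divmod(main_out * self.virtual_supply, self.balance)
        bpt_in = quotient + 1 if (round_up and remainder) else quotient
        self.balance -= main_out
        self.virtual_supply -= bpt_in  # BPT paid back to the pool leaves circulation
        return bpt_in


# Step 1 has already shrunk the pool to dust while the rate is still 11/10.
pool = ToyLinearPool(balance=11, virtual_supply=10)
rate_before = pool.rate()

# Step 2: with rounding down, 1 token out costs 0 BPT, so balance == supply.
paid = pool.swap_given_out(1, round_up=False)
rate_after = pool.rate()
```

In this toy model the rate falls from 11/10 to exactly 1, so BPT borrowed at the higher rate can be repaid at the lower one (step 3). Rounding the BPT owed upward instead (`round_up=True`) charges 1 BPT for the same trade and the rate does not fall.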

Moreover, the Pause Window could no longer be applied to the affected Linear Pools: they had been deployed too long ago, so they could not be paused at the time of the hack. Once the team understood the vulnerability, it paused every newer pool it could reach, but many pools remained that could not be paused for the reason above.

After this whole saga of pausing pools, the protocol team announced the presence of a critical issue, believing that most of the funds had been saved and that a mere list of vulnerable pools would not be enough for black hats to work out the exploit.

However, black hats still managed to steal ~$900,000 and, what was worse, they revealed a second attack vector that the team was unaware of: the precision calculation logic could also be broken in the opposite direction, raising the rate instead of lowering it. The team didn’t anticipate such a turn of events!

Great, we’ve grasped the main essence of the vulnerability in the contract, but on September 20, there was also a second hack on the frontend side! Are these two hacks related? Which of the hacks was more extensive? How did the project team allow this to happen? We’ll delve into this in the next section! 🔜

👾How Was the Second Hack Performed?

In this section, we will examine the essence of the second frontend hack and analyze its interrelation with the first hack.

Not only were the pools hacked on August 27, but on September 20, the main site of Balancer was also breached. The addresses of the main contracts were altered in such a way that the only thing that really happened when interacting with them was the zeroing of the victim’s balance.

Balancer still does not understand how this could have happened, but this can be explained by the simple inability of the team to monitor all the vitally important modules of the protocol’s architecture at a critical moment.
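One common client-side mitigation for altered contract addresses is to compare a transaction's target against a pinned allowlist before signing. The Python sketch below is a generic illustration and not something the source attributes to Balancer; the address is a placeholder, and `TRUSTED_CONTRACTS`, `is_trusted_target`, and `guard_transaction` are hypothetical names.

```python
# Placeholder address for illustration only; NOT a real Balancer contract.
TRUSTED_CONTRACTS = {
    "vault": "0x1111111111111111111111111111111111111111",
}


def is_trusted_target(address: str, trusted=TRUSTED_CONTRACTS) -> bool:
    """Case-insensitive check that `address` is one of the pinned contracts."""
    normalized = address.strip().lower()
    return normalized in {a.lower() for a in trusted.values()}


def guard_transaction(tx: dict) -> dict:
    """Return the transaction unchanged, or refuse if the target is unknown."""
    if not is_trusted_target(tx["to"]):
        raise ValueError(f"refusing to sign: unknown target {tx['to']}")
    return tx
```

A check like this only helps if the allowlist lives somewhere the compromised frontend cannot rewrite, such as a wallet or a separately distributed configuration.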

Excellent, now we understand what really happened during this tough month for the Balancer protocol’s team and community. However, no hack happens by itself! Every hack has its cause and effect. What were the causes of these hacks? Let’s figure it out.

🔍What Went Wrong?

In this section, as Solidity auditors, we would like to propose our version of what mistakes were made and what should never have been done.

Committed Mistakes

  • New versions of pools at the release stage were not adequately audited. When it comes to security, it is always worth rechecking all hypotheses to the end, as any of them could be the one leading to a hack!
  • A Pause Window of a couple of months is too optimistic a timeframe, during which it is often impossible to comprehensively study the problem and release a full-fledged update. In the worst-case scenario (the mitigation of which must be foreseen), it may take from six months to a year.
  • Even though the team believed they had done everything in their power, simply publishing a list of vulnerable pools was a very bad idea. And it was very naïve to assume that hackers would not be able to analyze the public code and correlate what connects all these vulnerable pools.
  • The vulnerability that the white hats informed the team about was not fully studied. The team only learned about this post-factum, on August 27th. Partly, this can be explained by the focus on saving assets, but one should always assume the worst.

Lessons learned

  • Architecture Design Stage | Essentially, Balancer was saved by luck. As the team itself says, the ability to pause V5 pools was purely coincidental. This suggests that at the design stage, there was clearly a lack of interaction with security consultants, which resulted in an initially low ability to operate under critical conditions.
  • Logic Complexity | When you design relatively simple types of pools, it’s okay. But when you decide to create a pool, each implementation of which will be a small exchange-storage-broker, the complexity of which will be off the charts — that’s not okay. Balancer’s architecture is inherently complex at all stages: it has complicated logic, many component parts, and many external composite calls. In such an architecture, even with very careful refactoring, it’s impossible to cover absolutely all cases and attack vectors with attention; sooner or later, something bad is bound to happen.
  • Responsible Audit Preparation | Attempts to save time or money on security always end the same way. Once again, we urge teams to keep in mind that confining auditors within strict limits proportionally reduces the quality of the audit!

Alright, now we have a comprehensive picture of what factors could have triggered such unfortunate events. Let’s now summarize everything we have reviewed and draw conclusions.

📝Conclusion

So, what conclusions can be drawn from this situation?

  1. When conducting an audit of your project, do not neglect to allocate additional time for a more meticulous audit of all the updates you are deploying into production.
  2. If you are a technical specialist, make sure to convey to your management that intentionally reducing the audit scope or squeezing the timelines can have devastating consequences!
  3. Sometimes the problems found in the protocol can be more fundamental; do not neglect to spend extra time studying them thoroughly, and always take into account even the worst possible cases.

Learn from the mistakes of other projects and remember that the security of the protocol is as important as its innovativeness!

🗃️References

[1] “Rate manipulation in Balancer Boosted Pools — technical postmortem” | medium article
[2] “Composable Stable Pools” | Balancer docs
[3] “Linear Pools” | Balancer docs
[4] “Linear Math” | Balancer docs
[5] “Stable Math” | Balancer docs
