Overview
Lately I’ve been interested in writing algorithms (agents) that interact with websites. The game Cookie Clicker is a great testing ground for such algorithms. The game is played by clicking a big cooke to earn money (cookies) that can be used to invest in instruments (buy buildings), which in turn generate revenue (more cookies). Here I’ll describe three agents that I made for this game and assess their performance. I also describe many of the helper functions involved in implementing these agents. The code that implements these agents and reproduces all figures in this blog post can be found on my GitHub, here: https://github.com/MacStrelioff/CookieClickerAgent
Background and Setup
Agents
While the goal was to earn as many cookies as possible, the game also allows revenue to accrue while the player is away, and the amount of cookies earned per click can scale with revenue. For these reasons, I focused on maximizing revenue as the overall goal for any strategy.
Naive: Buy all affordable investments
The simplest investment strategy was to purchase any affordable building. This is implemented in the code below:
class agent_class_naive:
...
def buy_products(self):
products = driver.find_elements_by_xpath('//div[@class="product unlocked enabled"]')
while products: # if there are affordable products, buy them
products[-1].click()
products = driver.find_elements_by_xpath('//div[@class="product unlocked enabled"]')
...
Here products
is a list of the web elements that represent the affordable buildings, and the while loop cycles over them, buying the most expensive ones first with products[-1].click()
, until no more buildings are affordable. This purchasing logic is implemented through a buy_products
method of the naive agent, agent_class_naive
. In my code, the naive agent class also contained the necessary helper functions, described in the section above.
MaxROI: Buy best return on investment
The second strategy is to buy the option that will have the best return on investment (revenue to price ratio). The code below extends the naive agent, agent_class_naive
, by overriding the buy_products
method that implements the investment strategy.
class agent_class_max_rps_price_ratio(agent_class_naive):
# overwrite the buy_products method
def buy_products(self):
## update building info
self.get_building_info()
# while best is affordable, buy the best rps/price building
best_building_affordable = True
while best_building_affordable:
## get unlocked products
products = (driver.find_elements_by_xpath('//div[@class="product unlocked enabled"]') +
driver.find_elements_by_xpath('//div[@class="product unlocked disabled"]'))
# find max rps/price building
max_rps_pp,building_to_buy,product_to_buy = 0,[],[]
for i,building in enumerate(self.building_info):
# get rps/price for building
cur_rps_pp = self.building_info[building]['cps/price']
# if it's the best so far, update max and building id
if cur_rps_pp > max_rps_pp:
max_rps_pp,building_to_buy = cur_rps_pp,building
product_to_buy = products[i] # store element to click
# update balance
self.log_balance_and_revenue()
# check if best building is affordable.
if self.building_info[building_to_buy]['price']<=self.balance:
# buy building_to_buy (click on this product)
product_to_buy.click()
# update building info (including rps per price rps_pp)
self.get_building_info()
else: best_building_affordable=False # if not affordable, break the loop
First the MaxROI agent updates building information, including; cookies per second, price, and the ratio cps/price, using the helper method get_building_info()
. Then it purchases the buildings that provide the max ROI, until the best building by this metric is unaffordable. The search for the best building is implemented in the section:
# find max rps/price building
max_rps_pp,building_to_buy,product_to_buy = 0,[],[]
for i,building in enumerate(self.building_info):
# get rps/price for building
cur_rps_pp = self.building_info[building]['cps/price']
# if it's the best so far, update max and building id
if cur_rps_pp > max_rps_pp:
max_rps_pp,building_to_buy = cur_rps_pp,building
product_to_buy = products[i] # store element to click
This is an \(O(N)\) search through the buildings that have information, where \(N\) is the number of buildings. This code logs the maximum ROI in max_rps_pp
and the building associated with this ROI in building_to_buy
, as well as the web element to click in order to purchase this building, product_to_buy
. After the best ROI investment is found, the next few lines of code update the agent’s balance and check if the investment is affordable. If it is, the agent buys it and this process repeats, if it is not then best_building_affordable
is set to False
which ends the while loop.
MinWait: Buy what minimizes the time to the highest revenue purchase
The intuition of this strategy is to buy the investment that maximizes revenue, however, those investments can be expensive. Other investments may increase revenue enough to decrease the amount of time before the highest revenue investment can be purchased. This intuition is sketched in the figure below, where the naive waiter (red) waits until they can purchase a hypothetical maximum revenue investment for 200, and the MinWait algorithm makes an investment for 100 that increases revenue enough to purchase the maximum revenue investment faster than the naive waiter:
Formally, this algorithm first computes the wait until the maximum revenue investment can be purchased (\(w_{max}\)), then searches for investments that can reduce the wait and purchases any that it finds. For a given current revenue \(r_{t}\) and cost of the maximum revenue investment, \(c_{max}\), the wait until this investment can be purchased is \(w_{max}=\frac{c_{max}}{r_t}\), which corresponds to the time when the red line reaches 200. This formula ignores current balance, but in the appendix I show that the current balance is irrelevant for the purchasing decision. An alternative investment, \(b\), could improve the wait time if it’s addition to revenue is large enough to make up for its cost before the time \(w_{max}\). This is shown with the blue dashed line in the example above. If investment \(b\) adds \(r_b\) to the current revenue \(r_t\), and costs \(c_b\), then the wait until the maximum revenue investment if \(b\) is purchased is: \(w_b = \frac{c_b}{r_t}+\frac{c_a}{r_t+r_b}\). This algorithm buys \(b\) if \(w_b<w_{max}\); that is, if purchasing \(b\) reduces the wait until the maximum revenue investment can be purchased.
The code I used to implement this is provided below.
# Max RPS/price agent
class agent_class_min_wait(agent_class_naive):
# overwrite the buy_products method for this agent's purchase logic
def buy_products(self):
## update building info
self.get_building_info()
# while best building affordable, buy it and look for next best building
best_building_affordable = True
while best_building_affordable:
## get unlocked products
products = (driver.find_elements_by_xpath('//div[@class="product unlocked enabled"]') +
driver.find_elements_by_xpath('//div[@class="product unlocked disabled"]'))
# find building with max revenue per second and it's cost
max_rps,cost_max_rps = 0, float('inf')
building_to_buy,product_to_buy = [], []
for i,building in enumerate(self.building_info):
# get rps for building
cur_rps = self.building_info[building]['cps']
# if it's the best rps far, update max, cost, and building id
if cur_rps > max_rps:
max_rps,building_to_buy = cur_rps,building
cost_max_rps = self.building_info[building]['price']
product_to_buy = products[i] # queue this building to buy
# update revenue for computations below
self.log_balance_and_revenue()
# check if any other purchase would reduce wait time to buying max_rps product
wait_max = float(cost_max_rps) / self.revenue if self.revenue else 0 # stops division by 0
for i,building in enumerate(self.building_info):
cost_cur = self.building_info[building]['price']
rps_cur = self.revenue + self.building_info[building]['cps']
# conditional to stop division by 0
wait_till_cur = float(cost_cur) / self.revenue if self.revenue else 0
wait_cur = (wait_till_cur +
cost_max_rps / rps_cur)
if wait_cur <= wait_max:
wait_max = wait_cur # update minimum wait
building_to_buy = building
product_to_buy = products[i] # queue this building to buy instead
# update balance for checking if building affordable
self.log_balance_and_revenue()
# buy either max_rps product, or the building that would reduce wait time
# check if best building is affordable
if self.building_info[building_to_buy]['price']<=self.balance:
# buy building_to_buy (click on this product)
product_to_buy.click()
# update building info (including rps per price rps_pp)
self.get_building_info()
else: best_building_affordable=False # if not affordable, break purchase loop
This implementation again extends the agent_class_naive
class by replacing the buy_products
method. The algorithm makes two passes through the list of unlocked investments. The fist pass is copied below:
# find building with max revenue per second and it's cost
max_rps,cost_max_rps = 0, float('inf')
building_to_buy,product_to_buy = [], []
for i,building in enumerate(self.building_info):
# get rps for building
cur_rps = self.building_info[building]['cps']
# if it's the best rps far, update max, cost, and building id
if cur_rps > max_rps:
max_rps,building_to_buy = cur_rps,building
cost_max_rps = self.building_info[building]['price']
product_to_buy = products[i] # queue this building to buy
Similar to that used in the MaxROI agent, this code is an \(O(N)\) search through the \(N\) unlocked buildings. It logs the maximum revenue in max_rps
and the building associated with this revenue in building_to_buy
, as well as the web element to click in order to purchase this building, product_to_buy
.
Then the agent computes \(w_{max}\) with:
wait_max = float(cost_max_rps) / self.revenue if self.revenue else 0 # stops division by 0
The (value) if (condition) else 0
is used to stop division by 0 – if self.revenue
is 0, then wait_max will take the value 0 rather than the computed value which is undefined when self.revenue
is 0.
Next, another pass through the list of buildings, this time searching for a building that will reduce the wait time:
for i,building in enumerate(self.building_info):
cost_cur = self.building_info[building]['price']
rps_cur = self.revenue + self.building_info[building]['cps']
# conditional to stop division by 0
wait_till_cur = float(cost_cur) / self.revenue if self.revenue else 0
wait_cur = (wait_till_cur +
cost_max_rps / rps_cur)
if wait_cur <= wait_max:
wait_max = wait_cur # update minimum wait
building_to_buy = building
product_to_buy = products[i] # queue this building to buy instead
Here the MinWait agent gets the cost of a candidate investment, cost_cur
and the revenue that would be attained if this investment is bought rps_cur
. Then computes the time before this candidate investment could be bought at the current revenue, wait_till_cur
and finally computes \(w_b\), the wait until the maximum revenue option could be bought if the candidate investment is bought first as wait_cur
. Lastly, if this is lower than wait_max
(\(w_{max}\)), then the agent stores the best wait in wait_max
and updates the building to buy and the web element to click.
Like the MaxROI agent, the last few lines of code check if the investment is affordable, and if not, terminates the ends the purchasing loop.
Performance
Each agent class was instantiated as agent
and run for 100,000 big cookie clicks, with the purchase logic run after every 200 clicks. The Naive, MaxROI, and MinWait agents ran for approximately 3650 seconds, 3785 seconds, and 3800 seconds, respectively. The differences in runtime were small enough that I wasn’t worried about differences in the performance metrics below being attributable to extra income earned from a longer runtime.
Revenue
First I looked at the main metric, revenue per second, shown in the figure below:
The MaxROI and MinWait strategies clearly performed better than the Naive strategy in terms of maximizing revenue per second. It also appeared that the MaxROI algorithm generally had higher revenue when in a purchasing cycle, but the MinWait algorithm surpassed the MaxROI algorithm when waiting for a large purchase.
Building Count
The Naive agent purchases each investment at about the same rate. At the other extreme, MinWait agent generally saves for the highest revenue option, then occasionally buys cheaper options to reduce the wait until the next purchase of the highest revenue option. This is most clearly seen in the spikes across buildings soon after a new, expensive, building is purchased. Like the Naive agent, the MaxROI agent also buys each investment frequently, however this agent prioritizes the investments that give the best return on investment.
Building Prices
The Naieve strategy results in equalizing prices, while both the MaxROI and MinWait strategies can save to purchase the most expensive investment.
Return on Investment (Revenue Per Cost)
The Naive strategy rarely chooses the best ROI option. The MaxROI strategy equalizes the ratio between revenue and price across the investments, while the MinWait strategy similarly picks options with a good ratio here but also frequently chooses worse deals. Overall the Naive and MinWait strategies often end up paying more than they should for revenue.
Conclusions
Both the MaxROI and MinWait agents perform far better than the Naive agent in terms of maximizing revenue. The MaxROI agent performed better when in a buying cycle, but the MinWait agent surpassed the MaxROI agent during long periods of saving for an expensive purchase. Perhaps a better algorithm would be a hybrid that follows the MaxROI strategy when all revenues are known, and uses the MinWait strategy when saving up for an expensive option with an unknown revenue.
Appendix:
MinWait Decision is Independent of Balance
Another consideration for the MinWait strategy was that the balance could reduce wait overall, and this might mathematically result in a preference reversal if the wait for \(b\) becomes negligably small. Accounting for balance and using \(c_{max}\) to represent the cost of the maximum revenue option, \(r_t\) to represent the current revenue, \(c_b\) to represent the cost of another investment, and \(r_b\) to be the additional revenue provided by that investment, the wait time calculations become;
\[ \begin{aligned} w_{max} &= \frac{c_{max}-balance}{r_t} \\ w_{b} &= \frac{c_b-balance}{r_t} + \frac{c_{max}}{r_t+r_b} \end{aligned} \]
The the decision rule can be cast as: buy \(b\) if \(w_{max} - w_{b}<0\). This results in the decision function:
\[ \begin{aligned} w_{max} - w_{b} & = \frac{c_{max}-balance}{r_t} - \left(\frac{c_b-balance}{r_t} + \frac{c_{max}}{r_t+r_b}\right) \\ &= \frac{c_{max}-balance-c_b+balance}{r_t} + \frac{c_{max}}{r_t+r_b} \\ &= \frac{c_{max}-c_b}{r_t} + \frac{c_{max}}{r_t+r_b} \\ \end{aligned} \]
The same function results from \(w_{max}=\frac{c_{max}}{r_t}\) and \(w_{b} = \frac{c_b}{r_t} + \frac{c_{max}}{r_t+r_b}\), hence the decision does not depend on balance.
Helper Functions
The base agent class, agent_class_naive
, contained many helper functions as well as the naive investment purchasing logic. This section explains the code used for the helper functions.
Browser and Game Initialization
The agent was initialized with a web driver object stored in driver
. The first part of the agent class __init__
method is copied below:
class agent_class_naive:
def __init__(self,driver):
# navigate to site
driver.get('https://orteil.dashnet.org/cookieclicker/')
time.sleep(10) # time for page to load
self.big_cookie = driver.find_element_by_id('bigCookie')
self.golden_cookie_clicks = 0
# initialize balance and revenue variables
self.balance=0
self.revenue=0
# initialize balance and revenue logs
self.log_balance=[0]
self.log_revenue=[0]
self.log_bal_rev_epoch=[0]
# initialize building info, and logs of building info
self.get_building_info()
self.log_build_info = dict()
self.building_info_logger(epoch=0)
...
First the agent navigates the webdriver to the Cookie Clicker URL and pauses for 10 seconds to let the page load. After that, the agent finds the big cookie web element and assigns it to attribute big_cookie
. The remaining lines above initialize various attributes that were either used in purchasing logic, or used to log data on agent performance.
The second part of the __init__
method navigated to the ‘options’ tab and changed the game settings. One important setting to change was the numbersButton
option – this toggled between displaying numbers numerically, ‘1,000,000’, versus with words, ‘1 million’. Since the balance is scraped from this text, the agents require that numbers be displayed numerically. The other changed settings related to graphics and game performance. Code for this second part of __init__
is shown below:
# CHANGE GRAPHICS AND OTHER SETTINGS
# click 'options' tab
driver.find_element_by_id("prefsButton").click()
time.sleep(1) # let page load
# disable text instead of numbers, e.g. 'million' -> '000,000'
driver.find_element_by_id("numbersButton").click()
# slide volume to around 25%
time.sleep(.5) # so actions don't happen too fast
volume = driver.find_element_by_class_name("slider")
move = ActionChains(driver)
move.click_and_hold(volume).move_by_offset(-50, 0).release().perform()
# change graphics for optimal performance
buttons = ("fancyButton","particlesButton","cursorsButton",
"milkButton","wobblyButton","cookiesoundButton",
"formatButton","extraButtonsButton","customGrandmasButton")
for button in buttons:
time.sleep(.5) # so actions don't happen too fast to execute
driver.find_element_by_id(button).click()
Purchasing Upgrades
Upgrades are another way to spend cookies. The function below finds affordable upgrades and purchases (click) them, starting with the most expensive avilable upgrade.
def buy_upgrades(self):
upgrades = driver.find_elements_by_xpath('//div[@class="crate upgrade enabled"]')
while len(upgrades)>0: # if there are products, and while we can afford products,
try: upgrades[-1].click() # buy each one, most expensive first
except: None
upgrades = driver.find_elements_by_xpath('//div[@class="crate upgrade enabled"]')
Logging balance and revenue
Balance and revenue were important for some strategies and were logged as a metric to compare the agents. To facilitate both use cases, I created a method that took an argument epoch
. This method parses the text above the big cookie to convert a string like ‘177 cookies \n per second: 1.1’ to extract the balance of 177 and revenue of 1.1. If the method is called without an epoch
, then it only updates the balance and revenue. Alternatively, if an epoch
is passed, then this method also logs the balance, revenue, and eopch or click number. The code for this is shown below:
def log_balance_and_revenue(self,epoch=None):
# parse string with balance and revenue
tmp = driver.find_elements_by_xpath('//div[@id="cookies"]')
tmp = tmp[0].text.replace(',','') # remove camas 1,000 -> 1000
tmp = re.findall("\d+",tmp) # extract balance and revenue
tmp = [int(i) for i in tmp] # convert str -> int
# update balance and revenue
self.balance = tmp[0] # update current balance
self.revenue = tmp[1] # update revenue
# if epoch passed, log balance and revenue at this epoch
if epoch:
self.log_bal_rev_epoch.append(epoch) # index for balance and revenue
self.log_balance.append(tmp[0]) # log balance
self.log_revenue.append(tmp[1]) # log revenue
Get building information
Building information like the cost and revenue was used in purchasing logic. The code used to extract this is shown below, and each component is elaborated on afterwards:
def get_building_info(self):
# get unlocked products
products_unlocked = (driver.find_elements_by_xpath('//div[@class="product unlocked enabled"]') +
driver.find_elements_by_xpath('//div[@class="product unlocked disabled"]'))
# info for unlocked buildings
building_info = dict()
for i,building in enumerate(products_unlocked):
# get info from building button
tmp=building.text.replace(',','').split(sep="\n")
building_name,building_price,building_count = tmp if len(tmp)==3 else tmp+[0]
# initialize dict for building
building_info[building_name] = dict()
# fill in count and price
building_info[building_name]['count']=int(building_count)
building_info[building_name]['price']=int(building_price)
# get info from building tooltip
hover = ActionChains(driver).move_to_element(building)
hover.perform()
tooltip = driver.find_elements_by_xpath('//div[@id="tooltip"]')
tmp=tooltip[0].text.replace(',','');
tmp_cps = re.findall(r"produces [-+]?\d*\.\d+|produces \d+",tmp);
# > 'produces X'
if tmp_cps: # if building has been purchased
tmp_cps_2=float(re.findall(r"[-+]?\d*\.\d+|\d+",tmp_cps[0])[0])
building_cps = tmp_cps_2
else: # if building hasn't been purchased, store cps as infinity
building_cps = float('inf')
# > X
#### default of inf below encourages purchasing unlocked, unowned buildings
#building_cps = float(tmp_cps[0][:tmp_cps[0].find(' ')]) if tmp_cps else float('inf')
building_info[building_name]['cps']=building_cps
building_info[building_name]['cps/price']=building_cps/int(building_price)
self.building_info = building_info
This code first obtains a list of the buildings that are unlocked, products_unlocked
, then iterates over these to extract the relevant information: count or number owned, price of the next building, revenue, and the revenue to price ratio. Count and price could be obtained by parsing the text on the building button. This was done in the the following lines:
# get info from building button
tmp=building.text.replace(',','').split(sep="\n")
building_name,building_price,building_count = tmp if len(tmp)==3 else tmp+[0]
# initialize dict for building
building_info[building_name] = dict()
# fill in count and price
building_info[building_name]['count']=int(building_count)
building_info[building_name]['price']=int(building_price)
Revenue could only be obtained from a tooltip that appeared when the mouse hovered over the building button. The code below effects a mouse hover over the building, then extracts and parses the text from the building tooltip;
# get info from building tooltip
hover = ActionChains(driver).move_to_element(building)
hover.perform()
tooltip = driver.find_elements_by_xpath('//div[@id="tooltip"]')
tmp=tooltip[0].text.replace(',','');
tmp_cps = re.findall(r"produces [-+]?\d*\.\d+|produces \d+",tmp);
if tmp_cps: # if building has been purchased
tmp_cps_2=float(re.findall(r"[-+]?\d*\.\d+|\d+",tmp_cps[0])[0])
building_cps = tmp_cps_2
else: # if building hasn't been purchased, store cps as infinity
building_cps = float('inf')
One challenge was that the revenue was only viewable for owned buildings. To address this, I assumed a revenue of infinity for unowned buildings. This would incentivise agents that bought based on revenue to save and purchase the new building – which was reasonable because these buildings had higher revenue than the owned buildings. The last few lines in the code chunk above extract the revenue for owned buildings, and impute a revenue of infinity for unlocked but unowned buildings.
Finally, the last few lines of code log the revenue and the return on investment (revenue divided by price) for each building and store the updated building information.
Since this method was long already, I used a seperate method when logging this information. This logging function is shown below:
def building_info_logger(self,epoch):
if self.building_info: # if self.building_info exists
# modified from get_building_info to append.
log_keys = ['count','price','cps','cps/price']
# info for unlocked buildings
for building in self.building_info:
# initialize new buildings
if building not in self.log_build_info:
self.log_build_info[building] = dict()
for k in log_keys+['epoch']:
self.log_build_info[building][k]=[]
# fill in count,price,cps, cps/price
for k in log_keys:
self.log_build_info[building][k].append(self.building_info[building][k])
# log epoch, within each building since buildings become available at different times
self.log_build_info[building]['epoch'].append(epoch)