Some thoughts about life and game theory

Photo by Jonathan Petersson

I used to find it quite interesting that most TV shows and movies have a positive ending, one that usually requires some sort of collaboration, cooperation, or altruistic sacrifice (typically as part of a redemption arc) for the characters to succeed and go above and beyond. I used to think this was unrealistic because of what I had been taught: for you to come first or second, everyone else, or at least most people, has to perform worse than you. Whether you performed better or they performed worse is mostly a relative way of looking at the same event; it boils down to that single statement, and the rest is primarily framing.

I also found it quite interesting that many of the popular TV shows on this front come from Asian countries, with Squid Game and Alice in Borderland being prime examples. Asian countries are especially known for their high level of competitiveness, which is usually attributed to population size and upbringing, if not much else. This stereotypical view also means that people from these countries are seen as favourable hires around most of the world; employers tend to use words like efficient, skilled and thorough instead, of course, but the connotation remains nonetheless.


The prisoner's dilemma is another of those things that has always perplexed me, not because of the choices a player can make but because of its entire meta-nature. It provides a very good window into the human psyche.

Excerpt from Wikipedia:

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of speaking to or exchanging messages with the other. The police admit they don't have enough evidence to convict the pair on the principal charge. They plan to sentence both to a year in prison on a lesser charge. Simultaneously, the police offer each prisoner a Faustian bargain. If he testifies against his partner, he will go free while the partner will get three years in prison on the main charge. Oh, yes, there is a catch... If both prisoners testify against each other, both will be sentenced to two years in jail. The prisoners are given a little time to think this over, but in no case may either learn what the other has decided until he has irrevocably made his decision. Each is informed that the other prisoner is being offered the very same deal. Each prisoner is concerned only with his own welfare—with minimizing his own prison sentence.

This leads to four different possible outcomes for prisoners A and B:

  1. If A and B both remain silent, they will each serve one year in prison.
  2. If A testifies against B but B remains silent, A will be set free while B serves three years in prison.
  3. If A remains silent but B testifies against A, A will serve three years in prison and B will be set free.
  4. If A and B testify against each other, they will each serve two years.
An example payoff matrix / See linked URL for full attribution details. (c) cmglee

For each prisoner, defecting (testifying) results in a better payoff than cooperating (staying silent) no matter what the other prisoner does, so it is a strictly dominant strategy for both players, which makes mutual defection the only Nash equilibrium in the game.
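To make the dominance argument concrete, here is a minimal Python sketch that encodes the four outcomes listed above and checks each prisoner's best reply. The names `SENTENCES`, `best_reply_for_a`, `best_reply_for_b` and `nash_equilibria` are made up for this illustration; nothing here comes from the Wikipedia article beyond the sentence lengths.

```python
# Years in prison for (A, B), keyed by (A's choice, B's choice).
# The numbers are taken from the four outcomes listed above.
SENTENCES = {
    ("silent", "silent"): (1, 1),
    ("testify", "silent"): (0, 3),
    ("silent", "testify"): (3, 0),
    ("testify", "testify"): (2, 2),
}
CHOICES = ("silent", "testify")


def best_reply_for_a(b_choice):
    """A's choice that minimizes A's own sentence, given B's choice."""
    return min(CHOICES, key=lambda a: SENTENCES[(a, b_choice)][0])


def best_reply_for_b(a_choice):
    """B's choice that minimizes B's own sentence, given A's choice."""
    return min(CHOICES, key=lambda b: SENTENCES[(a_choice, b)][1])


def nash_equilibria():
    """Outcomes where neither prisoner gains by changing only their own choice."""
    return [
        (a, b)
        for a in CHOICES
        for b in CHOICES
        if best_reply_for_a(b) == a and best_reply_for_b(a) == b
    ]


if __name__ == "__main__":
    for other in CHOICES:
        print(f"If B chooses {other}, A's best reply is {best_reply_for_a(other)}")
    print("Nash equilibria:", nash_equilibria())
```

Testifying is the best reply whatever the other prisoner does, so the only outcome that survives the check is both testifying, even though both staying silent would leave each of them with a shorter sentence.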

One of the most interesting uses of this was in a game show, where a contestant exploited the dilemma by pretending to reveal his own hand, which forced his opponent to split, and offering to cut the opponent in after the show was over.

There was a big flaw in the execution of the dilemma itself, since the parties were allowed to communicate - and that shows the impact open collaboration has on a game theory problem like this.

If you can look beyond the risk-to-reward ratio for yourself and instead take a look at the entire picture, a completely rational and cooperative player would be interested in exactly one thing: optimizing the reward for both parties, in any way possible - which, as we observed in the video, was done by misleading the opponent. It's also important to note that the opponent themselves must be rational or cooperative to some extent. It is quite possible that someone hot-headed (or rather, unthinking) might simply have chosen to defect out of spite, in which case the strategy fails (for them) despite the good intent of the proposer. But any rational person would realize that the only way to get any reward would be to "split" or cooperate, even if their hand is forced. To summarize:

  • If the proposer does "steal" or defect, the opponent can only hope that they will honour their arrangement and "split" or cooperate later on.
    • If the opponent also defects, it results in the worst case of no reward for anyone, so it is eliminated as an option.
  • If the proposer does not defect after all, both end up cooperating, resulting in the optimal case for both participants.

How all of this ties into life and stereotypical views is a bit more interesting. Humans have a natural tendency towards tunnel vision when working towards their goals. This is quite observable in our day-to-day lives: someone very seriously preparing for semester exams or a future promotion will actively work (or as we call it now, grind) to the extent of not paying attention to, or even caring about, the environment around them. As someone who was quite rude to my parents while preparing for some of those exams, I do regret it and realize how trivial most of my stress really was - but in that frame of reference, it was the only thing that mattered.

The most extreme of these effects is perhaps the one observed in astronauts returning from space, called the overview effect. Astronauts are perhaps some of the most highly trained humans on Earth, and they're often picked from the best of the bunch in various national armed forces. A big part of their training is the ability to follow instructions and to know the rulebook for each and every scenario that might occur. But even in such attuned humans, a very strong visual stimulus in the form of seeing the whole Earth from space (as in The Blue Marble photograph) has a palpable effect on their psyche: they realize how small most of the problems we face on a day-to-day basis really are, along with a great feeling of concern for our planet and its inhabitants.

What we call experience, wisdom or learnings are perhaps just a softer version of such effects, as a response to weaker stimuli. It's our learnings from life itself that make us who we are, after all.


Quite recently, I came across an interesting video from Veritasium on the prisoner's dilemma:

They delve deep into different strategies for multiple rounds of simulated prisoner's dilemma to identify what qualities such strategies should have to accumulate the most rewards. The results are pretty much antithetical to what you would expect, because by design a defector always draws (when their opponent defects) or wins (when their opponent cooperates), and a cooperator always loses (when their opponent defects) or draws (when their opponent cooperates).

Over multiple rounds and rounds of rounds, and even with strategy evolutions (such that successful strategies are allowed to propagate, similar to real life), "nice" strategies always win out in the end. In my mind, it raised more questions than it answered - but the video did a good job of allaying most of my mostly technical concerns.
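The tension between "a defector never loses a matchup" and "nice strategies accumulate more" can be seen in a tiny simulation. This is only a rough sketch, not the code used in the video: the point values in `PAYOFF` (3 each for mutual cooperation, 1 each for mutual defection, 5 and 0 when one side defects) are the ones commonly used for this kind of tournament and are an assumption here, as are the 200 rounds and the function names.

```python
# Points for (player A, player B), keyed by (A's move, B's move).
# "C" = cooperate, "D" = defect. These values are assumed for illustration.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, other_history):
    """Nice, retaliatory, forgiving: cooperate first, then copy the opponent's last move."""
    return other_history[-1] if other_history else "C"


def always_defect(own_history, other_history):
    """Nasty: defect in every round."""
    return "D"


def play(strategy_a, strategy_b, rounds=200):
    """Play a repeated prisoner's dilemma and return the total scores (A, B)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print("tit-for-tat vs always-defect:  ", play(tit_for_tat, always_defect))
    print("tit-for-tat vs tit-for-tat:    ", play(tit_for_tat, tit_for_tat))
    print("always-defect vs always-defect:", play(always_defect, always_defect))
```

Under these assumed payoffs the defector edges out tit-for-tat head-to-head (204 to 199), yet two tit-for-tat players earn 600 points each while two defectors earn only 200 each. Which strategy ends up on top overall therefore depends on who else is in the pool, which is why the tournaments matter more than any single matchup.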

In short, they identified the following qualities that allow strategies to thrive (in order of importance):

  • Nice (not the first to defect, i.e. altruistic to some extent)
  • Forgiving (ability to cooperate to some extent with known defectors)
  • Retaliatory (despite being forgiving, refusing to cooperate with defectors once they cross a defined threshold)
  • Clear (strategies that were less complex with more identifiable behaviour, i.e. less chaotic)

The strategies that didn't work optimally were:

  • Nasty (first to defect, or defecting without reason)
  • Unforgiving (being completely non-cooperative with defectors)
  • Non-retaliatory (cooperating with defectors after repeated defection aka turning the other cheek)

The primary reason nasty strategies lost is that they would see every simulation as a zero-sum game, where someone had to lose for them to win, whereas nice strategies would see it as a positive-sum game, where your win would also be the opponent's win. I guess you can now see the parallels I am trying to draw here.

Epilogue

One of the biggest problems in the cryptosphere (web3, or whatever you want to call it) is people thinking it's a zero-sum game when it really isn't. There is space for one more L1, or L2, or even another privacy-focused blockchain. It's probably a learned behaviour from strictly capitalist ideologies, and don't get me wrong, capitalism is great, but accumulation of wealth for strictly non-altruistic purposes is a non-starter, in and of itself. I don't mean to be radical here and say that you should only be an effective altruist; I mean to say that as humans, we are already altruists by the definition of our genome. Our entire society has been built on trust and collaboration for a long, long time and will remain so long after we are gone. Today we are plagued by scams and wars and incessant Twitter bickering when it really doesn't matter in the long run (ask the astronauts). Go build what you want, travel to the place you've always wanted, talk to that person you've thought about; that's really what it's all about in the end. Most of life is not a zero-sum game. To win, you don't need to beat someone; you can always win together. 👫