Sunday, September 9, 2018

Data and Goliath - book review


TL;DR - you need to read this, and I was impressed enough to buy a physical copy of the book.

Listening to Audiobooks is a great way to make use of my time while driving, and it sure makes traffic that much more bearable. As a bonus, I get to learn new stuff. Ain't that great?
This time I was listening to Data and Goliath by Bruce Schneier. I must admit - I wasn't expecting much of it: Big data here, big data there, big data everywhere. In fact, this book makes a very good case for the importance of privacy and does a good job describing the problems in the trends we see in today's economy where data is one of the most valuable currencies. I was eagerly waiting to get to the final section of the book where the author lists some actions we can take to improve the current state of affairs. This last section, sadly, is the one point I didn't like , since after making some promises at the beginning this section can be summed up as "there isn't much you can do, so go ask your government to pass some laws".
After a short summary of the book in the introduction, which is great if you don't have the patience to walk along the details,  the book delivers its content in three section.
The first section, dubbed "The world we're creating" is very much a description of the current state of affaires. It goes over what is data and how it is generated (short answer - we generate data merely by existing, with our phone providing location information, security cameras videotaping us, and everything we do in the internet is, basically, generate data), how surveillance is ubiquitous (a word used many times in this book) and focused on "meta-data", how cheap it is to just pile up the data - we got to the point where storing stuff is cheaper than filtering the interesting bits out. It describes two factors driving forward the momentum of surveillance - the financial incentives of businesses that use the data for marketing that supports a "free" service on the internet, and then sell information as another revenue path. The second is the governments, that quite unsurprisingly wants to know everything - fighting crime and terrorism are the most used reasons to support that. While private companies may collect the data of its users (which might be a staggering amount of data if one considers giants such as Facebook, Google or Microsoft) and maybe buy some data from other companies, but the government is even more encompassing - laws require companies to share that data without much supervision (and sometimes a gag order is issued to ensure everything remains hidden), other regulations might demand a backdoor for use of the government and sometimes the various agencies actively hack products and infrastructure to maintain access to data. One main concept that makes perfect sense and yet I've not considered it explicitly before is that if enough data is collected on the people one interacts with, just about everything can be inferred about a person who, in theory, isn't tracked - just because tracking the others will capture their interactions with that person. The book appropriately uses the term "inverse herd immunity".

The second part is going back to "why?" or, to be more specific, why should we care so much? The most important part, in my eyes, is a head-on challenging of the way too common saying "If you're not doing anything wrong, you don't have anything to hide". This statement is with us way too long, and I've yet to hear a debate around privacy that did not use it in one form or the other. In fact, there are many secrets people keep all the time - Think about people who had AIDS a few years back: everyone who knew that about them would assume that they were homosexual and irresponsible (the former is still not fully accepted everywhere) even if they got infected by a blood transfusion. But that's an extreme case - would you like your employer to know if you've started looking for another place? And what if you were trying to surprise your spouse with a vacation abroad and someone other than you told them that? Would you want a conversation where your manager asked you to help a struggling co-worker become common office knowledge?
We all have secrets. Or at least some data compartmentalization mechanisms in place - we choose which information to share and with whom, and if asked directly, most of us would not volunteer to be monitored (A rather blunt example of the effect of complete lack of privacy can be seen in the the movie "The Circle", which is not a very good movie. You can rather listen to Screen Testing episode 11 which is where I've heard about it).
Besides the personal aspect of privacy, the book mentions other reasons for strongly opposing mass surveillance (yes, we don't usually think of it in those terms, but when just about every company or national or municipal authority has access to data about a large amount of people - this is what it is).
Those reasons are political liberty, commercial fairness, economy and security.
Of those reasons, the one with most aspects is, surprisingly, the political liberty. For starters, let's consider what the author calls a "chilling effect" - when people know they are being watched, they behave differently. Just remember the last time you drove by a police car and slowed down a bit even though you were well below the speed limit. Now imagine driving where you know that by the end of the day, the police would get an exact report of when and where you were speeding, and when and where you were crossing a white line. This can be easily done if the police were to take your location data from your phone or service provider. Such surveillance pushes people to conform with norms
Second is the potential of being harassed by the law - No man is 100% law abiding - people speed, make mistakes on their taxes, cross streets in red light and so on. A strong enough political figure (or a petty enough police person) could make an individual's life a miserable place if they were to dig in on the data about them and look for petty crimes. Mass surveillance helps lower the costs of such activities, and removes most of the regulation around it.
Finally on that matter is the important role dissidents play in social change. It's a bit odd to wrap ones head around it at first, but then it just clicks. Basically, in order to have social change we need to allow for some degree of illegal activities. How so? Consider two rather recent examples: Same sex marriage and smoking marijuana. Looking 30 years back, both of those activities were considered shameful at least, and probably downright illegal (It still is in some countries). Yet, a growing number of people were doing it - first in hiding, then the laws were not enforced and then started the legalisation debate (which is still going on in some places). In the meanwhile, public opinion is shifting. This is possible since homosexuals could hide in the closet and not being persecuted, and since enough people used pot illegally without "getting caught" (or if they did, without serious repercussions) so they could form communities and lobby for that specific activity to become legal and accepted. When surveillance is omnipresent, we get the opposite. The chilling effect mentioned earlier kicks in and people are trying to remain well within the "norm", thus dragging social changes to a halt. When people are aware of being constantly monitored  they prefer to err on the side of safety and not act (in "Think Fast and Slow" Daniel Kahneman states that people value loss or pain about twice as more intensely as gain or pleasure) and thus self-censor their actions and behaviours. This inaction, in turn, is causing stagnation and fortifies the boundaries at what is "acceptable", effectively narrowing them.
The other categories are almost self explanatory -
The economy of a given country is suffering from survelliance since there are other comparable products that will track people less. For a long while the company I worked at blocked Skype from being installed, since Microsoft were (being forced into?) providing a backdoor for the NSA to eavesdrop on Skype calls. After Cambridge Analitica's shenanigans with Facbook blew up, we could see the #DeleteFacebook hashtag running around, and other examples are out there.  The chapter focuses mainly on regulation forcing companies to "share" data with the authorities and asks an important question: if a certain country is known to demand businesses to provide backdoors, and issues all encompassing gag-orders to hide it - who would do business with any company from that country?
Commercial fairness is the term the book uses to describe data driven discrimination. After listening to Weapons of Math Destruction I needed very little convincing that "big data" can be and is being used in ways that discriminate people unjustly. In short, data is being used as a more obfuscated form of redlining (For anyone such as myself lacking the American reference - redlining was a practice where banks avoided investing or approving loans in impoverished neighborhoods). While there is an objective financial gain out of redlining - obfuscated or not - this practice is harmful in the long run, hampering social mobility and punishing people for being poor or part of a minority group.
Lastly is the argument of security. Again, this focuses mainly on activities of governments and other nation authorities. The claim is simple, and widely accepted in the security community: There can be no backdoor that can ensure only "good guys" will access it. By forcing companies to install security flaws to their products, by actively hacking civilian organizations, by hoarding vulnerabilities instead of driving for them to be fixed the governments are making the entire internet less secure.
One last point that is worth mentioning in this section: None of the arguments raised here are new, so one might wonder what has recently changed to warrant such an interest in privacy? The answer is that two things have changed - The first is that storing information has become so cheap that "let's just store that and see later if we can do something with it" is a viable, almost affordable, strategy - so more information about is is being stored. The second thing is that our life revolves more and more around computerized systems. Our phone,and the multitude of security cameras on the street mean that we generate a whole lot of data just by taking a stroll. In the past, information was ephemeral, in that that once a conversation was over, it would reside only in the memory of the participants (recording was possible, but not common). If someone sketched a note and then ripped it to shreds, that information was gone. This is not the case today when we communicate online and our information exists on other people's computers (sometime we call them "servers") that have routine backup, so even data we thought we've deleted might only have been marked in the database as such and not actually deleted, and even if it was deleted, this database might have backup tapes that go years back. Today, the follies we make as teenagers go on social media and will haunt us when we're older - Children will see their parents drunk in photos from 20 years ago, potential employers will see an old tweet they vehemently disagree with, a picture shared by a proud parent today will be used in 20 years to steal that child's identity. The persistance of data ensures that if there's a tiny bit of information we don't want a specific person to know, they are sure to find it.

So, what can we do about it? The book answers this question for three types of actors - governments, corporations and private people. 
The governments wanting to improve the privacy of the world they exist in are quite powerful: They can set rules and regulations that limit the access to data, strive to fix vulnerabilities their intelligence agencies find instead of hoarding them, protect whistleblowers from charges (so that the citizens will know if the government officials are subverting their privacy) and avoid a number of harmful activities exerted by the different departments. The most interesting idea in this part is to provide a "commons" Internet arena - platforms in the internet that are "publicly owned", such as parks or sidewalks - a place where financial pressure to track the users and maximize revenue is negated. Those public domains should be defined with specific laws ensuring proper conduct and resilience to surveillance,  and should be budgeted by the people's taxes.

Corporations, too, are quite powerful in that they are the main collectors of information. So, if a specific corporation decides to collect less data - they can. They can also be very transparent about the data they collect and do a decent job in protecting it so that it will only be used for the purpose it was intended for. Being both technically savvy and large enough, corporations can (and do) also battle the government's attempt to breach security - by creating a secure product when not specifically obligated by law to do otherwise, by challenging warrants and government requirements in the court of law and by investing in research to secure their product. Since the current state of affairs favours businesses that do surveillance it is understandable that quite a large part of the chapter is about "what a government should do to protect from corporations".

The third section is the private people using the internet - being the target of surveillance by two major forces, there is, unsurprisingly very little one can actually do without incurring significant harm to their ability to operate - A person can choose to install some privacy enhancing plug-ins (Privacy badger and AdBlock are two that I use), we can make sure we use TLS everywhere we can, avoid having a social media profile, pay for services that are promising privacy instead of using the "free" equivalents that guzzles your data to increase revenue. One can also leave without their phone regularly, pay only in cash and move to the mountains living off the land. Apart from those rather insignificant actions, the main suggestion is to ask your local politicians to change the laws.

One thing which is important to remember while reading this book is that the idea of trading our privacy for services isn't inherently wrong, and the author does not claim otherwise - processing data, even personal data, can do some good - it can improve people's health and support research (imagine a DNA database that is used to find suitable organ donors, or warn people about dormant life threatening genetic flaws) it can be used to improve road safety and reduce traffic or prevent credit card fraud. Also, minor as it might be, it can help people find things they want by getting better personalized advertisement. The main issue is that the deal today is implicit and all encompassing - there's no backing off from the deal we unknowingly made, and there aren't enough incentives to not keeping all of our data.

The last point I want to touch upon is how fast this book seems to age. It was published in 2015 with what was at the time the most up-to-date information, including some insights from the Snowden leaks in 2013. Despite that, while listening to the book I was having a constant feeling of "missing out". In roughly three years since the book publication, many of the trends shown as warning signs seem to be already in full scale, and reversing the trend seems even more difficult than what is described in the book. I'm a bit optimistic to see some positive changes such as GDPR, and wonder if it will be enough, or will we drift towards a world with zero privacy.


In conclusion - go and read this book.

Also, it just so happens that I finish writing this just before the Jewish new-year, so if you got until here: Happy New Year!
שנה טובה