Saturday, March 16, 2013

WHEN MORE TRUMPS BETTER


The flood of digital information now being collected on all of us means corporations can predict what we'll buy, police can forecast where crimes occur and more.
By EVGENY MOROZOV

Back in 2010, when he was still chief executive of Google, Eric Schmidt made an intriguing confession. "One day we had a conversation where we figured we could just try to predict the stock market," he told a conference in Abu Dhabi. "And then we decided it was illegal. So we stopped doing that." If you think Google is already ubiquitous, think again: Armed with the data it collects from Web searches, as well as from its online advertising business and even mobile devices, the company could enter many more industries.


While Google still balks at predicting the stock market, it already uses its search data to predict flu trends. Its online translation tool, though unlikely to satisfy most professional linguists, outperforms its competitors thanks to one key feature: It has been fed much more data. Having scanned millions of books, Google has also spawned an academic discipline with the clunky name of "culturomics": Tools like Google's Ngram Viewer let researchers understand the usage of words like "freedom" or "democracy" through the centuries.

As Viktor Mayer-Schönberger and Kenneth Cukier argue in "Big Data," Google and other companies are beginning to discover that they can do far more than previously imagined with the data they've been collecting all these years. "The world of big data," the authors proclaim, "is poised to shake up everything from businesses and the sciences to healthcare, government, education, economics, the humanities, and every other aspect of society." It's a bold assertion but one they support with convincing evidence.

The authors make clear that "big data" is much more than a Silicon Valley buzzword, and their book brims with examples of data-assisted decision making in diverse industries, from travel and banking to journalism and gaming. Start-ups like ZestFinance (co-founded by a former chief information officer at Google) allow lenders to consider thousands of factors (for example, whether a loan applicant uses a prepaid cellphone) to extend short-term loans to people with a poor or nonexistent credit history. Geo-location companies like Sense Networks and Skyhook collect data supplied by mobile phones and other devices. That data can be used to identify which areas of a city have the most bustling nightlife or, more disturbingly, to estimate how many protesters will turn up at demonstrations.

"Big Data" even features a few charming historical vignettes. The tale of Commodore Maury, a 19th-century big-data pioneer who, while working for the U.S. Navy, used ship logbooks to identify more efficient sea routes, is particularly engrossing. Maury designed a clever scheme that encouraged captains to regularly throw bottles indicating their coordinates into the sea, to be picked up by other passing ships. This basic information-exchange scheme yielded superior seafaring charts that shortened long voyages by about a third.

Compare Maury's primitive methods to today's and you understand one reason data analysis has made a leap in the past few years: There is so much more of it. "Cars today are stuffed with chips, sensors and software," the authors note. So are phones. Sensors have gotten smaller and cheaper and can report more data from more sources; they can detect almost anything, from our heart rate to signs of wear and tear on bridges. Some sensors can even be attached to previously "dumb" objects, suddenly making them "smart" and allowing them to generate feedback.

Since data is cheap and ubiquitous, the authors say, we no longer need to worry about limited samples and incomplete data sets. Why not study the data in its entirety, without leaving anything out? Such comprehensiveness allows us to relax previous requirements for exactitude and accept data in all of its real-world messiness, which, curiously, can increase the quality of the final product. Thus, note the authors, "it isn't just that 'more trumps some,' but that . . . sometimes 'more trumps better.' "

One of the more revealing points made by Messrs. Mayer-Schönberger and Cukier is that, with so much data at our fingertips, we can abandon the pursuit of causal explanations and focus on correlations alone. "In a big-data world . . . we won't have to be fixated on causality; instead we can discover patterns and correlations in the data that offer us novel and invaluable insights," they write. "Big data is about what, not why."

There are many contexts where the "why" is indeed a luxury and the "what" is good enough. When Amazon deploys its sales data to spot books that are often bought together, the recommendation doesn't need to know why hundreds of customers who bought "War and Peace," for instance, also bought "The Idiot." Nor did Google need to know why sites linked to each other when it decided to use linking behavior to build a powerful search engine.

But just how far can this logic take us? Writing mostly for a business audience, the authors don't fully grapple with the implications of such an approach when it comes to, say, public administration and governance. Will ambitious structural reforms in public policy even be possible if we don't arrive at some basic causal explanation of why some parts of the system are broken in the first place? Take obesity. It's one thing for policy makers to attack the problem knowing that people who walk tend to be more fit. It's quite another to investigate why so few people walk. A policy maker satisfied with correlations might tackle obesity by giving everyone a pedometer or a smartphone with an app to help them track physical activity—never mind that there is nowhere to walk, except for the mall and the highway. A policy maker concerned with causality might invest in pavements and public spaces that would make walking possible. Substituting the "why" with the "what" doesn't just give us the same solutions faster—often it gives us different, potentially inferior solutions.

Fortunately, "Big Data" isn't just another cyber-utopian tome, and the final section of the book offers a critical look at some of the darker effects of recording and analyzing everything. ("Delete," Mr. Mayer-Schönberger's previous book, celebrated the virtues of forgetting in a world of databases and social networks.) The authors also discuss the challenges of maintaining a just legal system when crimes can be predicted before they happen, and the danger of relying too much on the seemingly neat solutions promised by big-data evangelism. "If Henry Ford had queried big-data algorithms for what his customers wanted, they would have replied 'a faster horse,' " the authors quip.

Alas, one must read to the end for these stimulating reflections; earlier chapters might have benefited from a more critical stance. The book's explicit business slant also occasionally mars the narrative, especially when the authors plunge into the jargon of Dilbert cartoons and Soviet propaganda posters. (Did you know that "digitization turbocharges datafication"?) But these shortcomings don't undermine the achievement of "Big Data": No other book offers such an accessible and balanced tour of the many benefits and downsides of our continuing infatuation with data.

Mr. Morozov is the author of "To Save Everything, Click Here: The Folly of Technological Solutionism."
