Quotes About Big Data, by Sendhil Mullainathan and Others

The problem with data is that it says a lot, but it also says nothing. 'Big data' is terrific, but it's usually thin. To understand why something is happening, we have to engage in both forensics and guesswork.
People think 'big data' avoids the problem of discrimination because you are dealing with big data sets, but, in fact, big data is being used for more and more precise forms of discrimination - a form of data redlining.
I'm going to say something rather controversial. Big data, as people understand it today, is just a bigger version of small data. Fundamentally, what we're doing with data has not changed; there's just more of it.
Let's look at lending, where they're using big data for the credit side. And it's just credit data enhanced, by the way, which we do, too. It's nothing mystical. But they're very good at reducing the pain points. They can underwrite it quicker using - I'm just going to call it big data, for lack of a better term. "Why does it take two weeks? Why can't you do it in 15 minutes?"
Any time scientists disagree, it's because we have insufficient data. Then we can agree on what kind of data to get; we get the data; and the data solves the problem. Either I'm right, or you're right, or we're both wrong. And we move on. That kind of conflict resolution does not exist in politics or religion.
When an economist says the evidence is "mixed," he or she means that theory says one thing and data says the opposite.
Apple knows a lot of data. Facebook knows a lot of data. Amazon knows a lot of data. Microsoft used to, and still does with some people, but in the newer world, Microsoft knows less and less about me. Xbox still knows a lot about people who play games. But those are the big five, I guess.
Disruptive technology is a theory. It says this will happen and this is why; it's a statement of cause and effect. In our teaching we have so exalted the virtues of data-driven decision making that in many ways we condemn managers only to be able to take action after the data is clear and the game is over. In many ways a good theory is more accurate than data. It allows you to see into the future more clearly.
One [Big Data] challenge is how we can understand and use big data when it comes in an unstructured format.
Where big data is all about seeking correlations - and thus making incremental changes - small data is all about causation: seeking to understand the reasons why.
A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.
Scientific data are not taken for museum purposes; they are taken as a basis for doing something. If nothing is to be done with the data, then there is no use in collecting any. The ultimate purpose of taking data is to provide a basis for action or a recommendation for action. The step intermediate between the collection of data and the action is prediction.
Big data has been used by human beings for a long time - just in bricks-and-mortar applications. Insurance and standardized tests are both examples of big data from before the Internet.
Facebook collects a lot of data from people and admits it. And it also collects data which isn't admitted. And Google does too. As for Microsoft, I don't know. But I do know that Windows has features that send data about the user.
Biases and blind spots exist in big data as much as they do in individual perceptions and experiences. Yet there is a problematic belief that bigger data is always better data and that correlation is as good as causation.
People believe the best way to learn from the data is to have a hypothesis and then go check it, but the data is so complex that someone who is working with a data set will not know the most significant things to ask. That's a huge problem.
Big data is great when you want to verify and quantify small data - as big data is all about seeking correlation, and small data is all about seeking causation.