Biases and blind spots exist in big data as much as they do in individual perceptions and experiences. Yet there is a problematic belief that bigger data is always better data and that correlation is as good as causation.
Big data is great when you want to verify and quantify small data, as big data is all about seeking a correlation and small data about seeking the causation.
People think 'big data' avoids the problem of discrimination because you are dealing with big data sets, but, in fact, big data is being used for more and more precise forms of discrimination - a form of data redlining.
We should be cautious about embracing data before it is published in the academic press, and must always avoid treating correlation as causation.
Big data will never give you big ideas... Big data doesn't facilitate big leaps of the imagination. It will never conjure up a PC revolution or any kind of paradigm shift. And while it might tell you what to aim for, it can't tell you how to get there.
We Americans are trained to think big, talk big, act big, love big, admire bigness, but then the essential mystery is in the small.
If ... we choose a group of social phenomena with no antecedent knowledge of the causation or absence of causation among them, then the calculation of correlation coefficients, total or partial, will not advance us a step toward evaluating the importance of the causes at work.
One [Big Data] challenge is how we can understand and use big data when it comes in an unstructured format.
Big data has been used by human beings for a long time - just in bricks-and-mortar applications. Insurance and standardized tests are both examples of big data from before the Internet.
Correlation is not causation.
MapReduce has become the assembly language for big data processing, and SnapReduce employs sophisticated techniques to compile SnapLogic data integration pipelines into this new big data target language. Applying everything we know about the two worlds of integration and Hadoop, we built our technology to directly fit MapReduce, making the process of connectivity and large-scale data integration seamless and simple.
We get more data about people than any other data company gets about people, about anything - and it's not even close. We're looking at what you know, what you don't know, how you learn best. The big difference between us and other big data companies is that we're not ever marketing your data to a third party for any reason.
You are what you think. So just think big, believe big, act big, work big, give big, forgive big, laugh big, love big and live big.
I'm very familiar with how people can confuse correlation with causation.
Let's look at lending, where they're using big data for the credit side. And it's just credit data enhanced, by the way, which we do, too. It's nothing mystical. But they're very good at reducing the pain points. They can underwrite it quicker using - I'm just going to call it big data, for lack of a better term: "Why does it take two weeks? Why can't you do it in 15 minutes?"
Don't confuse correlation and causation. Almost all great records eventually dwindle.