People think 'big data' avoids the problem of discrimination because you are dealing with big data sets, but, in fact, big data is being used for more and more precise forms of discrimination - a form of data redlining.
Biases and blind spots exist in big data as much as they do in individual perceptions and experiences. Yet there is a problematic belief that bigger data is always better data and that correlation is as good as causation.
Big data is great when you want to verify and quantify small data, as big data is all about seeking a correlation and small data is about seeking the causation.
MapReduce has become the assembly language for big data processing, and SnapReduce employs sophisticated techniques to compile SnapLogic data integration pipelines into this new big data target language. Applying everything we know about the two worlds of integration and Hadoop, we built our technology to directly fit MapReduce, making the process of connectivity and large scale data integration seamless and simple.
With too little data, you won't be able to make any conclusions that you trust. With loads of data you will find relationships that aren't real... Big data isn't about bits, it's about talent.
I'm going to say something rather controversial. Big data, as people understand it today, is just a bigger version of small data. Fundamentally, what we're doing with data has not changed; there's just more of it.
We get more data about people than any other data company gets about people, about anything - and it's not even close. We're looking at what you know, what you don't know, how you learn best. The big difference between us and other big data companies is that we're not ever marketing your data to a third party for any reason.
One [Big Data] challenge is how we can understand and use big data when it comes in an unstructured format.
The biggest mistake is an over-reliance on data. Managers will say if there are no data they can take no action. However, data only exist about the past. By the time data become conclusive, it is too late to take actions based on those conclusions.
'Data exhaust' is probably my least favorite phrase in the big data world 'cause it sounds like something you're trying to get rid of or something noxious that comes out of the back of your car.
One of the myths about the Internet of Things is that companies have all the data they need, but their real challenge is making sense of it. In reality, the cost of collecting some kinds of data remains too high, the quality of the data isn't always good enough, and it remains difficult to integrate multiple data sources.
A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.
We all say data is the next white oil. [Owning the oil field is not as important as owning the refinery because what will make the big money is in refining the oil. Same goes with data, and making sure you extract the real value out of the data.]
Big data has been used by human beings for a long time - just in bricks-and-mortar applications. Insurance and standardized tests are both examples of big data from before the Internet.
In the increasingly digital world, data is a valuable currency, yet as consumers, we control and own little of it. As consumers, we must ask what big companies do with our data, a question directed to both the online and traditional ones.
Most of 'big data' is a fraud because it is really 'dumb data.'