How to Manage and Analyze Big Data

May 21, 2013

Humans produce about 2.5 quintillion bytes of new information every day. In fact, we've produced more data in the past two years than in all of previous history. This brings new challenges to the businesses, governments and organizations charged with collecting, storing, analyzing, managing and acting on this data. Here's an overview of what big data is, what has to be done with it and how it is used.

Big Data Defined

The definition of big data keeps changing, because it is defined as the amount of data that becomes difficult to manage with current tools, and newer, better technologies for storing and analyzing data are always being developed. Some companies might have trouble managing a few dozen terabytes of information, while others might not have trouble until their databases reach several petabytes. By almost any current standard, an exabyte or more of information counts as big data.

How much information is this? One plain-text character takes up roughly one byte of information. A kilobyte equals 1,024 bytes. A megabyte equals 1,048,576 bytes. A gigabyte equals 1,073,741,824 bytes. A terabyte equals 1,099,511,627,776 bytes. A petabyte equals 1,125,899,906,842,624 bytes, and an exabyte equals 1,152,921,504,606,846,976 bytes of information.
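Each of those figures is simply the previous one multiplied by 1,024. A minimal Python sketch of that arithmetic follows; the function name bytes_in is made up for illustration.

```python
# Binary storage units: each step up is a factor of 1,024 (2**10).
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the named binary units."""
    power = UNITS.index(unit) + 1  # kilobyte -> 1, megabyte -> 2, ...
    return 1024 ** power


for unit in UNITS:
    # The ',' format spec adds thousands separators, matching the text above.
    print(f"1 {unit} = {bytes_in(unit):,} bytes")
```

Running it prints the same six numbers listed above, which is a quick way to check them.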

Scientists estimate that humanity could store up to 295 exabytes of information if we filled every floppy disk, hard drive, X-ray film, microchip and other data storage device on Earth. If you were to put this much data on CDs and stack them on top of each other, the stack would reach the moon.
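The CD comparison can be sanity-checked with some rough arithmetic. The sketch below makes assumptions that are not in the original estimate: a 700 MB disc, a disc thickness of 1.2 mm, an average Earth-to-moon distance of 384,400 km, and the binary exabyte defined earlier.

```python
EXABYTE = 1024 ** 6                   # bytes in one binary exabyte
TOTAL_BYTES = 295 * EXABYTE           # the 295-exabyte estimate

# Assumed figures for illustration, not taken from the original estimate.
CD_CAPACITY_BYTES = 700 * 1024 ** 2   # a standard 700 MB disc
CD_THICKNESS_MM = 1.2                 # thickness of one disc, no case
MOON_DISTANCE_KM = 384_400            # average Earth-to-moon distance


def cd_stack_height_km(total_bytes: int) -> float:
    """Height in km of a stack of CDs holding total_bytes of data."""
    discs = -(-total_bytes // CD_CAPACITY_BYTES)  # ceiling division
    return discs * CD_THICKNESS_MM / 1_000_000    # millimetres -> kilometres


height_km = cd_stack_height_km(TOTAL_BYTES)
print(f"Stack height: about {height_km:,.0f} km")
print("Reaches the moon:", height_km >= MOON_DISTANCE_KM)
```

Under these assumptions the stack comes out comfortably taller than the distance to the moon, so the comparison holds up.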

Where Does Big Data Come From?

Where is all this data coming from, and why are we getting so much more data than ever before? Big data comes from many sources, including (but not limited to):

  • Meteorological information (satellites, radar, etc.)
  • Social media and social networking sites
  • Digital photography and videography
  • Transaction records (purchases, deposits and withdrawals from accounts, etc.)
  • GPS signals from cell phones
  • Website server logs
  • Clickstream data from the Internet (who clicks on what)
  • Information from cell phone calls
  • Information gathered by surveillance cameras, sensors, etc.

Most of these are forms of unstructured data. In other words, until this data is analyzed it is formless and meaningless, and we can't do anything with it. Unstructured data is different from structured data, such as names, addresses and phone numbers, which is organized into predictable fields and is pretty straightforward to read and analyze.
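To make the distinction concrete, here is a small Python sketch that turns one raw web server log line into named fields. The sample line is made up, and the pattern assumes the widely used Common Log Format; real logs vary, so treat this as an illustration rather than a general-purpose parser.

```python
import re
from typing import Optional

# Hypothetical sample line, assumed to follow the Common Log Format.
LOG_LINE = (
    '203.0.113.7 - - [21/May/2013:05:32:10 +0000] '
    '"GET /products/42 HTTP/1.1" 200 5120'
)

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the line's fields as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])  # store the status as a number
    return record


record = parse_log_line(LOG_LINE)
print(record["ip"], record["path"], record["status"])
```

Once each line is a record with fields like ip, path and status, it can be counted, filtered and joined with other data just like a table of names and addresses.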

How is Big Data Used?

As you can see, the types of data we accumulate vary tremendously, so the uses for this data are also quite diverse. Some of the data can be used to identify and prevent crimes like fraud, terrorism and identity theft.

For example, if Tommy Jones usually does all his shopping in Omaha, Nebraska and he normally buys groceries, video games and carry-out pizza, his bank (and Homeland Security) might have some questions if he suddenly appears in Flagstaff, Arizona stocking up on rifles, ammunition and fertilizer.
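A very simplified Python sketch of that kind of check is shown below. Real fraud detection systems are far more sophisticated; the Transaction class, the unusual_signals function and the two rules are all invented here just to illustrate comparing a new purchase against a customer's history.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    city: str
    category: str


def unusual_signals(history: list[Transaction], new: Transaction) -> list[str]:
    """List the ways a new purchase differs from the customer's past ones."""
    usual_cities = {t.city for t in history}
    usual_categories = {t.category for t in history}
    signals = []
    if new.city not in usual_cities:
        signals.append(f"new location: {new.city}")
    if new.category not in usual_categories:
        signals.append(f"new category: {new.category}")
    return signals


# Tommy's usual purchases, following the example above.
history = [
    Transaction("Omaha", "groceries"),
    Transaction("Omaha", "video games"),
    Transaction("Omaha", "carry-out pizza"),
]
suspicious = Transaction("Flagstaff", "ammunition")
print(unusual_signals(history, suspicious))
```

A purchase that trips both rules at once, as Tommy's does, is the kind of transaction a bank might hold for review.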

Other ways we can use this data include (but aren't limited to):

  • Targeting consumers with ads, offers and coupons they're most likely to use
  • Identifying weather patterns for more accurate predictions
  • Determining what people are interested in according to the websites they visit
  • Using surveillance videos and sensor information to identify security risks
  • Stopping fraud, identity theft and terrorist activities before anyone is hurt or money is taken

As you can see, some of this data has to be analyzed pretty quickly in order to be useful. It won't do us much good to find out that Tommy's identity was stolen by a terrorist and used to buy weapons and bomb-making supplies after the terrorist has already struck his targets.

We need to know this within hours, if not minutes.

Challenges of Managing Big Data

It puzzles us when we learn someone already had information to prevent a problem and didn't act on it. What we often don't understand is the sheer volume of information coming in and what it takes to analyze that data and make it useful in time to be effective. Here are some of the challenges big data centers face in analyzing data for useful information:

  • It takes highly skilled and well-trained people to analyze big data and find useful patterns and correlations
  • These skilled workers need advanced technologies to analyze the data
  • The cost of technologies to gather, store and analyze data is tremendous, and prohibitive for many businesses
  • It's challenging to integrate new analytics technologies into older databases

The smaller the company, the more difficult it is to pay for the proper technology and analysts to make sense of data within budget. Older companies face challenges because their databases often aren't compatible with new data analytics technologies. In this economy, organizations of every size think twice before investing millions of dollars in gathering, storing and analyzing huge amounts of data.

Challenges of Analyzing Big Data

There are several new technologies helping us make sense of big data quickly and efficiently, including NoSQL databases, Hadoop and MapReduce. Industry experts identify the four dimensions of data analytics:

  • Volume - handling the sheer amount of raw data and making it serve a purpose
  • Velocity - reacting quickly to the information gathered from data
  • Variety - using all the different types of structured and unstructured data we have effectively
  • Veracity - assuring the data we have is trustworthy

Perhaps the hardest of these four is veracity. One third of executives make decisions based on information they don't actually trust. It's relatively easy to develop technologies to find correlations and patterns in data, but more difficult to decide exactly what (if anything) these correlations and patterns actually mean.
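Of the technologies named above, MapReduce is the easiest to sketch in a few lines. The Python below imitates its three steps (map, shuffle, reduce) on a handful of made-up clickstream records. It runs on one machine and does not use Hadoop or any real MapReduce library; the function names and sample data are invented for illustration.

```python
from collections import defaultdict

# Hypothetical clickstream records: (user, page clicked).
CLICKS = [
    ("ann", "/home"),
    ("bob", "/home"),
    ("ann", "/cart"),
    ("cy", "/home"),
]


def map_phase(records):
    """Map: emit a (page, 1) pair for every click."""
    for _user, page in records:
        yield page, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


click_counts = reduce_phase(shuffle_phase(map_phase(CLICKS)))
print(click_counts)
```

The appeal of the pattern is that the map and reduce steps work on each record or key independently, so in a real cluster they can be spread across many machines.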

In short, big data is all of the information we gather from all of the sources we have. The data is stored in databases, which vary in size. Most big data is housed in enormous data centers all over the world. As we develop new technologies to analyze this data, we're able to make it useful. We can use this information to create more effective advertising campaigns, get more accurate weather and climate information, protect ourselves from thieves and terrorists and much more.

Aside from the physical and technological challenges that managing big data presents, the issue of security is at the forefront of everyone's minds. How can we keep all this information out of the wrong hands while distributing it to people who can use it for good? What will the world be like when our entire lives are available to businesses, banks, the government and insurance companies with a single click? Only the future can tell us for sure.

