At a Bio-IT World Conference sponsor-held dinner late last month, Cleversafe founder Chris Gladwin presented his predictions for big data.
"Ten years from now… the organization will have the zettabyte," he said. "Sub-petabyte enterprise drives won't exist."
Gladwin should know. As one of Chicago's technology sector heavyweights, his name appears on more than 1,000 patents related to advanced data storage technology.
Obviously, that's a lot of data to manage. For years, pundits in the health and life sciences (an industry infamous for big-data gluttony) have fretted over their ability to search and access their increasingly abundant data.
Also: How do they afford it all? The solution, most agree, involves cloud services.
"The fundamental issue we run into over and over again is… 'information bottleneck,'" Jason Stowe, CEO of Cycle Computing, told Bio-IT attendees in a pre-conference workshop. Describing the "limitation of fixed-size infrastructure," Stowe argued that entirely local solutions don't mix well with big data because of the tendency to size questions to the available computing environment. The big-data enterprise without IaaS either suffers from insufficient IT or overpays for unused HPC capacity.
R. Mark Adams, CIO of Good Start Genetics (a Cycle Computing customer), concurs. "The interesting thing about cloud computing is it sort of forces you to pay by cloud compute unit… unlike the stuff in our basement, which we're basically paying for anytime the electricity's on and the air conditioning's running, which is pretty much all the time," he told workshop attendees.
Many enterprises have already invested heavily in on-premises HPC infrastructure. Adams addressed the conundrum of balancing local HPC infrastructure with the cloud -- whether an enterprise should "shift… to the cloud in toto," move piecemeal, or even decide that "I'm just gonna do new things out in the cloud."
The ultimate question is what it is "going to cost us to buy new computers to accommodate our scale," he said. "The only way it's usually cheaper to buy the equivalent hardware all in is if you're gonna run it" at a 90% load.
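Adams's break-even logic can be sketched with a toy calculation. All figures below are illustrative assumptions, not numbers from the article: the point is only that fixed hardware costs, amortized over the node-hours actually used, beat pay-per-use cloud pricing only at sustained high utilization.

```python
# Illustrative break-even sketch: on-premises hardware vs. pay-per-use cloud.
# Every dollar figure here is a hypothetical assumption for demonstration.

ONPREM_ANNUAL_COST = 390_000.0   # assumed: amortized hardware + power + cooling, per year
CLOUD_RATE = 1.00                # assumed: cloud cost per node-hour actually consumed
NODES = 50                       # assumed cluster size
HOURS_PER_YEAR = 8_760

def onprem_cost_per_used_hour(utilization: float) -> float:
    """Fixed annual cost spread over the node-hours actually consumed."""
    used_hours = NODES * HOURS_PER_YEAR * utilization
    return ONPREM_ANNUAL_COST / used_hours

for utilization in (0.25, 0.50, 0.90):
    per_hour = onprem_cost_per_used_hour(utilization)
    winner = "on-prem" if per_hour < CLOUD_RATE else "cloud"
    print(f"{utilization:.0%} load: on-prem ${per_hour:.2f}/node-hour -> {winner} wins")
```

With these assumed numbers, the on-prem cluster is cheaper per used node-hour only above roughly 89% utilization -- the same shape of argument Adams makes, though the actual crossover point depends entirely on an enterprise's real hardware and cloud rates.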
"Tape runs about four cents," said Benjamin Breton, a bioinformatics software engineer at Good Start Genetics. "Something like Amazon Glacier is cheaper." Though Breton was careful to point out that Amazon does not do business associate agreements for Glacier yet, he said scaling out with IaaS is still better than saying, "You know that computer we just spent $2 million on? I need a million dollars for another one, because it's not enough."
Hybrid IaaS solutions are becoming increasingly popular and innovative. Several Bio-IT World exhibitors -- including Seven Bridges Genomics and Bina Technologies -- presented local turnkey "infrastructure-in-a-box" appliances that, though able to work independently of the cloud, can be leveraged to complement both local storage and the providers' own cloud platforms and infrastructures.
Bina's next-generation sequencing appliances, for instance, optimize data output for automatic upload to its newly unveiled Annotation Platform, which can filter large data sets in real time. Gianfranco de Feo, Bina's vice president of marketing, said hybrid solutions like these "can marry performance" between hardware and software.
The timing is also especially good right now for big-data enterprises seeking the power of IaaS.
Adams said Good Start Genetics was able to come in well under the conservative end of its budget estimate for cloud services. "These guys are in a pricing war right now." Given growing competition and the drop in cloud demand following Edward Snowden's revelations about NSA surveillance, that comes as little surprise.
Breton, who analyzed long-term storage costs should Good Start Genetics keep its accumulated data for 20 years, is optimistic that prices will fall further. Because of that trend, he advised attendees to sign one-year rather than three-year contracts with cloud providers.
"This kind of utility access can really shorten the [data integration] process," Stowe said. It "can fundamentally change the way in which we approach scientific processes."
In other words: "bigger data faster" -- the industry creed.