Microsoft has been steadily pouring money into big data and business intelligence. The company of course owns the most widely used analytical tool in the world, Microsoft Excel, which our benchmark research into Spreadsheets in the Enterprise shows is not going away soon. User resistance (cited by 56% of participants) and lack of a business case (50%) are the most common reasons that spreadsheets are not being replaced in the enterprise. The challenge is ensuring that spreadsheets are not just used personally but are connected to and secured within the enterprise, to address consistency and a range of potential errors. These issues all add up to more work and maintenance, as my colleague has pointed out recently.

Along with Microsoft SQL Server and SharePoint, Excel is at the heart of the company’s BI strategy. In particular, PowerPivot, originally introduced as an add-on for Excel 2010 and built into Excel 2013, is a discovery tool that enables exploratory analytics and data mashups. PowerPivot uses an in-memory, column store approach similar to other tools in the market. Its ability to access multiple data sources, including third-party and government data through Microsoft’s Azure Marketplace, enables a robust analytical experience.

Ultimately, information sources are more important than the tool sets used on them. With the Azure Marketplace and access to other new data sources such as Hadoop, Microsoft is advancing in the big data space. As my colleague assessed, Microsoft has partnered with Hortonworks to bring Hadoop data into the fold through HDInsight, which enables familiar Excel environments to access HDFS via HCatalog. This approach is similar to access methods utilized by other companies, including Teradata, which I wrote about last week. Microsoft stresses the 100 percent open source nature of the Hortonworks approach as a standard alternative to the multiple, more proprietary Hadoop distributions occurring throughout the industry. An important benefit for enterprises with Microsoft deployments is that Microsoft Active Directory adds security to HDInsight.

As my colleague Mark Smith recently pointed out about data discovery methods, the analytic discovery category is broad and includes visualization approaches. On the visualization side, Microsoft markets PowerView, also part of Excel 2013, which provides visual analytics and navigation on top of Microsoft’s BI semantic model. Users also can annotate and highlight content and then embed it directly into PowerPoint presentations. This direct export feature is valuable because PowerPoint is still a critical communication vehicle in many organizations. Another visual tool, currently in preview, is the Excel add-in GeoFlow, which uses Bing Maps to render visually impressive temporal and geographic data in three dimensions. Such a 3-D visualization technique could be useful in many industries. Our research into next-generation business intelligence found that deploying geographic maps (47%) and visualizing metrics on them (41%) are becoming increasingly important, but Microsoft will need to further exploit location-based analytics and address the need for interactivity.

Microsoft has a core advantage in being able to link its front-office tools such as Excel with its back-end systems such as SQL Server 2012 and SharePoint. In particular, by leveraging a common semantic model through Microsoft Analysis Services, in what Microsoft calls its Business Intelligence Semantic Model, users can set up a dynamic exploratory environment through Excel. Once users or analysts have developed a BI work product such as a report, they can publish it directly or through SharePoint. This integration enables business users to share data models and solutions and manage them in common. Common management applies to security controls and also gives visibility into usage statistics, showing when particular applications are gaining traction with organizational users.

Usability, which our benchmark research into next-generation business intelligence identifies as the number-one evaluation criterion in nearly two-thirds (64%) of organizations, is still a challenge for Microsoft. Excel power users will appreciate the solid capabilities of PowerPivot, but more casual users of Excel – the majority of business people – do not understand how to build pivot tables or formulas. Our research shows that only 11 percent of Excel users are power users, and most skill levels are simply adequate (49%) rather than above average or excellent. While PowerView does give some added capability, a number of other vendors of visual discovery products, such as Tableau, have focused on user experience from the ground up, so it is clear that Microsoft needs to address this shortcoming in its design environment.

When we consider more advanced analytic strategies and inclusion of advanced algorithms, Microsoft’s direction is not clear. Its Data Analysis Expressions (DAX) language can help create custom measures and calculated fields, but it is a scripting language akin to MDX. This is useful for IT professionals who are familiar with such tools, but here too business-oriented users will be challenged to use it effectively.

A wild card in Microsoft’s BI and analytics strategy is mobile technology. Currently, Microsoft is pursuing a build-once, deploy-anywhere model based on HTML5, and is a key member of the World Wide Web Consortium (W3C) that is defining the standard. The HTML5 standard, which has just passed a big hurdle by reaching candidate recommendation status, is beginning to show value in the design of new applications that can be accessed through web browsers on smartphones and tablets. The HTML5 approach could be challenging, as our technology innovation research into mobile technology finds that more organizations (39%) prefer native mobile applications from vendor-specific application stores, compared to 33 percent that prefer a web browser-based method and a fifth with no preference. However, the success or failure of its Windows 8-based Surface tablet will be the real barometer of Microsoft’s mobile BI success, since its integration with the Office franchise is a key differentiator. Early adoption of the tablet has not been strong, but Microsoft is said to be doubling down with a new version to be announced shortly. Success would put Office into the hands of the mobile workforce on a widespread basis via Microsoft devices, which could have far-reaching impacts for the mobile BI market.

As it stands now, however, Microsoft faces an uphill battle in establishing its mobile platform in a market dominated by Android devices and Apple iOS devices like the iPhone and iPad. If the Surface ultimately fails, Microsoft will likely have to open up Office to run on Android and iOS or risk losing its dominant position. My colleague is quite pessimistic about Microsoft’s overall mobile technology efforts and its ability to overcome the reality of the existing market. Our technology innovation research into mobile technology finds that over half of organizations have a preference for their smartphone and tablet technology platform. The first-ranked smartphone platforms are Apple (50%), Android (27%) and RIM (17%), with Microsoft a distant fourth (5%); for tablets the ranking is Apple (66%), Android (19%) and then Microsoft (8%). Based on these findings, Microsoft faces challenges both on the platform front and if it adapts its technology to support other platforms that are more preferred in business today.

Ultimately, Microsoft is trying to pull together different initiatives across multiple internal business units that are known for being very siloed and not organized well for customers. Microsoft has relied on its channel partners and customers to figure out not just how to make its products work together but also what is possible, since they are not always given clear guidance from Redmond. Recent efforts suggest that Microsoft is trying to come together to address the big data and business analytics challenge and the massive opportunity it represents. One area in which this is coming together is Microsoft’s cloud initiatives. Last year’s announcements of Azure virtual machines enable an infrastructure-as-a-service (IaaS) play for Microsoft and position Windows Azure SQL Database as a service. This could make the back-end systems I’ve discussed available through the cloud. Ironically, the cloud-based Office 365 suite does not include core productivity applications such as Excel and PowerPoint, so front-end access will still come through the client version of the software.

For organizations that already have installed Microsoft as their primary BI platform and are looking for tight integration with an Excel-based discovery environment, the decision to move forward is relatively simple. The trade-off is that this package is still a bit IT-centric; it may not attract as many of the larger body of business users, or address the failings of business intelligence as well, as a more user-friendly discovery product might. Furthermore, since Microsoft is not as engaged in direct support and service as other players in this market, it will need to move its traditionally technology-focused channel to help customers become more business savvy. For marketing and other business departments, especially in high-velocity industries where usability and time-to-value are at a premium and back-end integration is secondary, other tools will be worth a look. Microsoft has great potential, and with analytics being the top-ranked technology innovation priority among its customers, I hope that the many divisions inside the global software giant can finally come together to deliver a comprehensive approach.

Regards,

Tony Cosentino

VP and Research Director

Our benchmark research into business technology innovation found that organizations consider analytics the most important new technology for improving their performance; they ranked big data only fifth out of six choices. This and other findings indicate that the best way for big data to contribute value to today’s organizations is to be paired with analytics. Recently, I wrote about what I call the four pillars of big data analytics on which the technology must be built. These pillars are big data and information optimization, predictive analytics, right-time analytics and the discovery and visualization of analytics. These components gave me a framework for looking at Teradata’s approach to big data analytics during the company’s analyst conference last week in La Jolla, Calif.

The essence of big data is to optimize the information used by the business for whatever type of need, which my colleague has identified as a key value of these investments. Data diversity presents a challenge to most enterprise data warehouse architectures. Teradata has been dealing with large, complex sets of data for years, but today’s different data types are forcing new modes of processing in enterprise data warehouses. Teradata is addressing this issue by focusing on a workload-specific architecture that aligns with MapReduce, statistics and SQL. Its Unified Data Architecture (UDA) incorporates the Hortonworks Hadoop distribution, the Aster Data platform and Teradata’s stalwart RDBMS EDW. The Big Data Analytics appliance that encompasses the UDA framework won our annual innovation award in 2012. The system is connected through InfiniBand and accesses Hadoop’s metadata layer directly through HCatalog. Bringing these pieces together represents the type of holistic thinking that is critical for handling big data analytics; at the same time there are some costs, as the system includes two MapReduce processing environments. For more on the UDA architecture, read my previous post on Teradata as well as my colleague Mark Smith’s piece.

Predictive analytics is another foundational piece of big data analytics and one of the top priorities in organizations. However, according to our big data research, it is not available in 41 percent of organizations today. Teradata is addressing it in a number of ways. At the conference Stephen Brobst, Teradata’s CTO, likened big data analytics to a high-school chemistry classroom that has a chemical closet from which you pull out the chemicals needed to perform an experiment in a separate work area. In this analogy, Hadoop and the RDBMS EDW are the chemical closet, and Aster Data provides the sandbox where the experiment is conducted. With multiple algorithms currently written into the platform and many more promised over the coming months, this sandbox provides a promising big data lab environment. The approach is SQL-centric and as such has its pros and cons. The obvious advantage is that SQL is a declarative language that is easier to learn than procedural languages, and an established skills base exists within most organizations. The disadvantage is that SQL is not the native tongue of many business analysts and statisticians. While it may be easy to call a function within the context of the SQL statement, the same person who can write the statement may not know when and where to call the function. One way for Teradata to expediently address this need is through its existing partnerships with companies like Alteryx, which I wrote about recently. Alteryx provides a user-friendly analytical workflow environment and is establishing a solid presence on the business side of the house. Teradata already works with predictive analytics providers like SAS but should expand further with companies like Revolution Analytics, which I assessed and which uses R technology to support a new generation of tools.

Teradata is exploiting its advantage with algorithms such as nPath, which shows the path that a customer has taken to a particular outcome such as buying or not buying. According to our big data benchmark research, being able to conduct what-if analysis and predictive analytics are the two most desired capabilities not currently available with big data, as the chart shows. The algorithms that Teradata is building into Aster help address this challenge, but despite customer case studies shown at the conference, Teradata did not clearly demonstrate how this type of algorithm and others seamlessly integrate to address the overall customer experience or other business challenges. While presenters described it in terms of improving churn and fraud models, and we can imagine how the handoffs might occur, the presentations were more technical in nature. As Teradata gains traction with these types of analytical approaches, it will behoove the company to show not just how the algorithms and SQL work but how they can be used by business users and analysts who are not as technically savvy.
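
To make the idea of path analysis concrete, here is a minimal sketch in plain Python. It is not Teradata's nPath syntax or implementation; the event data, the function name paths_to_outcome and the action labels are all hypothetical, chosen only to illustrate how ordered customer events can be rolled up into the paths that lead to an outcome.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream: (customer_id, timestamp, action).
events = [
    ("c1", 1, "home"), ("c1", 2, "search"), ("c1", 3, "purchase"),
    ("c2", 1, "home"), ("c2", 2, "cart"), ("c2", 3, "exit"),
    ("c3", 1, "home"), ("c3", 2, "search"), ("c3", 3, "purchase"),
]

def paths_to_outcome(events, outcome):
    """Count the ordered action paths that end in the given outcome."""
    # Group each customer's actions in timestamp order.
    by_customer = defaultdict(list)
    for customer, _ts, action in sorted(events, key=lambda e: (e[0], e[1])):
        by_customer[customer].append(action)

    # Keep only customers who reached the outcome, and cut each path
    # at the first occurrence of that outcome.
    counts = Counter()
    for actions in by_customer.values():
        if outcome in actions:
            end = actions.index(outcome)
            counts[" > ".join(actions[: end + 1])] += 1
    return counts

purchase_paths = paths_to_outcome(events, "purchase")
```

A business analyst would typically read the result as a ranked list of the most common routes to a purchase, which is the kind of output the churn and fraud examples at the conference appeared to build on.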

Another key principle behind big data analytics is timeliness of the analytics. Given the nature of business intelligence and traditional EDW architectures, until now timeliness of analytics has been associated with how quickly queries run. This has been a strength of the Teradata MPP shared-nothing architecture, but other appliance architectures, such as those of Netezza and Greenplum, now challenge Teradata’s hegemony in this area. Furthermore, trends in big data make the situation more complex. In particular, with very large data sets, many analytical environments have replaced the traditional row-level access with column access. Column access is a more natural way for data to be accessed for analytics since it does not have to read through an entire row of data that may not be relevant to the task at hand. At the same time, column-level access has downsides, such as the reduced speed at which you can write to the system; also, as the data set used in the analysis expands to a high number of columns, it can become less efficient than row-level access. Teradata addresses this challenge by providing both row and column access through innovative proprietary access and computation techniques.
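
The row-versus-column trade-off described above can be sketched with a toy example. This is an illustration in plain Python only, not a description of how Teradata or any other engine stores data; the table, column names and helper functions are assumptions made up for the example, and the counters stand in for the I/O a real system would perform.

```python
# Toy table; each record is (customer_id, region, revenue, units).
NAMES = ["customer_id", "region", "revenue", "units"]
rows = [
    (1, "west", 120.0, 3),
    (2, "east", 80.0, 2),
    (3, "west", 200.0, 5),
]

def to_columns(records, names):
    """Pivot a row-oriented table into a column-oriented one."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(names)}

columns = to_columns(rows, NAMES)

def sum_revenue_row_store(records):
    """Row access: every field of every record is read to reach one field."""
    total, values_read = 0.0, 0
    for rec in records:
        values_read += len(rec)  # the whole row is fetched
        total += rec[2]
    return total, values_read

def sum_revenue_column_store(cols):
    """Column access: only the single column needed is read."""
    revenue = cols["revenue"]
    return sum(revenue), len(revenue)

def append_row_store(records, rec):
    """A row store writes a new record in one place."""
    records.append(rec)
    return 1  # number of separate writes

def append_column_store(cols, rec, names):
    """A column store must touch every column to add one record."""
    for name, value in zip(names, rec):
        cols[name].append(value)
    return len(names)  # number of separate writes
```

In this sketch the aggregate reads 12 values from the row layout but only 3 from the column layout, while adding one record costs a single write in the row layout and four in the column layout, which mirrors the read advantage and write penalty noted above.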

Exploratory analytics on large, diverse data sets also has a timeliness imperative. Hadoop promises the ability to conduct iterative analysis on such data sets, which is the reason that companies store big data in the first place according to our big data benchmark research. Iterative analysis is akin to the way the human brain naturally functions, as one question naturally leads to another question. However, methods such as Hive, which provides an SQL-like way to access Hadoop data, can be very slow, sometimes taking hours to return query results. Aster enables much faster access and therefore provides a more dynamic interface for iterative analytics on big data.

Timeliness also has to do with incorporating big data in a stream-oriented environment; only 16 percent of organizations are very satisfied with timeliness of events, according to our operational intelligence benchmark research. In a use case such as fraud and security, rule-based systems work with complex algorithmic functions to uncover criminal activity. While Teradata itself does not provide the streaming or complex event processing (CEP) engines, it can provide the big data analytical sandbox and algorithmic firepower necessary to supply the appropriate algorithms for these systems. Teradata partners with major players in this space already, but would be well served to further partner with CEP and other operational intelligence vendors to expand its footprint. By the way, these vendors will be covered in our upcoming Operational Intelligence Value Index, which is based on our operational intelligence benchmark research. This same research showed that analyzing business and IT events together was very important in 45 percent of organizations.

The visualization and discovery of analytics is the last foundational pillar, and here Teradata is still a work in progress. While some of the big data visualizations Aster generates show interesting charts, they lack context to help people interpret them. Furthermore, the visualization is not as intuitive as it could be and requires the writing and customization of SQL statements. To be fair, most visual and discovery tools today are relationally oriented and Teradata is trying to visualize large and diverse sets of data. Furthermore, Teradata partners with companies including MicroStrategy and Tableau to provide more user-friendly interfaces. As Teradata pursues the big data analytics market, it will be important to demonstrate how it works with its partners to build a more robust and intuitive analytics workflow environment and visualization capability for the line-of-business user. Usability (63%) and functionality (49%) are the top two considerations when evaluating business intelligence systems according to our research on next-generation business intelligence.

Like other large industry technology players, Teradata is adjusting to the changes brought by business technology innovation in just the last few years. Given its highly scalable databases and data modeling – areas that still represent the heart of most companies’ information architectures – Teradata has the potential to pull everything together and leverage its current deployed base. Technologists looking at Teradata’s new and evolving capabilities will need to understand the business use cases and share these with the people in charge of such initiatives. For business users, it is important to realize that big data is more than just visualizing disparate data sets and that greater value lies in setting up an efficient back-end process that applies the right architecture and tools to the right business problem.

Regards,

Tony Cosentino
VP and Research Director

Responding to the trend that businesses now ask less sophisticated users to perform analysis and rely on software to help them, Oracle recently announced a new release of its flagship Oracle Business Intelligence Foundation Suite (OBIFS 11.1.1.7) as well as updates to Endeca, the discovery platform that Oracle bought in 2011. Endeca is part of a new class of tools that bring new capabilities in information discovery, self-service access and interactivity. Such approaches represent an important part of the evolution of business intelligence to business analytics, as I have noted in my agenda for 2013.

Oracle Business Intelligence Foundation Suite includes many components, including but not limited to Oracle Business Intelligence Enterprise Edition (OBIEE), Oracle Essbase and a scorecard and strategy application. OBIEE is the enabling foundation that federates queries across data sources and enables reporting across multiple platforms. Oracle Essbase is an in-memory OLAP tool that enables forecasting and planning, including what-if scenarios embedded in a range of Oracle BI Applications, which are sold separately. The suite, along with the Endeca software, is integrated with Exalytics, Oracle’s appliance for BI and analytics. Oracle’s appliance strategy, which I wrote about after Oracle OpenWorld last year, invests heavily in the Sun Microsystems hardware acquired in 2010.

These updates are far-ranging and numerous (including more than 200 changes to the software). I’d like to point out some important pieces that advance Oracle’s position in the BI market. A visualization recommendations engine offers guidance on the type of visualization that may be appropriate for a user’s particular data. This feature, already sold by others in the market, may be considered a subset of the broader capability of guided analysis. Advanced visualization techniques have become more important for companies because they make it easier for users to understand data, and they are critical for Oracle to compete with the likes of Tableau, a player in this space that I wrote about last year.

Another user-focused update related to visualization is performance tiles, which enable important KPIs to be displayed prominently within the context of the screen surface area. Performance tiles are a great way to start improving the static dashboards that my colleague Mark Smith has critiqued. From what I have seen it is unclear to what degree the business user can define and change Oracle’s performance tile KPIs (for example, the red-flagged metrics assigned to the particular business user that appear within the scorecard function of the software) and how much the system can provide in a prescriptive analytic fashion. Other visualizations that have been added include waterfall charts, which enable dependency analysis; these are especially helpful for pricing analysis by showing users how changes in one dimension impact pricing on the whole. Another is MapViews for manipulation and design to support location analytics; our next-generation BI research finds that the capability to deploy geographic maps is most important to BI in 47 percent of organizations, and the capability to visualize metrics associated with locations in 41 percent. Stack charts now provide auto-weighting for 100-percent sum analysis, which can be helpful for analytics such as attribution models. Breadcrumbs empower users to understand and drill back through their navigation process, which helps them understand how a person came to a particular analytical conclusion. Finally, Trellis View actions provide contextual functionality to help turn data into action in an operational environment.

These visualization advances are critical for Oracle’s big data efforts: according to our big data research, visualization is a top-three big data capability that is not available in 37 percent of organizations, and our latest technology innovation research on business analytics found presenting data visually to be the second most important capability, cited by 48 percent of organizations.

The update to Oracle Smart View for Office also puts more capability in the hands of users. It natively integrates Excel and other Microsoft Office applications with operational BI dashboards so users can perform analysis and prepare ad-hoc reports directly within these desktop environments. This is an important advance for Oracle, since our benchmark research into the use of spreadsheets across the enterprise found that the combination of BI and spreadsheets happens all the time or frequently in 74 percent of organizations. Additionally, collaborating with business intelligence is essential and tighter integration is a critical use case: our next-generation business intelligence research found that using Microsoft Office for collaboration with business intelligence is important to 36 percent of organizations.

Oracle’s efforts to evolve its social collaboration capabilities through what it calls Oracle Social Network have advanced significantly, but integrating them and making them available through its business intelligence offering does not appear to be in the short-term plan. Our research finds that more than two-thirds (67%) of organizations rank collaboration as important, and embedding it within BI is a top need in 38 percent of organizations. Much of what Oracle already provides could be easily integrated to meet business demand for a range of people-based interactions that most are still struggling to manage through e-mail.

Oracle has extended the existing capabilities of OBIEE with Hadoop integration via a Hive connector that allows Oracle to pull data into OBIEE from big data sources, while an MDX search function enabled by integration with the Endeca discovery tool allows OBIEE to do full text search and data discovery. Connections to new data sources are critically important in today’s environment; our technology innovation research shows that retaining and analyzing more data is the number-one ranked use for big data in 29 percent of organizations. Federated data discovery is particularly important, as most companies are often unaware of their information assets and therefore unknowingly limit their analysis.

Beyond the core BI product, Oracle made significant advances with Endeca 3.0. Users can now analyze Excel files. This is an existing capability for other vendors, so it was important for Oracle to gain parity here. Beyond that, Endeca now comes with a native JavaScript Object Notation (JSON) reader and support for authorization standards. This furthers its ability to do contextual analysis and sentiment analysis on data in text and social media. Endeca also now can pull data from the Oracle BI server to marry with the analysis. Overall the new version of Endeca enables new business-driven information discovery that is essential to relieve the stress on analysts and IT to create and publish information and insights to business.

Oracle continues to invest in BI applications that supply prebuilt analytics. These packaged analytics applications span from the front office (sales and marketing) to operations (procurement and supply chain) to the back office (finance and HR). Given the enterprise-wide support, Oracle’s BI can perform cross-functional analytics and deliver fast time to value since users do not have to spend time building the dashboards. Through interoperation with the company’s enterprise applications, customers can execute action directly into applications such as PeopleSoft, JD Edwards or Oracle E-Business Suite. Oracle has begun to leverage more of its scorecarding function, which enables KPI relationships to be mapped and information aggregated and trended. Scorecards are important for analytic cultures because they are a common communication platform for executive decision-makers and allow ownership assignment of metrics.

I was surprised not to find much advancement in Oracle’s business intelligence efforts that operate on smartphones and tablets. Our research finds that mobile business intelligence is important to 69 percent of organizations and that 78 percent of organizations report that no or only some BI capabilities are available in their current deployment of BI. Of those that are using mobile business intelligence, only 28 percent are satisfied. For years, IT has not placed a priority on mobile support of BI while business has been clamoring for it; business is now more readily leading the efforts, with 52 percent planning new or expanded deployments on tablets and 32 percent on smartphones. In this highly competitive market to capture more opportunity, Oracle will need to significantly advance its efforts and make its capabilities freely available without passwords, as other BI providers have already done. It also will need to recognize that business is more interested in alerts and events delivered through notifications to mobile devices than in having the entire suite of BI capabilities replicated on them.

Oracle has foundational positions in enterprise applications and database technology and has used these positions to drive significant success in BI. The company’s proprietary “walled garden” approach worked well for years, but now technology changes, including movements toward open source and cloud computing, threaten that entrenched position. Surprisingly, the company has moved slowly off of its traditional messaging stance targeted at the CIO, IT and the data center. That position seems to focus the company too much on the technology-driven 3 V’s of big data and analytics, and not enough on the business-driven 3 W’s that I advocate. As the industry moves into the age of analytics, where information is looked upon as a critical commodity and usability is the key to adoption (our research finds usability to be the top evaluation consideration in 63 percent of organizations), CIOs will need to further move beyond their IT approach for BI, as I have noted, and get more engaged in the requirements of business. Oracle’s business intelligence strategy, and how it addresses these business outcomes and use across all business users, is key to the company’s future. Organizations should examine these critical advancements to its BI offering very closely to determine whether they can improve the value of information and big data.

Regards,

Tony Cosentino

VP and Research Director

Just about all the CIOs I speak with are at an inflection point in their careers. Some are just biding time before retirement, but many are emerging CIOs who are driven more by a business imperative than a technological one. Today, market and cultural pressures are forcing CIOs to move quickly and be flexible. In many ways, this is antithetical to the posture of IT, which can often be described as slow and methodical. This posture, however, is no longer sustainable in the era of the six forces of business technology innovation that Ventana Research tracks in our BTI benchmark research.

Well-read publications such as the Harvard Business Review, the New York Times and Wired Magazine are espousing the virtues of big data and analytics. CEOs are listening and demanding that their organizations be adaptive and flexible – and most have iPads. This fact should not be underestimated, because everything you do on an iPad is easy. To bring social, local and mobile intelligence together, organizations face the challenge of slow descriptive analytics and what my colleague Mark Smith rightly calls pathetic dashboard environments. Our next-generation business intelligence benchmark research shows organizations rate the importance of business intelligence as very high, but satisfaction levels are low and have a declining trend line. At the same time, usability is growing to be the top buying criterion for business intelligence by an astounding margin over functionality (64% vs. 49%). The driver of these numbers is that expectations are being set at the consumer level by apps such as Google and Yelp!, and business intelligence applications are not living up to these expectations.

This is a difficult situation for CIOs because IT has heavy investments in business intelligence tools and in the SQL skills that often underlie support for them. At the same time, they need to think about what the business needs as much as what they currently have in their environment.

In the 1990s, CIOs faced a similar situation with ERP systems and the OLAP tools that were being deployed on the client side of organizations’ technology architectures. In that case, CIOs often needed to think from the perspective of the CFO and form a close partnership with the operational finance team. This was a natural partnership because both areas are number-driven and tools-oriented. In today’s environment, however, the CIO must partner with the CMO and iPad-carrying executives to drive competitive advantage through analytics and big data. The rapid revolution of big data technologies and what I have described as the four pillars of big data analytics are being adopted by business in many cases outside the scope of the CIO. Much to the chagrin of IT, those executives and marketers do not want spreadsheets and “pathetic” dashboards. They want visual discovery, they want search, they want prescriptive analytics, and they want results.  

Finally, CIOs must grapple with the fact that the business must be involved in building out IT, since they can no longer have tight centralized control of all technology. Organizations have many different applications sprouting up, from visual discovery tools to business analytics, many of which are also part of the growing use of cloud computing. CIOs cannot even get a baseline on the company’s current technology environment because they often have no idea what’s happening outside of the data center. CIOs need to start learning from business users about the new technologies they’re using and collaborate with them on how they can put the business tools together with what is running in the data center.

CIOs have been metaphorically barricaded in the data center, and the data center has been a cost center, not a profit center, and certainly not an investment center. CIOs must now reach out to business users to show value. To fulfill the value promise, they must help business users deliver on the table stakes: trusted data, secure data and governance around the data. This requires an organizational cadence that can result only from a marriage of the business side with the IT side.

Regards,

Tony Cosentino

VP and Research Director

Last week, IBM brought industry analysts to its famed Almaden Research Center, where the company outlined its big data analytics strategy and introduced a number of new innovations. Big data is no new topic to IBM, which has for decades helped organizations store and use data. But technology has changed over those decades, and IBM is working hard to ensure it is part of the future and not just the past. Our latest business technology innovation research into big data technology finds that retaining and analyzing more data is the first-ranked priority in 29 percent of organizations. From both an IT and a business perspective, big data is critical to IBM’s future success.

On the strategy side, there was much discussion at the event around use cases and the different patterns of deployment for big data analytics. Inhi Cho Suh, vice president of strategy, outlined five compelling use cases for big data analytics:

  1. Discovery and visualization. These types of exploratory analytics in a federated environment are a big part of big data analytics, since they can unlock patterns that can be useful in areas as diverse as determining a root cause of an airline issue or understanding relationships among buyers. IBM is working hard to ensure that products such as IBM Cognos Insight can evolve to support a new generation of visual discovery for big data.
  2. 360-degree view of the customer. By bringing together data sources and applying analytics to increase such things as customer loyalty and share-of-wallet, companies can gain more revenue and market share with fewer resources. IBM needs to ensure it can actually support a broad array of information about customers – not just transactional or social media data but also voice as well as mobile interactions that also use text.
  3. Security and intelligence. This area includes areas around fraud and real-time cyber security, where companies leverage big data to predict anomalies and contain risk. IBM has been enhancing its ability to process real-time streams and transactions across any network. This is an important area for the company as it works to drive competitive advantage.
  4. Operational analysis. This is the ability to leverage networks of instrumented data sources to enable proactive monitoring through baseline analysis and real-time feedback mechanisms. The need for better operational analytics continues to increase. Our latest research on operational intelligence finds that organizations that use dedicated tools to handle this need will be more satisfied and gain better outcomes than those that do not.
  5. Data warehouse augmentation.  Big data stores can replace some traditional data stores and archival systems to allow larger sets of data to be analyzed, providing better information and leading to more precise decision-making capabilities. It should be no surprise that IBM has customers with some of the larger data warehouse deployments. The company can help customers evaluate their technology and improve or replace existing investments.

Prior to Inhi taking the stage, Dave Laverty, vice president of marketing, went through the new technologies being introduced. The first announcement was the BLU Accelerator – dynamic in-memory technology that promises to improve both performance and manageability on DB2 10.5. In tests, IBM says it achieved better than 10,000x performance on queries. The secret sauce lies in the ability to do column store data retrieval, maximize CPU processing, and provide skipping of data that is not needed for the particular analysis at hand. The benefits to the user are much faster performance across very large data sets and a reduction in manual SQL optimization. Our latest research into business technology innovation finds that in-memory technology is the technology most planned for use with big data in the next two years (22%), ahead of RDBMS (10%), data warehouse appliance (19%), specialized database (19%) and  Hadoop (20%).
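To make the idea of skipping unneeded data concrete, here is a minimal Python sketch. It is not IBM's implementation and makes no claim about how BLU works internally. The `Block` class, the per-block minimum/maximum metadata and the function names are all assumptions chosen for illustration. The point is only that a column split into blocks, each carrying a small summary, lets a query ignore blocks that cannot contain matching values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One chunk of a single column plus min/max metadata (hypothetical layout)."""
    values: List[int]
    lo: int
    hi: int


def build_column(values: List[int], block_size: int) -> List[Block]:
    """Split one column into fixed-size blocks and record each block's range."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_greater_than(blocks: List[Block], threshold: int) -> Tuple[int, int]:
    """Count values above threshold; also report how many blocks were read."""
    matches = 0
    scanned = 0
    for block in blocks:
        if block.hi <= threshold:
            continue  # metadata proves nothing here can match, so skip the block
        scanned += 1
        matches += sum(1 for v in block.values if v > threshold)
    return matches, scanned


# Illustrative data: 1,000 sale amounts stored in 10 blocks of 100 values.
sales = list(range(1, 1001))
column = build_column(sales, block_size=100)
matches, scanned = count_greater_than(column, threshold=950)
print(f"{matches} matches found after scanning {scanned} of {len(column)} blocks")
```

Skipping pays off most when the data is clustered on the filtered column, as in this toy example. On randomly ordered data most blocks would span the threshold and would still have to be read.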

An intriguing comment from one of IBM’s customers was “What is bad SQL in a world with BLU?” An important extension of that question might be “What is the future role for database administrators, given new advancements around databases, and how do we leverage that skill set to fill the big data analytics gap?” According to our business technology innovations research, staffing (79%) and training (77%) are the two biggest challenges to implementing big data analytics.

One of IBM’s answers to the question of the skills gap comes in the form of BigSQL. A newly announced feature of InfoSphere BigInsights 2.1, BigSQL layers on top of BigInsights to provide accessibility through industry-standard SQL and SQL-based applications. Providing access to Hadoop has been a sticking point for organizations, since they have traditionally needed to write procedural code to access Hadoop data. BigSQL is similar in function to Greenplum’s Pivotal, Teradata Aster and Cloudera’s Impala, where SQL is used to mine data out of Hadoop. All of these products aim to provide access for SQL-trained users and for SQL-based applications, which represent the predominance of BI tools currently deployed in industry. The challenge for IBM, with a product portfolio that includes BigInsights and Cognos Insight, is to offer a clear message about which products meet which types of analytic needs for which types of business and IT professionals. In addition, IBM needs to provide further clarity on when to use big data analytics software partners such as Datameer, which was on an industry panel at the event and is part of the IBM global educational tour that I have also analyzed.
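The gap between procedural access and SQL access can be sketched in a few lines of Python. This is an illustration only. It uses the standard library's `sqlite3` as a stand-in for any SQL engine, not BigSQL or Hadoop, and the `orders` table, the `records` sample data and both function names are hypothetical. The first function spells out a map-and-reduce style aggregation by hand; the second states the same question declaratively.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (sales channel, order amount).
records = [("web", 3), ("store", 5), ("web", 4), ("phone", 1)]


def procedural_totals(rows):
    """Hand-written grouping and summing, in the spirit of map/reduce code."""
    grouped = defaultdict(list)
    for channel, amount in rows:          # "map": emit amount under its key
        grouped[channel].append(amount)
    return {channel: sum(amounts)         # "reduce": combine values per key
            for channel, amounts in grouped.items()}


def sql_totals(rows):
    """The same question asked declaratively through SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (channel TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = dict(conn.execute(
        "SELECT channel, SUM(amount) FROM orders GROUP BY channel"))
    conn.close()
    return result


print(procedural_totals(records))
print(sql_totals(records))
```

Both functions return the same totals. The difference is who has to know how the grouping is carried out: the programmer in the first case, the engine in the second, which is why SQL access matters for the large base of SQL-trained users and SQL-based BI tools.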

Another IBM announcement was the PureData System for Hadoop. This appliance approach to Hadoop provides a turnkey solution that can be up and running in a matter of hours. As you would expect in an appliance approach, it allows for consistent administration, workflow, provisioning and security with BigInsights. It also allows access to Hadoop through BigSheets, which presents summary information about the unstructured data in Hadoop, and which was already part of the BigInsights platform. Phil Francisco, vice president of big data product management and strategy, pointed out use cases around archival capabilities and the ability to do cold storage analysis as well as the ability to bring many unstructured sources together. The PureData System for Hadoop, due out in the second half of the year, adds a third version to the BigInsights lineup, which also includes the free web-based version and the Enterprise version. Expanding to support Hadoop with its appliances is critical as more organizations look to exploit the processing power of Hadoop technology for their database and information management needs.

Other announcements included new versions of InfoSphere Streams and Informix TimeSeries for reporting and analytics using smart meter and sensor technology. They help with real-time analytics and big data depending on the business and architectural needs of an organization. The integration of database and streaming analytics is a key area where IBM differentiates itself in the market.

Late in the day, Les Rechan, general manager for business analytics, told the crowd that he and Bob Picciano, general manager for information management, had recently promised the company $20 billion in revenue. That statement is important because in the age of big data, information management and analytics must be considered together, and the company needs a strong relationship between these two leaders to meet this ambitious objective. In an interview, Rechan told me that the teams realize this and are working hand-in-glove across strategy, product development and marketing. The camaraderie between the gentlemen was clear during the event, and bodes well for the organization. Ultimately, IBM will need to articulate why it should be considered for big data, as our technology innovation research finds organizations today are less worried about validation of a vendor from a size perspective (23%) compared to usability of the technology (64%).

IBM’s big data platform seems to be less a specific offer and more of an ethos of how to think about big data and big data analytics in a common-sense way. The focus on five well-thought-out use cases provides customers a frame for thinking through the benefits of big data analytics and gives them a head start with their business cases. Given the confusion in the market around big data, that common-sense approach serves the market well, and it is very much aligned with our own philosophy of focusing on what we call the business-oriented Ws rather than the technology-oriented Vs.

Big data analytics, and in particular predictive analytics, is complex and difficult to integrate into current architectures. Our benchmark research into predictive analytics shows that architectural integration is the biggest inhibitor for 55 percent of companies, which should be a message IBM takes to heart about integration of its predictive analytics tools with its big data technology options. Predictive analytics is the most important capability (49%) for business analytics, according to our technology innovation research, and IBM needs to show more solutions that integrate predictive analytics with big data.

H.L. Mencken once said, “For every complex problem there is an answer that is clear, simple and wrong.” Big data analytics is a complex problem, and the market is still early. The latent benefit of IBM’s big data analytics strategy is that it allows IBM to continue to innovate and deliver without playing all of its chips at one time. In today’s environment, many supplier companies don’t have the same luxury.

As I pointed out in my blog post on the four pillars of big data analytics, our research and clients are moving toward addressing big data and analytics in a more holistic and integrated manner. The focus shift is less about how organizations store or process information than how they use it. Some may argue that IBM’s cadence is reflective of company size and is actually a competitive disadvantage, but I would argue that size and innovation leadership are not mutually exclusive. As companies grapple with the onslaught of big data and analytics, no one should underestimate IBM’s outcomes-based and services-driven approach, but in order to succeed IBM also needs to ensure it can meet the needs of organizations at a price they can afford.

Regards,

Tony Cosentino

VP and Research Director

ParAccel is a well-funded big data startup, with $64 million invested in the firm so far. Only a few companies can top this level of startup funding, and most of them are service-based rather than product-based companies. Amazon has a 20 percent stake in the company and is making a big bet on the company’s technology to run its Redshift data warehouse in the cloud initiative. MicroStrategy also uses ParAccel for its cloud offering, but holds no equity in the company.

ParAccel provides a software-based analytical platform that competes in the database appliance market, and as many in the space are increasingly trying to do, it is building analytic processes on top of the platform. On the base level, ParAccel is a massively parallel processing (MPP) database with columnar compression support, which allows for very fast query and analysis times. It is offered either as software or in an appliance configuration which, as we’ll discuss in a moment, is a different approach than many others in the space are taking. It connects with Teradata, Hadoop, Oracle and Microsoft SQL Server databases as well as financial market data such as semi-structured trading data and NYSE data through what the company calls On Demand Integration (ODI). This allows joint analysis through SQL of relational and non-relational data sources. In-database analytics offer more than 600 functions (though places on the company’s website and datasheets still say just over 500).

The company’s latest release, ParAccel 4.0, introduced product enhancements around performance as well as reliability and scalability. Performance enhancements include advanced query optimization that is said to improve aggregation performance 20X by doing “sort-aware” aggregations, which track data properties up and down the processing pipeline. ParAccel’s own High Speed Interconnect protocol has been further optimized, reducing data distribution overhead and speeding query processing. The new version 4.0 introduces new algorithms that exploit I/O patterns to pre-fetch data and store it in memory, which again speeds query processing and reduces I/O overhead. The need for scalability is addressed in enhancements to enable the system to scale to 5,000 concurrent connections supporting up to 38,000 users on a single system. Its Hash Join algorithms allow for complex analytics by allowing the number of joins to fit the complexity of the analytic. Finally, interactive workload management introduces a class of persistent queries that allows short-running queries and long-running queries to be run side by side without impacting performance. This is particularly important as the integration of on-demand data sources through the company’s ODI approach could otherwise interfere with more interactive user requirements.
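One plausible reading of why knowing the sort order helps aggregation can be shown in Python. This is a generic sketch and not a description of ParAccel's optimizer. The function names and sample rows are hypothetical, and the interpretation of “sort-aware” as "reuse an ordering that already exists upstream" is an assumption. If rows are known to arrive sorted by the grouping key, each group can be totaled and emitted as soon as the key changes, with no hash table holding every group in memory.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """General case: no ordering assumed, so keep one running total per key."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return sorted(totals.items())


def sorted_stream_aggregate(rows):
    """Assumes rows are already sorted by key; holds only one group at a time."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


# Hypothetical rows that an earlier pipeline step has already sorted by region.
rows = [("east", 10), ("east", 5), ("north", 7), ("west", 1), ("west", 2)]

print(hash_aggregate(rows))
print(list(sorted_stream_aggregate(rows)))
```

The streaming version is only correct when the sortedness assumption holds, which is why an engine would need to track that property through the pipeline before choosing it.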

The company separates out its semi-annual database release cycle from the more iterative analytics release cycle. The new analytic functions released just last month include a number of interesting developments for the company. Text analytics for various feeds allows for analytics across a variety of use cases, such as social media, customer comment analysis, insurance and warranty claims. In addition, functions such as sessionization and JSON parsing allow a new dimension of analytics for ParAccel as web data can now be analyzed. The new analytic capabilities allow the company to address a broad class of use cases such as “golden path analysis”, fraud detection, attribution modeling, segmentation and profiling. Interestingly, some of these use cases are of the same character as those seen in the Hadoop world.
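For readers unfamiliar with the terms, sessionization over JSON web logs can be sketched as follows. This is a generic Python illustration and not ParAccel's function library. The log format, the field names `user` and `ts`, the 30-minute inactivity gap and the `sessionize` function are all assumptions. Each raw line is parsed from JSON, events are ordered per user, and a new session starts whenever the gap since the previous event exceeds the threshold.

```python
import json
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold


def sessionize(log_lines, gap=SESSION_GAP_SECONDS):
    """Return (user, timestamp, session_number) tuples, ordered by user and time."""
    by_user = defaultdict(list)
    for line in log_lines:
        event = json.loads(line)                 # JSON parsing step
        by_user[event["user"]].append(event["ts"])

    result = []
    for user in sorted(by_user):
        session = 0
        previous = None
        for ts in sorted(by_user[user]):
            # A first event, or a long pause, opens a new session.
            if previous is None or ts - previous > gap:
                session += 1
            result.append((user, ts, session))
            previous = ts
    return result


# Hypothetical click log: one JSON object per line, timestamps in seconds.
raw_log = [
    '{"user": "u1", "ts": 0}',
    '{"user": "u1", "ts": 600}',
    '{"user": "u1", "ts": 4000}',
    '{"user": "u2", "ts": 100}',
    '{"user": "u1", "ts": 4100}',
]

for row in sessionize(raw_log):
    print(row)
```

Once events carry a session number, path-style questions such as which sequence of pages most often precedes a purchase become ordinary grouping and counting problems.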

So where does ParAccel fit in the broader appliance landscape? According to our benchmark research on big data, more than 35 percent of businesses plan to use appliance technology, but the market is still fragmented. The appliance landscape can be broken down into categories that include hardware and software that run together, software that can be deployed across commodity hardware, and non-relational parallel processing paradigms such as Hadoop. This landscape gets especially interesting when we look at Amazon’s Redshift and the idea of elastic scalability on a relational data warehouse. The lack of elastic scalability in the data warehouse has been a big limitation for business; it has traditionally taken significant money, time and energy to implement.

With its “Right to Deploy” pricing strategy, ParAccel promises the same elasticity as with its on-premises deployments. The new pricing policy removes the traditional per-node pricing obstacles by offering prices based on “unlimited data” and takes into consideration the types of analytics that a company wants to deploy. This strategy may play well against companies that only sell their appliances bundled with hardware. Such vendors will have a difficult time matching ParAccel’s pricing because of their hardware-driven business model. While the offer is likely to get ParAccel invited into more consideration sets, it remains to be seen whether they win more deals based on it.

Partnerships with Amazon and MicroStrategy to provide cloud infrastructure produce a halo effect for ParAccel, but the cloud approaches compete against ParAccel’s internal sales efforts. One of the key differentiators for ParAccel as the company competes against the cloud version of itself will be the analytics that are stacked on top of the platform. Since neither the Redshift nor the MicroStrategy cloud offering currently licenses the upper parts of this value stack, customers and prospects will likely hear quite a bit about the library of 600-plus functions and the ability to address advanced analytics for clients. The extensible approach and the fact that the company has built analytics as a first-class object in its database allow the architecture to address speed, scalability and analytic complexity. The one potential drawback, depending on how you look at it, is that the statistical libraries are based on user-defined functions (UDFs) written in a procedural language. While the library integration is seamless to end users and scales well, if a company needs to customize the algorithms, data scientists must go into the underlying procedural programming language to make the changes. The upside is that the broad library of analytics can be used based on the SQL paradigm.

While ParAccel aligns closely with the Hadoop ecosystem in order to source data, the company also seems to be welcoming opportunities to compete with Hadoop. Some of the use cases mentioned above, such as so-called “golden path analysis” and others, have been presented as key Hadoop analytic use cases. Furthermore, many Hadoop vendors are bringing the SQL access paradigm and traditional BI tools together with Hadoop to mitigate the skills gap in organizations. But if an MPP database like ParAccel that is built natively for relational data is also able to do big data analytics, and is able to deliver a more mature product with similar horizontal scalability and cost structure, the argument for standard SQL analytics on Hadoop becomes less compelling. If ParAccel is right, and SQL is the lingua franca for analytics, then it may be in a good position to fill the so-called skills gap. Our benchmark research on business technology innovations shows that the biggest challenge for organizations deploying big data today revolves around staffing and training, with more than 77 percent of companies claiming that they are challenged in both categories.

ParAccel offers a unique approach in a crowded market. The new pricing policy is a brilliant stroke, as it not only will get the company invited into more bid opportunities, but it moves client conversations away from the technology-oriented three Vs and more to analytics and the business-oriented three Ws. If the company puts pricing pressure on the integrated appliance vendors, it will be interesting to see if any of those vendors begin to separate out their own software and allow it to run on commodity hardware. That would be a hard decision for them, since their underlying business models often rely on an integrated hardware/software strategy. With companies such as MicroStrategy and Amazon choosing it for their underlying analytical platforms, the company is one to watch. Depending on the use case and the organization, ParAccel’s in-database analytics should be readily considered and contrasted with other approaches.

Regards,

Tony Cosentino

VP and Research Director

This year’s Inspire, Alteryx’s annual user conference, featured new developments around the company’s analytics platform. Alteryx CEO Dean Stoecker kicked off the event by talking about the promise of big data, the dissemination of analytics throughout the organization, and the data artisan as the “new boss.” Alteryx coined the term “data artisan” to represent the persona at the center of the company’s development and marketing efforts. My colleague Mark Smith wrote about the rise of the data artisan in his analysis of last year’s event.

President and COO George Mathews keynoted day two, getting into more specifics on the upcoming 8.5 product release. Advancements revolve around improvement in the analytical design environment, embedded search capabilities, the addition of interactive mapping and direct model output into Tableau. The goal is to provide an easier, more intuitive user experience. Our benchmark research into next-generation business intelligence shows buyers consider usability the top buying criterion at 63 percent. The redesigned Alteryx interface boasts a new look for the icons and more standardization across different functional environments. Color coding of the toolbox groups tools according to functions, such as data preparation, analytics and reporting. A new favorites function is another good addition, given that users tend to rely on the same tools depending on their role within the analytics value chain. Users can now look at workflows horizontally and not just vertically, and easily change the orientation if, for example, they are working on an Apple iPad. Version 8.5 allows embedded search and more streamlined navigation, and continues its focus on a role-based application, which my colleague has been advocating for a while. According to the company, 94 percent of its user base demanded interactive mapping; that’s now part of the product, letting users draw a polygon around an area of interest, then integrate it into the analytical application for runtime execution.

The highlight of the talk was the announcement of integration with Tableau 8.0 and the ability to write directly to the software without having to follow the cumbersome process of exporting a file and then reopening it in another application. Alteryx was an alpha partner and worked directly with the code base for Tableau 8.0, which I wrote up a few months ago. The partnership exemplifies the coopetition environment that many companies find themselves in today. While Tableau does some basic prediction, and Alteryx does some basic visual reporting, the companies’ core competencies brought together into one workflow are much more powerful for the user. Another interesting aspect is the juxtaposition of the two user groups. The visually oriented Tableau group in San Diego seemed much younger and was certainly much louder on the reveals, while the analytically oriented Alteryx group was much more subdued.

Alteryx has been around since 1997, when it was called SRC. It grew up focused around location analytics, which allowed it to establish foundational analytic use cases in vertical areas such as real estate and retail. After changing the company name and focusing more on horizontal analytics, Alteryx is growing fast with backing from, interestingly enough, SAP Ventures. Since the company was already profitable, it used a modest infusion of capital to grow its product marketing and sales functions. The move seems to have paid off. Companies such as Dunkin Brands and Redbox use Alteryx and the company has made significant inroads with marketing services companies.  A number of consulting companies, such as Absolute Data and Capgemini, are using Alteryx for customer and marketing analytics and other use cases. I had an interesting talk with the CEO of a small but important services firm who said that he is being asked to introduce innovative analytical approaches to much larger marketing services and market research firms. He told me that Alteryx is a key part of the solution he’ll be introducing to enable things such as big data analytics.

Alteryx provides value in a few innovative ways that are not new to this release, but that are foundational to the company’s business strategy. First, it marries data integration with analytics, which allows business users who have traditionally worked in a flat-file environment to pull from multiple data sources and integrate information within the context of the Alteryx application. Within that same environment, users can build analytic workflows and publish applications to a private or public cloud. This approach helps address the top obstacles found in our research into big data analytics, staffing (79%) and training (77%), by giving business users more flexibility to engage in the analytic process.

Alteryx manages an analytics application store called the Analytics Gallery that crowdsources and shares user-created models. These analytical assets can be used internally within an organization or sold on the broader market. Proprietary algorithms can be secured through a black box approach, or made open to allow other users to tweak the analytic code. It’s similar to what companies like Datameer are doing on top of Hadoop, or Informatica in the cloud integration market. The store gives descriptions of what the applications do, such as fuzzy matching or target marketing. Being crowdsourced, the number of applications should proliferate over time, tracking advancements in the R open source project, since R is at the heart of the Alteryx analytic strategy and what it calls clear box analytics. The underlying algorithm is easily viewed and edited based on permissions established by the data artisan, similar to what we’ve seen with companies such as 1010data. Alteryx 8.5 works with R 3.0, the latest version. On the back end, Alteryx partners with enterprise data warehouse powerhouses such as Teradata, and works with the Hortonworks Hadoop distribution.

I encourage analysts of all stripes to take a look at the Alteryx portfolio. Perhaps start with the Analytics Gallery to get a flavor of what the company does and the type of analytics customers are building and using today.  Alteryx can benefit analysts looking to move beyond the limitations of a flat-file analytics environment, and especially marketing analysts who want to marry third-party data from sources such as the US Census Bureau, Experian, TomTom or Salesforce, which Alteryx offers within its product. If you have not seen Alteryx, you should take a look and see how they are changing the way analytic processes are designed and managed.

Regards,

Tony Cosentino

VP and Research Director

SAS Institute held its 24th annual analyst summit last week in Steamboat Springs, Colorado. The 37-year-old privately held company is a key player in big data analytics, and company executives showed off their latest developments and product roadmaps. In particular, LASR Analytical Server and Visual Analytics 6.2, which is due to be released this summer, are critical to SAS’ ability to secure and expand its role as a preeminent analytics vendor in the big data era.

For SAS, the competitive advantage in Big Data rests in predictive analytics, and according to our benchmark research into predictive analytics, 55 percent of businesses say the challenge of architectural integration is a top obstacle to rolling out predictive analytics in the organization. Integration of analytics is particularly daunting in a big-data-driven world, since analytics processing has traditionally taken place on a platform separate from where the data is stored, but now they must come together. How data is moved into parallelized systems and how analytics are consumed by business users are key questions in the market today that SAS is looking to address with its LASR and Visual Analytics.

Jim Goodnight, the company’s founder and plainspoken CEO, says he saw the industry changing a few years ago. He speaks of a large bank doing a heavy analytical risk computation that took upwards of 18 hours, which meant that the results of the computation were not ready in time for the next trading day. To gain competitive advantage, the time window needed to be reduced, but running the analytics in a serialized fashion was a limiting factor. This led SAS to begin parallelizing the company’s workhorse procedures, some of which were first developed upwards of 30 years ago. Goodnight also discussed the fact that building these parallelizing statistical models is no easy task. One of the biggest hurdles is getting the mathematicians and data scientists that are building these elaborate models to think in terms of the new parallelized architectural paradigm.
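The shift in thinking that Goodnight describes can be illustrated with a small Python sketch. It is not SAS code and says nothing about how SAS procedures are built. The function names, the sample loss figures and the choice of four partitions are hypothetical. Instead of one pass over all the data, each partition computes a small summary (count, sum, sum of squares) independently, and the summaries are merged at the end to give the mean and population variance.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Per-partition summary: count, sum and sum of squares."""
    return len(chunk), sum(chunk), sum(v * v for v in chunk)


def merge(a, b):
    """Combine two summaries; the order of merging does not matter."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def parallel_mean_variance(values, partitions=4):
    """Mean and population variance computed from independent partitions."""
    size = max(1, -(-len(values) // partitions))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        summaries = list(pool.map(partial_stats, chunks))

    total = (0, 0.0, 0.0)
    for summary in summaries:
        total = merge(total, summary)

    n, s, sq = total
    mean = s / n
    variance = sq / n - mean * mean
    return mean, variance


# Hypothetical loss figures for a toy risk calculation.
losses = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance = parallel_mean_variance(losses)
print(mean, variance)
```

Threads are used here only to show the structure; in CPython they will not speed up this arithmetic, and a real deployment would spread partitions across processes or machines. The harder part, as Goodnight notes, is that many statistical procedures do not decompose into mergeable summaries as neatly as this one does. The sum-of-squares formula is also numerically fragile on large values, so production code would use a more stable merge.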

Its Visual Analytics software is a key component of the SAS Big Data Analytics strategy. Our latest business technology innovation benchmark research [http://www.ventanaresearch.com/bti/] found that close to half (48%) of organizations present business analytics visually. Visual Analytics, which was introduced early last year, is a cloud-based offering running off of the LASR in-memory analytic engine and the Amazon Web Services infrastructure. This web-based approach allows SAS to iterate quickly without worrying a great deal about revision management while giving IT a simpler server management scenario. Furthermore, the web-based approach provides analysts with a sandbox environment for working with and visualizing big data analytics in the cloud; the analytic assets can then be moved into a production environment. This approach will also eventually allow SAS to combine data integration capabilities with the data analysis capabilities.

With descriptive statistics being the ante in today’s visual discovery world, SAS is focusing Visual Analytics to take advantage of the company’s predictive analytics history and capabilities. Visual Analytics 6.2 integrates predictive analytics and rapid predictive modeling (RPM) to do, among other things, segmentation, propensity modeling and forecasting. RPM makes it possible for models to be generated via sophisticated software that runs through multiple algorithms to find the best fit based on the data involved. This type of commodity modeling approach will likely gain significant traction as companies look to bring analytics into industrial processes and address the skills gap in advanced analytics. According to our BTI research, the skills gap is the biggest challenge facing big data analytics today, as participants identified staffing (79%) and training (77%) as the top two challenges.
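The general idea of running through multiple algorithms to find the best fit can be sketched in plain Python. This is not SAS RPM, whose internals are not described here. The two candidate models, the holdout split, the mean squared error criterion and all names and data points are assumptions for illustration. Each candidate is fitted on training data, scored on data it has not seen, and the lowest error wins.

```python
def fit_mean(xs, ys):
    """Baseline candidate: always predict the training average."""
    average = sum(ys) / len(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Second candidate: ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def pick_best(candidates, train, holdout):
    """Fit every candidate on train, score on holdout, return the winner."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mean_squared_error(model, *holdout)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data with a roughly linear trend.
train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.0, 9.8, 12.1])
holdout = ([7, 8], [14.0, 16.1])

best, scores = pick_best({"mean": fit_mean, "line": fit_line}, train, holdout)
print(best, scores)
```

Scoring on held-out data rather than on the training data is the design choice that matters: a more flexible candidate will almost always look better on the data it was fitted to, so an automated search needs an independent check before it declares a winner.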

Visual Analytics’ web-based approach is likely a good long-term bet for SAS, as it marries data integration and cloud strategies. These factors, coupled with the company’s installed base and army of loyal users, give SAS a head start in redefining the world of analytics. Its focus on integrating visual analytics for data discovery, integration and commodity modeling approaches also provides compelling time-to-value for big data analytics. In specific areas such as marketing analytics, the ability to bring analytics into the applications themselves and allow data-savvy marketers to conduct a segmentation and propensity analysis in the context of a specific campaign can be a real advantage. Many of SAS’ innovations cannibalize its own markets, but such is the dilemma of any major analytics company today.

The biggest threat to SAS today is the open source movement, which offers big data analytic approaches such as Mahout and R. For instance, the latest release of R includes facilities for building parallelized code. While academics working in R often still build their models in a non-parallelized, non-industrial fashion, the current and future releases of R promise more industrialization. As integration of Hadoop into today’s architectures becomes more common, staffing and skillsets are often a larger obstacle than the software budget. In this environment the large services companies loom larger because of their role in defining the direction of big data analytics. Currently, SAS partners with companies such as Accenture and Deloitte, but in many instances these companies have split loyalties. For this reason, the lack of a large in-house services and education arm may work against SAS.

At the same time, SAS possesses blueprints for major analytic processes across different industries as well as horizontal analytic deployments, and it is working to move these to a parallelized environment. This may prove to be a differentiator in the battle versus R, since it is unclear how quickly the open source R community, which is still primarily academic, will undertake the parallelization of R’s algorithms.

SAS partners closely with database appliance vendors such as Greenplum and Teradata, with which it has had longstanding development relationships. With Teradata, it integrates into the BYNET messaging system, allowing for optimized performance between Teradata’s relational database and the LASR Analytic Server. Hadoop is also supported in the SAS reference architecture. LASR accesses HDFS directly and can run as a thin memory layer on top of the Hadoop deployment. In this type of deployment, Hadoop takes care of everything outside the analytic processing, including memory management, job control and workload management.

These latest developments will be of keen interest to SAS customers. Non-SAS customers who are exploring advanced analytics in a big data environment should consider SAS LASR and its MPP approach. Visual Analytics follows the “freemium” model that is prevalent in the market, and since it is web-based, any instances downloaded today can be automatically upgraded when the new version arrives in the summer. For the price, the tool is certainly worth a test drive for analysts. Anyone looking into such tools who foresees the need to include predictive analytics should find it of particular interest.

Regards,

Tony Cosentino
VP and Research Director

Big data analytics is being offered as the key to addressing a wide array of management and operational needs across business and IT. But the label “big data analytics” is used in a variety of ways, confusing people about its usefulness and about how best to implement it to drive business value. The uncertainty this causes poses a challenge for organizations that want to take advantage of big data in order to gain competitive advantage, comply with regulations, manage risk and improve profitability.

Recently, I discussed a high-level framework for thinking about big data analytics that aligns with former Census Director Robert Groves’ ideas of designed data on the one hand and organic data on the other. This second article completes that picture by looking at four specific areas that constitute the practical aspects of big data analytics – topics that must be brought into any holistic discussion of big data analytics strategy. Today, these often represent point-oriented approaches, but architectures are now coming to market that promise more unified solutions.

Big Data and Information Optimization: the intersection of big data analytics and traditional approaches to analytics. Analytics performed by database professionals often differ significantly from analytics delivered by line-of-business staffers who work in more flat-file-oriented environments. Today, advancements in in-memory systems, in-database analytics and workload-specific appliances provide scalable architectures that bring processing to the data source and allow organizations to push analytics out to a broader audience, but how to bridge the divide between the two kinds of analytics is still a key question. Given the relative immaturity of new technologies and the dominance of relational databases for information delivery, it is critical to examine how all analytical assets will interact with core database systems. As we move to operationalizing analytics on an industrial scale, the current advanced analytical approaches break down because they require pulling data into a separate analytic environment and do not leverage advances in parallel computing. Furthermore, organizations need to determine how they can apply existing skill sets and analytical access paradigms, such as business intelligence tools, SQL, spreadsheets and visual analysis, to big data analytics. Our recent big data benchmark research shows that the skills gap is the biggest issue facing analytics initiatives, with staffing and training cited as obstacles in over three quarters of organizations.
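The difference between pulling data into a separate analytic environment and bringing processing to the data source is easy to see in a small sketch. The example below uses Python's built-in sqlite3 module as a stand-in for a core database system; the table, column names and sample rows are hypothetical, and a production in-database analytics platform would of course be far more capable than SQLite.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("east", 50.0), ("west", 70.0)],
    )
    return conn

def totals_pulled_out(conn):
    """Extract every row, then aggregate in the separate analytic environment."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def totals_in_database(conn):
    """Push the aggregation to the data source; only the summary rows move."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query))

if __name__ == "__main__":
    conn = make_db()
    print(totals_pulled_out(conn))
    print(totals_in_database(conn))
```

Both functions give the same answer, but the first moves every row across the wire while the second moves one row per region. At three rows the difference is invisible; at billions it decides whether the analysis is feasible at all.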

Visual analytics and data discovery: Visualizing data is a hot topic, especially in big data analytics. Much of big data analysis is about finding patterns in data and visualizing them so that people can tell a story and give context to large and diverse sets of data. Exploratory analytics allows us to develop and investigate hypotheses, reduce data, do root-cause analysis and suggest modeling approaches for our predictive analytics. Until now the focus of these tools has been on descriptive statistics related to SQL or flat file environments, but now visual analytics vendors are bringing predictive capabilities into the market to drive usability, especially at the business user level. This is a difficult challenge because the inherent simplicity of these descriptive visual tools clashes with the inherent complexity that defines predictive analytics. In addition, companies are looking to apply visualization to the output of predictive models as well. Visual discovery players are opening up their APIs in order to directly export predictive model output.

New tools and techniques in visualization, along with the proliferation of in-memory systems, give companies the means to sort through and make sense of big data. Exactly how these tools work, which types of visualizations are important to big data analytics and how they integrate into our current big data analytics architecture are still key questions, as is the issue of how search-based data discovery approaches fit into the architectural landscape.

Predictive analytics: Visual exploration of data cannot surface all patterns, especially the most complex ones. To make sense of enormous data sets, data mining and statistical techniques can find patterns, relationships and anomalies in the data and use them to predict future outcomes for individual cases. Companies need to investigate the use of advanced analytic approaches and algorithmic methods that can transform and analyze organic data for uses such as predicting security threats, uncovering fraud or targeting product offers to particular customers.

Commodity models (a.k.a. good-enough models) are allowing business users to drive the modeling process. How these models can be built and consumed at the front line of the organization with only basic oversight by a statistician or data scientist is a key area of focus as organizations endeavor to bring analytics into the fabric of the organization. The increased load on the back-end systems is another key consideration if the modeling is a dynamic, software-driven approach. How these models are managed and tracked is yet another consideration. Our research on predictive analytics shows that companies that update their models more frequently have much higher satisfaction ratings than those that update on a less frequent basis. The research further shows that in over half of organizations, competitive advantage and revenue growth are the primary reasons for deploying predictive analytics.

Right-time and real-time analytics: It’s important to investigate the intersection of big data analytics with right-time and real-time systems and learn how participants are using big data analytics in production on an industrial scale. This usage guides the decisions that we make today around how to begin the task of big data analytics. Another choice organizations must make is whether to capture and store all of their data and analyze it on the back end, attempt to process it on the fly, or do both. In this context, event processing and decision management technologies represent a big part of big data analytics since they can help examine data streams for value and deliver information to the front lines of the organization immediately. How traditionally batch-oriented big data technologies such as Hadoop fit into the broader picture of right-time consumption still needs to be answered as well. Ultimately, as happens with many aspects of big data analytics, the discussion will need to center on the use case and how to address the time to value (TTV) equation.
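As a rough illustration of examining a data stream for value on the fly rather than storing everything for back-end analysis, consider the sketch below. It is a toy, not a representation of any event processing product; the function name, the window size and the threshold factor are assumptions I have picked for the example.

```python
from collections import deque

def flag_spikes(events, window=3, factor=2.0):
    """Yield each value that exceeds `factor` times the mean of the previous `window` values.

    Only the last `window` values are held in memory, so the full stream
    never has to be stored before a decision is made.
    """
    recent = deque(maxlen=window)
    for value in events:
        if len(recent) == window and value > factor * (sum(recent) / window):
            yield value
        recent.append(value)

if __name__ == "__main__":
    readings = [10, 10, 10, 50, 10, 10, 10]
    for spike in flag_spikes(readings):
        print("spike:", spike)
```

The trade-off is the one described above: processing on the fly delivers an answer immediately but sees only a small window of history, while capture-and-store keeps everything but answers later. Many organizations will end up doing both.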

Organizations embarking on a big data strategy must not fail to consider the four areas above. Furthermore, their discussions cannot cover just the technological approaches, but must include people, processes and the entire information landscape. Often, this endeavor requires a fundamental rethinking of organizational processes and questioning of the status quo.  Only then can companies see the forest for the trees.

Regards,

Tony Cosentino
VP and Research Director

Platfora has gained a lot of buzz in the Big Data analytics market, primarily through word of mouth. Late last year the company took the covers off of some impressive and potentially disruptive technology that takes aim at the broad BI and business analytics ecosystem, including the very foundation on which the industry is built. It recently demonstrated its software at the Strata Conference before an audience fixated on big data.

Platfora looks to provide the underlying architecture of tomorrow’s BI systems and address the challenge of big data analytics. Our benchmark research shows that one of the biggest hurdles facing next-generation BI systems is usability, which was the top category in 63 percent of organizations. This is of specific concern when it comes to big data analytics and today’s Hadoop ecosystem, where many companies are taking the Field of Dreams approach: if you build it, they will come. That is, many companies are setting up Hadoop clusters, but users have no access to the underlying data and need data scientists to come in and painstakingly extract nuggets of value. Simply connecting Hadoop to applications via connectors does not work well since there is no good way to sort through the Hadoop data to decide what to move into a more production-oriented system.

Platfora promises to solve this problem by bypassing both traditional architectures and newer hybrid architectures and putting everything in Hadoop, from data capture to data preparation to analysis and visualization.

The challenge with traditional architectures, Platfora argues, is that they organize data in a predetermined manner, but today’s big data analytics environment dictates that organizations cannot determine in advance what they will need to explore in the future. If a user gets to a level of analysis that is not part of the current schema, someone in the organization must undertake a herculean effort to recreate the entire data model. It’s the ‘I don’t know what I don’t know’ challenge. In my blog post Big Data Analytics Faces a Chasm of Understanding, I discuss the difference between exploratory and confirmatory approaches, which marks the difference between the 20th and 21st century approaches to business analytics. Businesses need both, but the nature of big data demands the exploratory approach be given more weight.

Platfora stores data in Hadoop and works with all of the open source stacks, including those from Cloudera, Hortonworks and MapR, as well as EMC’s proprietary Pivotal HD distribution, announced just this week and assessed by my colleague. The secret sauce for Platfora is its ability to provide visibility into the underlying file system and provide a shopping-basket metaphor, where an analyst can choose different dimensions that are of interest. Through what the company calls Fractal Cache technology, which is a distributed query engine, it takes the data and creates the relationships on the fly in memory. This essentially provides an ad hoc data mart, which an analyst can then access to do slice-and-dice analysis with sub-second response times and solve exploratory analytics challenges. If an analyst drills down and finds that an interesting piece of information is not included in the model, he can have the software recreate the model on an ad hoc basis, which generally takes from minutes up to an hour, according to the company.
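The shopping-basket idea can be sketched in a few lines of Python. To be clear, this is not Platfora's Fractal Cache and says nothing about how that engine works internally; the function name, record layout and sample data are hypothetical. It simply shows what it means to build an in-memory aggregate over whichever dimensions the analyst picks, and to rebuild it from the raw data when a new dimension turns out to matter.

```python
from collections import defaultdict

# Hypothetical raw records, standing in for files sitting in Hadoop.
RAW = [
    {"region": "east", "product": "a", "revenue": 10.0},
    {"region": "east", "product": "b", "revenue": 5.0},
    {"region": "west", "product": "a", "revenue": 7.0},
]

def build_mart(records, dimensions, measure):
    """Aggregate raw records in memory over the chosen dimensions."""
    mart = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        mart[key] += rec[measure]
    return dict(mart)

if __name__ == "__main__":
    # First basket: the analyst only cares about region.
    print(build_mart(RAW, ["region"], "revenue"))
    # Drill-down needs product too, so the mart is rebuilt from the raw data.
    print(build_mart(RAW, ["region", "product"], "revenue"))
```

Because the raw records are never thrown away, adding a dimension is a rebuild rather than a redesign, which is the contrast with a predetermined schema that the paragraph above draws.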

The software’s power and ease of use allow business analysts to expand the breadth of questions they can ask of the data without having to go back to IT. According to the company, it takes only a few hours of training on the system to get up and running. This is especially important given that our Big Data benchmark research, which assessed the challenges of Hadoop, found staffing and training to be among the biggest challenges organizations face today, cited in over three quarters of organizations. If Platfora can solve this conundrum and implement it within the enterprise, it will indeed start to move organizations beyond the technologically oriented three V’s discussion about big data into the business-oriented discussion around the three Ws.

The biggest challenge the company may face is institutional. Companies have spent billions of dollars implementing their current architectures, and relationships with software providers often run deep. Furthermore, the idea of such a system largely replacing traditional data warehouses threatens not only the competition, but perhaps the departments the company aims to sell into. Simply put, such a business-usable system obviates the need for an entire area of IT. Many firms, especially large ones, are inherently risk-averse, and this may be the biggest challenge facing Platfora. Other software providers have started with similar messaging out of the gate, but then shifted to more of a coexistence-messaging approach to gain traction in organizations. It will be interesting to see whether Platfora yields to these same headwinds as it moves through its beta phase and into GA.

In sum, Platfora is an exciting new company, and companies that are adopting Hadoop should look into the way it can drive big data analytics and maybe change the culture of their organizations.

Regards,

Tony Cosentino

VP and Research Director
