The Challenges for Immediacy
The overwhelming access to on-demand media, anytime and anywhere via Internet broadband, handheld devices, streaming services, smart televisions and evolving IoT (Internet of Things) devices, has forever changed the time-to-market and monetization model for content providers, cloud OTT (over-the-top) service providers, cable companies, broadcasters, streaming media providers, carriers and the like. All companies that are part of the content creation-aggregation-to-delivery chain face the growing challenge of seamlessly and efficiently delivering high-quality, delivery-ready, file-based content in multiple file formats across a myriad of media platforms and outlets to targeted global destinations as fast as possible, all while ensuring its quality and security and continually reducing the delivery-to-device time gap. Moreover, with tightening budgets, many content providers are bringing work back in house while pressuring post-production service providers for faster workflow times (viewers expect “immediate access”), placing an additional burden on service providers that are already facing industry consolidation.
Currently, every M&E organization is looking to streamline processes while reducing CapEx and OpEx costs. Post houses are being pressured by their customers to store client content at no charge or for a nominal fee, and to provide immediate access and delivery of digital assets. As a result, one of the major ongoing cost components that businesses are struggling with is storing, managing and making digital assets available throughout their lifecycle—despite the exponential rise of assets and file sizes. Aside from continually increasing storage capacity and its management, businesses must also address the increases in data center footprint, infrastructure complexity, air conditioning and power usage and numerous other implications.
So, can you manage the expectation of free storage when your budget is under attack and margins are already compressed? Can you provide secure search and immediate access for your customers? Can you turn your storage dilemma into a competitive advantage? The answer is a resounding “YES” to all three questions. To understand why, this month’s company highlight focuses on Caringo, a pioneer in the object storage space since 2005, and one of the top object storage platform and cloud incumbents in the Media & Entertainment (M&E) landscape.
Who is Caringo?
Caringo is a leading provider of object storage software that gives users control over any volume, flow or size of unstructured data, significantly reducing cost of ownership while extracting maximum value and performance from hardware. Caringo’s flagship product, Swarm, eliminates the need to migrate data into disparate solutions for long-term data protection and preservation, management, organization and search at massive scale. These benefits are delivered through Swarm’s symmetric architecture, which enables massive scalability, elastic content protection, and automation of management. The result is an object storage software platform that’s ideal for current and future workflows in the M&E industry.
Before continuing, let’s roll back the hands of time to better appreciate the importance of companies like Caringo and the benefits of object storage architectures for the M&E industry.
Caringo’s Evolution and the “Aha Moment”
Caringo was conceived by Jonathan Ring, Paul Carpentier and Mark Goros, who had previously built companies together. Paul Carpentier, the founder of content-addressable storage (CAS), sold FilePool to EMC, which built a proprietary hardware appliance family called EMC Centera, a CAS platform for data archiving introduced around 2003.
At the time, commodity servers were becoming widespread and the raw per-gigabyte cost of disk space was falling fast. Furthermore, legacy file server vendors were anchored to systems that could not easily scale to petabytes and beyond without creating separate storage silos. It was then that the Caringo founding team recognized that commodity-based Software-Defined Storage (SDS) would soon dominate the market. Moreover, they foresaw the need for open, RESTful access to Software-Defined Storage on commoditized server technology. As a result, Caringo was formed in 2005 and the first real Software-Defined Storage or private storage cloud was created.
“We knew that the Software-Defined Storage/Cloud Storage market would take time to build and the market to evolve, so we focused on efficient use of capital to build the best technology in the market,” said Jonathan Ring, Caringo CEO & Co-Founder. “We’ve focused on growing the company through customer collaboration to build a field-hardened, customer-centric product that has resulted in a very loyal and satisfied global customer base. What’s more, our solutions are priced based on capacity consumed by the original data (no extra cost for replications to ensure availability and protection) as either a perpetual license with annual support or via annual subscription.”
Today, Caringo’s validation of their vision and philosophy is their growing customer base which includes iQ Media, Sony ImageWorks, Texas Tech University Systems, NEP, the Department of Defense, the Brazilian Federal Court System, City of Austin, Telefónica, British Telecom, Ask.com, Johns Hopkins University and hundreds more worldwide.
It should be noted that, unlike Caringo, many in the object storage space took several shortcuts in developing their technology. The biggest of these is building their object storage offering on top of the Linux file system; an inode-based file system results in added complexity, unnecessary overhead and the risk of data loss. Caringo’s efficient implementation leaves 95% of disk capacity to the user versus only 80% for competing solutions.
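To put the two utilization figures in perspective, the arithmetic can be sketched as follows. The 95% and 80% fractions come from the paragraph above; the 1,000 TB of raw disk is an assumed figure chosen purely for illustration.

```python
# Usable capacity for a given amount of raw disk, using the utilization
# figures quoted above. RAW_TB is an assumed, illustrative value.
RAW_TB = 1000  # hypothetical raw disk capacity in terabytes


def usable_tb(raw_tb: float, utilization: float) -> float:
    """Return the capacity left for user data at a given utilization."""
    return raw_tb * utilization


swarm_tb = usable_tb(RAW_TB, 0.95)      # fraction quoted for Caringo
competing_tb = usable_tb(RAW_TB, 0.80)  # fraction quoted for competing solutions
extra_tb = swarm_tb - competing_tb

print(f"{swarm_tb:.0f} TB vs {competing_tb:.0f} TB usable ({extra_tb:.0f} TB more)")
```

On the same assumed hardware, the 15-point gap in utilization amounts to 150 TB of additional space for assets.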
Problems that Caringo’s Object Storage Solves
Object storage technology solves some of the major bottlenecks in the M&E industry by intelligently and economically storing and managing the exponential rise in the lifecycle of data (think long-tail assets, hundreds-to-tens of thousands of working data files per movie, original/duplicate/versioned archive assets, etc.) for long-term data protection. Caringo Swarm solves the problems associated with delivering, organizing and providing accessible scale-out, long-term storage for large media libraries. Swarm is ideal for applications small and large, whether you are a content producer looking for a way to enable OTT and origin storage, a web-property looking to reduce latency for your web-based media applications, or a production house looking to optimize video workflow shared storage. M&E organizations including studios, production houses, broadcasters and service providers can now store what they need, ensure media integrity and keep assets online and accessible for reuse and monetization.
Expanding on the above, we can look at four areas where Swarm can benefit this industry:
- Active Archive, Multi-Site Collaboration & Consolidation: In the fast-paced, real-time realm of broadcast and post, assets need to remain available for reuse and continued collaboration. Files written via NFS, HTTP or S3 are all accessible, editable and immediately retrievable via any other protocol. Synchronous replication over LAN and asynchronous replication over WAN provide management-free disaster recovery. For colder assets like long-tail episodics, users can use Caringo’s patented Darkive to spin down drives and reduce CPU utilization—approaching the low TCO of tape.
- Origin Storage & OTT Enablement: Natively accessible via HTTP, Swarm is ideal for content delivery origin storage and enabling over-the-top (OTT) services. Swarm delivers massive throughput with an unprecedented 95% capacity utilization. Parallel uploading of large files and range reads ensure you can store assets quickly and viewers can view any frame in a video quickly.
- Optimize Production-Shared Storage: At the heart of most editing workflows are Shared Network Attached Storage (NAS) devices that, while necessary from a performance perspective, are difficult and expensive to scale, manage and protect. Caringo solutions optimize existing workflows and applications in shared storage environments by integrating with Windows Storage Server, NetApp Filers, Avid ISIS/Nexis or 3rd-party shared storage, tiering data to Caringo Swarm Object Storage. The result is a system that offers massive scalability and secure accessibility with no need to rip and replace or modify applications, processes or user behavior.
- Backend Digital Asset Management: Through native integrations and S3 API support, Caringo Swarm can be used as the storage target for a myriad of Digital Asset Management and Media Asset Management solutions, bringing simple boot-from-bare-metal installation and continuous protection to an asset management infrastructure. Swarm allows integrated feeds (content distribution) to replicate to a remote site for disaster recovery (DR), and offers built-in compliance features like WORM, Legal Hold, and Integrity Seals to meet even the most stringent protection policies. This is ideal for users, as their digital assets are continuously protected and instantly accessible.
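The range reads mentioned under origin storage rely on the standard HTTP `Range` header, which lets a client ask for only part of an object. As a rough sketch, the helper below splits an object into byte ranges that a client could request in parallel or use to seek into a video. The function name and sizes are illustrative assumptions, and nothing here is specific to Swarm.

```python
# Build HTTP Range header values that split an object into fixed-size
# byte ranges, e.g. for parallel transfer or for seeking into a video.
# Nothing here is Swarm-specific; the sizes below are illustrative.


def byte_ranges(total_size: int, chunk_size: int) -> list[str]:
    """Return 'bytes=start-end' header values covering total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    ranges = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # end is inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


# A hypothetical 10-byte object read in 4-byte chunks.
headers = [{"Range": value} for value in byte_ranges(10, 4)]
print(headers)
```

A server that supports range requests answers each such request with a 206 Partial Content response carrying only the requested bytes.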
Swarm’s Underlying Technology & Key Differentiators
Caringo’s early recognition of the market opportunity has allowed them to develop a technology roadmap that reflected respective trends in technology, affording their system the opportunity to evolve from the ground up. Unlike some of its competition, there is no inode-based file system under the covers, which means no added complexity, no unnecessary overhead and an almost non-existent risk of data loss.
Caringo Swarm can be considered a storage operating system or a massively scalable web server for storage. It is implemented as a symmetric stateless architecture where all nodes perform the same function and as such, it is highly available out of the box. As a result, according to Ring, Caringo has never had customer data loss. The scalable architecture increases throughput and decreases disk recovery time as it grows, enabling scale up from small to very large clusters. Swarm has a strong metadata component and stores true objects where the data and the metadata reside together to ensure that data is self-describing and portable. No separate databases are required or used for managing objects. Swarm then integrates all this with a search engine to allow flexible metadata search.
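The idea of a self-describing object can be illustrated with a toy model in which content and metadata travel together and a search runs over the metadata directly. This is only a sketch of the concept, not Swarm’s implementation; the class, function and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical object: the content and its metadata stay together."""
    name: str
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


def search(objects: list[StoredObject], **criteria: str) -> list[str]:
    """Return names of objects whose metadata matches every criterion."""
    return [
        obj.name
        for obj in objects
        if all(obj.metadata.get(k) == v for k, v in criteria.items())
    ]


# Placeholder content and made-up metadata for two media assets.
library = [
    StoredObject("promo.mov", b"...", {"project": "spring-promo", "codec": "prores"}),
    StoredObject("episode1.mxf", b"...", {"project": "series-a", "codec": "dnxhd"}),
]
print(search(library, codec="prores"))
```

Because each object carries its own description, moving it to another system does not require exporting rows from a separate metadata database.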
As a result, Swarm’s key differentiators versus its competition can be summed up as follows:
- Efficient hardware and server utilization, with up to 95% of hard drive space and 100% of drive bays available for storing digital assets
- Ability to add performance or capacity automatically in 90 seconds and to upgrade hardware continuously, scaling to hundreds of petabytes and billions of files without downtime or disruption to asset accessibility
- Automated policy-based protection that can be tuned for rapid access or for the smallest data center footprint, delivering enterprise-grade durability for collaboration while defending against ransomware attacks and silent data corruption
- Cross-platform collaboration and universal access (Write/Read/Edit) via HTTP, S3, or NFS interchangeably
- Rapid asset retrieval and instant delivery via integrated search with the ability to add custom metadata
Swarm’s User Friendly Integration
Caringo’s Swarm software installs on the user’s choice of standard hardware creating a media management platform that can granularly grow from TBs to PBs, approaching the cost of tape while providing authorized access, plugging into the customer’s identity management solutions or through token-based authentication. Swarm has a RESTful interface based on HTTP 1.1 for rapid direct integration and supports the Amazon S3 API. Most asset management solutions that support the S3 API work with Swarm out of the box. In addition, Swarm has an NFS interface as well as software that automatically tiers from Windows File Servers and NetApp filers to Swarm.
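For readers curious what direct integration over HTTP might look like, here is a generic sketch that prepares, but does not send, a request to store one object with token-based authentication. The endpoint, object name, token and the bearer-style authorization header are all assumptions made for illustration and are not taken from Caringo’s documentation.

```python
import urllib.request

# Hypothetical endpoint and token; these are not real Swarm values.
ENDPOINT = "http://storage.example.com"
TOKEN = "example-token"


def build_put(name: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP PUT that would store one object."""
    return urllib.request.Request(
        url=f"{ENDPOINT}/{name}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": content_type,
            # The authorization scheme is an assumption for illustration.
            "Authorization": f"Bearer {TOKEN}",
        },
    )


request = build_put("clips/promo.mov", b"video bytes", "video/quicktime")
print(request.get_method(), request.full_url)
```

The request is built with Python’s standard library only, which reflects the general appeal of a plain HTTP interface: any language with an HTTP client can talk to the store.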
This fall, Caringo announced the integration of Swarm Hybrid Cloud for Microsoft Azure, allowing users to replicate files from S3, NFS, and HDFS to Azure in the Azure native blob format. By seamlessly converting files from an S3 object, Caringo object, Azure blob or NFS accessible file, users can send files to Azure for analysis, protection and long-term archive.
Most recently, Caringo added Caringo Drive, a virtual drive for Swarm, to their product line. Once Caringo Drive is installed on macOS® and Windows® systems, customers have convenient access and can easily drag and drop files to and from Swarm with background parallel transfer. This speeds content uploads and provides simple, drive-based access to Swarm from applications.
Robust and Comprehensive Feature Set
“Our goal from inception has been to change the economics of storage,” continued Ring. “This isn’t just about cost, but about changing the entire process of how media is accessed, organized, delivered and preserved.” According to Ring, Swarm is hardware agnostic and tunable (any combination of x86 hardware and disk capacity), giving users a choice of hardware that allows for high-performance clusters or dark archives. Swarm can store files of any size, even terabytes, yet handles small files extremely well. Users can update hardware continuously without affecting the integrity of the media and eliminate storage silos while providing search to enable rapid reuse of media. And being able to store custom metadata with the media itself eliminates the headaches of metadata databases, making media self-describing and accessible from any application now or in the future. The result is a media platform that streamlines digital media workflows while reducing storage total cost of ownership by 75% when compared to file-system-based solutions. Here are some key feature takeaways that make Swarm stand out from its competition.
- Complete integrated scale-out cloud and object storage system
- Private, public or hybrid configurations
- Single-site or multi-site deployment
- Multi-tenancy and metering: Multi-tenant cloud that can be used for public or private deployments with authentication tied to standard systems (AD, PAM, etc.)
- Massive parallel throughput
- Enterprise identity management (LDAP, AD, Linux PAM)
- Unified Web console and reporting
- Ad Hoc Search, Query, and Analytics
- Ability to save searches for reuse: save as JSON or XML for import by Kibana and other applications
- Add servers of any performance or capacity to the cluster in 90 seconds; Swarm takes care of load balancing on the fly
- Seamless integration with Windows® file servers and NetApp® filers that provides transparent user and application data access
- Universal access via a single namespace for HTTP, NFS, S3, and Azure Blobs
- Direct HTTP streaming with range read support, ideal for long-tail content and OTT content delivery
- Continuous protection (node-failure-resistant design, self-healing with fast, proactive recovery and replication)
- Erasure coding and AES-256 encryption for data at rest
- 24×7 support and professional services
- Built-in compliance including:
  - WORM: True WORM capability with no ability to get to the file because it’s a black box appliance that does not have a file system
  - Integrity seals to protect data in transit
  - Legal hold
- Optimized operations:
  - 70% reduction in power usage: patented power-down technology
  - 75% reduction in storage TCO: many petabytes can be managed by less than one full-time person
  - 95% usable disk capacity
  - 100% uptime with no single point of failure: proven under 24/7/365 loads since 2005
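The list above names both replication and erasure coding as protection options. One common way to compare them is by how much raw disk each consumes per byte of original content. The sketch below uses the textbook formulas; the replica count and segment counts are illustrative assumptions, not Swarm defaults.

```python
# Raw-to-usable storage overhead for replication versus erasure coding.
# The replica count and the segment counts are illustrative assumptions.


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of original data with full replicas."""
    return float(copies)


def erasure_overhead(data_segments: int, parity_segments: int) -> float:
    """Raw bytes stored per original byte with k data + m parity segments."""
    return (data_segments + parity_segments) / data_segments


print(replication_overhead(3))  # three full copies
print(erasure_overhead(5, 2))   # 5 data + 2 parity segments
```

In this example both schemes can survive the loss of two copies or segments, but three replicas occupy 3.0 times the original size while the 5+2 layout occupies 1.4 times, which is why a policy might favor replicas for quick reads and erasure coding for a smaller footprint.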
Caringo’s “learn-by-doing” philosophy, with the added benefit of time to work out any bugs, has allowed their team to develop a very robust and comprehensive feature set in Swarm. Swarm is unique in that all nodes perform all functions, meaning you can add a new server to the cluster by simply hooking it up to the network and powering on the bare machine. New nodes boot over the network and are added to the cluster in less than 90 seconds, and Swarm takes care of load balancing on the fly. Other implementations typically require the administrator to designate a specific role for each node, increasing complexity and points of failure.
“One of our customers, NEP, a provider of the technology and expertise to enable organizations to produce the world’s biggest live and broadcast events, has been a Caringo customer since 2010,” stated Ring. NEP has designed and deployed a scalable, reliable, high-performance Content Distribution Network (CDN) in the Netherlands using Caringo Swarm. The NEP CDN serves more than 650,000 user accounts accessed with different consumption methods, ranging from monthly subscriptions to pay-per-view.
According to Gerbrand de Ridder, Head of R&D and Lead System Architect of NEP, The Netherlands, NEP needed a comprehensive platform that could provide scalability to its CDN and meet demanding requirements in terms of capacity, cost, throughput, growth and reliability, coupled with full hardware independence. Caringo Swarm was selected after a long evaluation process that included several vendors. “We chose Caringo Swarm because it was the most flexible platform for us and had good per terabyte licensing,” said de Ridder. “We started with 100 TBs and now have over 1.25 PBs across 200 nodes from different hardware vendors. Caringo Swarm is a key component to deliver that high-end quality service that NEP has always aimed for.”
Today, Caringo Swarm is delivering up to 45 Gb/sec of throughput, making it a key component of the high-quality video delivery service that NEP is known for. Continued de Ridder, “We don’t have a full-time system administrator working on our Caringo Swarm object storage system. The only issue that usually happens to the system is that if a node or a disk fails, we only need to replace or repair it.” And because of Swarm’s built-in self-healing capabilities, a node or disk failure is not a catastrophic event. The other nodes in the system continue to function, enabling uninterrupted operation. It’s for this reason that Ring can claim that there’s been no data loss in Caringo’s history.
Looking to the future, Ring sees a distinct move to hybrid cloud solutions across all industries, but especially in the M&E space. As such, Caringo is building seamless tiering of data to and from the cloud for elastic storage and compute requirements. In closing, Ring concluded, “We will continue to drive customer-centric innovation while further establishing our technological leadership by working closely with OEMs and our channel partners to develop hybrid cloud solutions to better serve our long-time and future customers.”
I couldn’t have said it better myself and I highly recommend that you check out their website at www.caringo.com.