High Scalability
Hot Scalability Links For Sep 3, 2010
With summer almost gone, it's time to fall into some good links...
- Hibari - distributed, fault tolerant, highly available key-value store written in Erlang. In this video Scott Lystig Fritchie gives a very good overview of the newest key-value store.
- Tweets of Gold
- lenidot: with 12 staff, @tumblr serves 1.5billion pageviews/month and 25,000 signups/day. Now that's scalability!
- jmtan24: Funny that whenever a high scalability article comes out, it always mention the shared nothing approach
- mfeathers: When life gives you lemons, you can have decades-long conquest to convert lemons to oranges, or you can make lemonade.
- OyvindIsene: Met an old man with mustache today, he had no opinion on #noSQL. Note to myself: Don't grow a mustache, now or later.
- vlad003: Isn't it interesting how P2P distributes data while Cloud Computing centralizes it? And they're both said to be the future.
- You may be interested in a new DevOps Meetup organized by Dave Nielson, so you know it will be good.
Six guiding principles to Consolidate your IT
The need for IT consolidation is most evident in two types of organizations. In the first group, IT grew organically with the business over the decades and survived changes of strategy, management, staff and vendor orientation. The second group, capital groups, is characterized by rapid growth through acquisitions (followed by attempts to integrate radically different IT environments). In both groups, the IT infrastructure has typically been pieced together over the past 20 (or more) years.
Read more on BigDataMatters.com
Distributed Hashing Algorithms by Example: Consistent Hashing
Consistent Hashing is a specific implementation of hashing that is well suited for many of today’s web-scale load balancing problems. Specifically, it can be seen in use in various caching solutions like Memcached and is applicable to NoSQL solutions as well. Consistent Hashing is used particularly because it addresses the main weakness of the typical “hashcode mod n” method of distributing keys across a series of servers. It allows servers to be added or removed without significantly upsetting the distribution of keys, and it does not require that all keys be rehashed to accommodate the change in the number of servers.
You can read the full story here.
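To make the difference concrete, here is a minimal Python sketch that is not taken from the linked article. The HashRing class, the server names, the choice of md5 and the 100 virtual nodes per server are all assumptions made for illustration. It adds a fifth server and compares how many keys change owner on a hash ring versus plain “hashcode mod n”.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (md5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes per server."""

    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        """Return the server owning the first ring point at or after the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        # Wrap around to the first point when the key hashes past the last one.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def mod_n(key, servers):
    """The 'hashcode mod n' baseline: index straight into the server list."""
    return servers[_hash(key) % len(servers)]


def moved_fraction(before, after, keys):
    """Fraction of keys whose assigned server changed."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
old_servers = ["cache-a", "cache-b", "cache-c", "cache-d"]
new_servers = old_servers + ["cache-e"]

ring_before = HashRing(old_servers)
ring_after = HashRing(new_servers)

ring_moved = moved_fraction(ring_before.lookup, ring_after.lookup, keys)
mod_moved = moved_fraction(lambda k: mod_n(k, old_servers),
                           lambda k: mod_n(k, new_servers), keys)
print(f"consistent hashing moved {ring_moved:.0%} of keys")
print(f"hashcode mod n moved {mod_moved:.0%} of keys")
```

On the ring, roughly one key in five should move when the fifth server joins, and those keys move only onto the new server. With mod n, most keys end up on a different server.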
Scale-out vs Scale-up
In this post I'll cover the difference between multi-core concurrency, often referred to as Scale-Up, and distributed computing, often referred to as Scale-Out.
Source: Scale-out vs Scale-up (http://www.dzone.com/links/r/scaleout_vs_scaleup.html) by Nati Shalom
Paper: The Case for Determinism in Database Systems
Can you have your ACID cake and eat your distributed database too? Yes, explains Daniel Abadi, Assistant Professor of Computer Science at Yale University, in an epic post, The problems with ACID, and how to fix them without going NoSQL, coauthored with Alexander Thomson, on their paper The Case for Determinism in Database Systems. We've already seen VoltDB offer the best of both worlds; this sounds like a completely different approach.
The solution, they propose, is:
Pomegranate - Storing Billions and Billions of Tiny Little Files
Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works:
We have demonstrate that file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed linearly increased more than 100,000 aggregate read and write requests served per second (RPS).
Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into the file system. The idea is that the file system API is common to every platform, so it wouldn't require a separate API to use. Every application could use it out of the box.
The features of Pomegranate are:
- It handles billions of small files efficiently, even in one directory;
- It provides a separate and scalable caching layer, which can be snapshot-able;
- The storage layer uses a log-structured store to absorb small file writes and make better use of the disk bandwidth;
- It builds a global namespace for both small files and large files;
- Columnar storage to exploit temporal and spatial locality;
- Distributed extendible hash to index metadata;
- Snapshot-able and reconfigurable caching to increase parallelism and tolerate failures;
- Pomegranate should be the first file system that is built over tabular storage, and the building experience should be worthy for file system community.
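The log-structured idea in the list above can be sketched in a few lines of Python. This is not Pomegranate's code or API. SmallFileLog, the file names and the in-memory index are hypothetical, chosen only to show how many small writes can become sequential appends to one log file.

```python
import os
import tempfile


class SmallFileLog:
    """Hypothetical append-only store that packs many small files into one log."""

    def __init__(self, path):
        self._log = open(path, "a+b")
        self._index = {}  # file name -> (offset, length) of the latest version

    def write(self, name, data: bytes):
        self._log.seek(0, os.SEEK_END)
        offset = self._log.tell()
        self._log.write(data)  # sequential append, never an in-place update
        self._index[name] = (offset, len(data))

    def read(self, name) -> bytes:
        offset, length = self._index[name]
        self._log.flush()
        self._log.seek(offset)
        return self._log.read(length)

    def close(self):
        self._log.close()


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "objects.log")
    store = SmallFileLog(log_path)
    store.write("thumb-1.jpg", b"\xff\xd8tiny")
    store.write("post-1.txt", b"hello")
    store.write("thumb-1.jpg", b"\xff\xd8newer")  # an overwrite is just another append
    latest = store.read("thumb-1.jpg")
    post = store.read("post-1.txt")
    store.close()
    log_size = os.path.getsize(log_path)

print(latest, post, log_size)
```

The old version of thumb-1.jpg stays in the log as garbage. A real system would need compaction and a persistent index, both of which this sketch leaves out.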
Can Ma, who leads the research on Pomegranate, was kind enough to agree to a short interview.
OpenStack - The Answer to: How do We Compete with Amazon?
The Silicon Valley Cloud Computing Group had a meetup Wednesday on OpenStack, whose tag line is the open source, open standards cloud. I was shocked at the large turnout. 287 people registered and it looked like a large percentage of them actually showed up. I wonder, was it the gourmet pizza, the free t-shirts, or are people really that interested in OpenStack? And if they are really interested, why are they that interested? On the surface an open cloud doesn't seem all that sexy a topic, but with contributions from NASA, from Rackspace, and from a very avid user community, a lot of interest there seems to be.
The brief intro blurb to OpenStack is:
21 Quality Screencasts on Scaling Rails
This is a follow-up to an earlier post on the Scaling Rails Screencast Series by Gregg Pollack. Back then there were only 13 screencasts; now there are 21. Eight more have been added on topics like load testing and database scaling. This series is of surprisingly high quality. While I didn't view every screencast, I sampled a large set and found them to have solid content and high production values. In fact, how did they make these things? The instructor moves around in a little box while the content flows around him. A very cool effect. But that wouldn't matter if the content didn't deliver. Here's what's new:
Sponsored Post: deviantART, Okta, EzRez, Cloud Sigma, ManageEngine, Site24x7
- deviantART is Hiring a Senior Software Engineer.
- Okta is hiring! Okta provides a ground-breaking cloud adoption and management solution and they are looking for people in many different areas.
- ezRez is a B2B SaaS provider looking to hire a Senior Software Engineer and Software Engineer
- Cloud Sigma. Instantly scalable European cloud servers.
- ManageEngine Applications Manager : Application Performance monitoring and Virtualization monitoring.
- www.site24x7.com : Website Monitoring Service from a global monitoring network.
Building a Scalable Key-Value Database: Project Hydracus
6 Ways to Kill Your Servers - Learning How to Scale the Hard Way
This is a guest post by Steffen Konerow, author of the High Performance Blog.
Learning how to scale isn’t easy without any prior experience. Nowadays you have plenty of websites like highscalability.com to get some inspiration, but unfortunately there is no solution that fits all websites and needs. You still have to think on your own to find a concept that works for your requirements. So did I.
A few years ago, my bosses came to me and said “We’ve got a new project for you. It’s the relaunch of a website that already has 1 million users a month. You have to build the website and make sure we’ll be able to grow afterwards”. I was already an experienced coder, but not at this scale, so I had to start learning how to scale – the hard way.
Hot Scalability Links For Aug 20, 2010
Lots of good links this week...
- Membase, powering Farmville's 500k operations *per second*. Of course, some people contend they could do this on their old Vic-20, but this is a useful, vigorous discussion thread on Reddit.
- Tweets of Gold:
- kbsingh: I dont understand why some developers think its ok to leave operations people out of scalability decisions
- karmazilla: I find it a little odd when a database claims to support "massive scalability" when it is not distributed.
- pcapr: OH: teenagers are eventually consistent
- tv: Verb suggestion for the act of mapreducing data: "marinating". "Then we marinade it to get the n-gram frequencies."
Misco: A MapReduce Framework for Mobile Systems - Start of the Ambient Cloud?
Misco: A MapReduce Framework for Mobile Systems is a very exciting paper to me because it's really one of the first explorations of some of the ideas in Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud. What they are trying to do is efficiently distribute work across a set of cellphones using a now familiar MapReduce interface. Usually we think of MapReduce as working across large data center hosted clusters. Here, the cluster nodes are cellphones not contained in any data center, but compute nodes potentially distributed everywhere.
I talked briefly with Adam Dou, one of the paper's authors, and he said they don't see cellphone clusters replacing dedicated computer clusters, primarily because of the power required for both network communication and the map-reduce computations. Large multi-terabyte jobs aren't in the cards...yet. Adam estimates computationally that cellphones are performing similarly to desktops of ten years ago. Instead, they want to focus on the unique characteristics of the mobile devices--camera, microphone, GPS and other directly collectable data--so the data can be processed where collected.
MapReduce was selected as the programming interface because it is familiar to programmers, transparently supports programming multiple devices, and can be implemented--especially using Python--in such a way that programmers are freed from all the underlying details like concurrency, data distribution, and code management. A very smart move in my estimation.
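For readers who haven't written a MapReduce job, here is a toy Python sketch of what the programmer supplies. It is not Misco's API. The function names, the local driver and the sample records are assumptions for illustration, and everything runs in one process rather than across phones. The point is that only the mapper and the reducer are application code.

```python
from collections import defaultdict
from itertools import chain


def map_ngrams(record, n=2):
    """Map step: emit (ngram, 1) for every n-gram of words in one record."""
    words = record.lower().split()
    return [(" ".join(words[i:i + n]), 1) for i in range(len(words) - n + 1)]


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Toy local driver: map every record, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())


# Stand-ins for text collected on two different devices.
records = [
    "the cloud is the future",
    "the cloud is everywhere",
]
counts = run_mapreduce(records, map_ngrams, reduce_counts)
print(counts["the cloud"])
```

In a real framework the driver is the part that hides concurrency, data distribution and code management. Swapping the local loop for a scheduler would not change the two functions above.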
It's interesting to contrast the economics of the ambient cloud to the economics of the data center cloud. The goal of a data center cloud is 100 percent utilization. Use every possible CPU cycle or money is being wasted on unused equipment. In an ambient cloud the idea is more parasitic: deploy to more resources yet leave the primary function of the device unaffected. It's a different perspective that may lead to different architectures.
A quick introduction to Misco from the abstract:
Scaling an AWS infrastructure - Tools and Patterns
This is a guest post by Frédéric Faure (architect at Ysance), you can follow him on twitter.
How do you scale an AWS (Amazon Web Services) infrastructure? This article will give you a detailed reply in two parts: the tools you can use to make the most of Amazon’s dynamic approach, and the architectural model you should adopt for a scalable infrastructure.
I base my report on experience gained in several AWS production projects in casual gaming (Facebook), e-commerce infrastructures and mainstream GIS (Geographic Information System). My experience in gaming (IsCool, The Game) is currently the most representative in terms of scalability, due to the number of users (over 800 thousand DAU – daily active users – at peak usage and over 20 million page views every day). However, my experiences in e-commerce and GIS (currently underway) provide a different view of scalability, taking into account the various problems of availability and data management. I will therefore attempt to provide a detailed overview of the factors to take into account in order to optimise the dynamic nature of an infrastructure constructed in a Cloud Computing environment, and in this case, in the AWS environment.
Hot Scalability Links for Aug 13, 2010
- Ezra Zygmuntowicz, in a heartwarming account of his 4 Years at Engine Yard, concludes from his experience that the true future of cloud computing for developers is to not think about servers at all. It is now time to focus on the Application and new levels of abstraction that allow folks to use the computing resources in easier and easier ways.
- Tweets of Gold:
- bryanlatten: Nothing like a million caching layers to screw up an already complicated deployment. Thankfully, there is beer.
- jkalucki: Twitter isn't down, you are just using the wrong access methods...
- andyedinborough: I don't mean to hate, but why would I give up performance and scalability for a dynamic language? Honestly, I don't get it.
- AsitSinha: It's amazing.... to see the absence of an understanding of how capability plays a role in scalability.
Designing Web Applications for Scalability
I can’t even count the number of times that I’ve heard this phrase: “don’t worry about scaling your web application, worry about visitor (or customer) acquisition.” My response to this is always that you don’t need to choose one or the other, you can do both! In this post, I’m going to go over some of the strategies I’ve used to architect web applications for scalability, right from the start of the design process, in such a way that I’m prepared to scale when I need to, but not forced into doing so before it’s necessary. The transition from small scale to large scale can be made much easier by choosing the right technologies and implementing the right coding patterns up front.
You can read the full story here.
Strategy: Terminate SSL Connections in Hardware and Reduce Server Count by 40%
This is an interesting tidbit from near the end of the Packet Pushers podcast Show 15 – Saving the Web With Dinky Putt Putt Firewalls. The conversation was about how SSL connections need to terminate before they can be processed by a WAF (Web Application Firewall), which inspects HTTP for security problems like SQL injection and cross-site scripting exploits. Much was made that if programmers did their job better these appliances wouldn't be necessary, but I digress.
To terminate SSL, most shops run SSL connections into Intel-based Linux boxes running Apache. This setup is convenient for developers, but it's not optimized for SSL, so it's slow and costly. Much of the capacity of these servers is unnecessarily consumed processing SSL.
Think of Latency as a Pseudo-permanent Network Partition
The title of this post is a quote from Ilya Grigorik's post Weak Consistency and CAP Implications. Besides the article being excellent, I thought this idea had something to add to the great NoSQL versus RDBMS debate, where Mike Stonebraker makes the argument that network partitions are rare, so designing eventually consistent systems for such a rare occurrence is not worth losing ACID semantics over. Even if network partitions are rare, latency between datacenters is not rare, so the game is still on.
The rare-partition argument seems to flow from a centralized-distributed view of systems. Such systems are scale-out in that they grow by adding distributed nodes, but the nodes generally do not cross datacenter boundaries. The assumption is that the network is fast enough that distributed operations are roughly homogeneous between nodes.
Sponsored Post: Okta, EzRez, VoltDB, Digg, Cloud Sigma, Applications Manager, Site24x7
- Okta is hiring! Okta provides a ground-breaking cloud adoption and management solution and they are looking for people in many different areas.
- ezRez is a B2B SaaS provider looking to hire a Senior Software Engineer and Software Engineer
- Cloud Sigma. Instantly scalable European cloud servers.
- ManageEngine Applications Manager. ManageEngine provides Enterprise IT Management suite of products.
- Site24x7. Easy, fast and effective web server monitoring, server monitoring and website monitoring service.
We're building a key service for the cloud, in the cloud, by people who know the cloud. Our team is composed of people who were central to building services like Salesforce.com and SuccessFactors, systems which process millions of transactions in the cloud every day. We know what problems people are having because we experienced them ourselves and saw our customers and colleagues cry for help. We're changing the way people interact with technology, starting with a very fundamental element: identity.
We've got exciting, hard problems to solve and we want you to help us. We learned a lot while creating the largest on-demand enterprise companies, and we're putting that knowledge to good use as we build the next generation of corporate IT. At Okta we understand what Internet-scale innovation requires, which is why we've started fresh, with no legacy code or old code lines to maintain. It's a fast-paced, agile environment – just like the Internet – and we need the best and the brightest to help us change the world.
For more information see Okta's Careers page.
VoltDB Field/Community Engineer
VoltDB is attracting more and more users every day. If you have a strong technical background in SQL and Linux, are experienced with production database deployments, and have a passion for customers and community, you could be just the person we are looking for. Are you excited about the prospect of working with users to develop and deploy VoltDB applications, and about helping users participate in the thriving VoltDB community? If so, read on at their job page.
Get Your High Scalability Fix at Digg
Interested in working on cutting-edge high-scale infrastructure at Digg? We're making a big investment in scaling and have committed to the NoSQL (Not only SQL) path with Cassandra. We're using other open-source infrastructure to help us scale including Hadoop, RabbitMQ, Zookeeper, Thrift, HDFS and Lucene. We're rewriting Digg from the ground up and we need amazing developers to join our world-class team. If you think you are up for the challenge, or you know someone who might be, take a look at our jobs page for more information.
CloudSigma
- Instantly Scalable European Cloud Servers. Create virtual servers in the cloud that are fully scalable and adaptive. Control your servers via our web console or API. CloudSigma gives more power and control over your server infrastructure.
- Keep control, increase scalability. Subscribe for capacity or pay as you go; with CloudSigma we give you the power and control you need.
- Competitive Innovative Pricing. Discover transparent pricing and a flexible billing model. Purchase what you need when you need it without resource bundling. We let you purchase CPU, RAM, Storage and bandwidth independently. Create your perfect combination that’s right for you.
- 14-day Free Trial. Try our cloud computing products free.
More information at CloudSigma.
ManageEngine Applications Manager
ManageEngine provides Enterprise IT Management suite of products. ManageEngine Applications Manager helps SaaS companies monitor their production applications and helps keep costs low.
There is out-of-the-box support for monitoring application servers, database servers, servers and web servers from a single web console. In addition to support for IBM Applications, Oracle Apps and Microsoft applications, there is deep support for Open Source Applications like JBoss, Memcached, LAMP stack etc. Pricing starts at $795 for monitoring 25 servers or applications. Learn more about the Application Performance Monitoring tool.
Site24x7.com (from ZOHO) is a Website and Web Application Monitoring service. It helps you ensure your shopping carts and other web transactions work. It also helps you monitor the performance of your websites from a global point of presence. You can Sign Up for a Free Trial. The Professional Edition starts at $1 / Month. Learn more about the Website Monitoring Service.
ezRez Senior Software Engineer
You will be part of a fun, fast-paced and highly collaborative engineering team leveraging agile methodologies to deliver new functionality in incremental iterations. You will get exposed to cutting-edge technologies including an open source stack, dependency injection (DI) frameworks, ORM, XML web services, distributed grid-based caching and the latest UI technologies. We're also currently using best-of-breed extreme programming (XP) techniques including test driven development, pair programming, continuous refactoring and continuous integration.
Requirements
- Familiarity with agile software development and a desire to champion it among the team.
- Ability to work independently, multitask and manage time effectively.
- Experience with test-driven development techniques and/or well-disciplined to write unit tests that assert something useful.
- Excellent communication skills (verbal, written, wiki, and white-boarding).
- 5+ years experience with Java (or other OO languages like C++, Smalltalk, Ruby, Python) with deep understanding of object-oriented design.
- Passion for technology outside the workplace with an interest in the latest open source framework/libraries/tools including Spring, Hibernate, concurrency, memcached/key-value repositories, Freemarker, Tomcat, subversion, maven, and Ruby on Rails.
- Familiarity with the travel industry or a high-transaction ecommerce web-site.
- Experience with a hosted, multi-tenant application environment.
- We're global, so a familiarity with internationalization and configurable displays is a plus.
- Experience with OWASP secure coding guidelines.
Please email your resume (as an attachment) to employment@ezrez.com with the subject "Sr. Software Engineer" for immediate consideration. You may also contact our recruiter via Skype at JanetBourland. Please note that at this time we are not considering candidates that will require employer sponsorship to work in the United States. No calls from recruiters/agencies please. ezRez Software is an equal opportunity employer.
ezRez Software Engineer
You will be part of a fun, fast-paced and highly collaborative engineering team leveraging agile methodologies to deliver new functionality in incremental iterations. You will get exposed to cutting-edge technologies including an open source stack, dependency injection (DI) frameworks, ORM, XML web services, distributed grid-based caching and the latest UI technologies. We're also currently using best-of-breed extreme programming (XP) techniques including test driven development, pair programming, continuous refactoring and continuous integration.
Requirements
- Ability to work independently, multitask and manage time effectively.
- Some exposure to agile software development or a desire to practice it.
- 1-3 years experience with Java (or other OO languages like C++, Smalltalk, Ruby, Python) with a firm grasp of object-oriented design
- Experience with test-driven development techniques and/or well-disciplined to write unit tests that assert something useful.
- Excellent communication skills (verbal, written, wiki, and white-boarding).
- Passion for technology outside the workplace with an interest in the latest open source framework/libraries/tools including Spring, Hibernate, concurrency, memcached/key-value repositories, Freemarker, Tomcat, subversion, maven, and Ruby on Rails.
- Familiarity with the travel industry or a high-transaction ecommerce web-site.
- Experience with a hosted, multi-tenant application environment.
- We're global, so a familiarity with internationalization and configurable displays is a plus.
- Experience with OWASP secure coding guidelines.
Please email your resume (as an attachment) to employment@ezrez.com with the subject "Software Engineer" for immediate consideration. You may also contact our recruiter via Skype at JanetBourland. Please note that at this time we are not considering candidates that will require employer sponsorship to work in the United States. No calls from recruiters/agencies please. ezRez Software is an equal opportunity employer.
If you are interested in a sponsored post for an event, job, or product, please take a look at the advertising section.
NoSQL on the Microsoft Platform
NoSQL is a trend that is gaining steam primarily in the world of Open Source. There are numerous NoSQL solutions available for all levels of complexity: from queryable distributed solutions like MongoDB to simpler distributed key-value storage solutions like Cassandra. Then there’s Riak, Tokyo Cabinet, Voldemort, CouchDB, and Redis. However, very few of these packaged NoSQL products are available for the other end of the platform market: Microsoft Windows. I’m going to outline what’s available now and briefly touch on some opportunities that are still available to the daring Microsoft engineer.
You can read the full story here.