Clockwork server clocks
You’d think that synchronizing the clocks across a fleet of modern servers is a solved problem, but it’s actually quite a hard challenge to solve, especially if you want to get to nanosecond accuracy. This also means that it remains an axiom in computer science that you should never build a system based on clock time. Clockwork.io, which is announcing a $21 million Series A funding round today, promises to change this with sync accuracy as low as 5 nanoseconds with hardware timestamps and hundreds of nanoseconds with software timestamps.
Based on this work, the company is also launching its first product today, Latency Sensei, which can give its users extremely fine-grained latency data in their cloud, on-premises, and hybrid environments, which they can then use to find bottlenecks and tune their networks. The company’s customers already include the likes of Nasdaq, Wells Fargo, and RBC.
The startup was founded by Yilong Geng, Deepak Merugu, and Stanford’s “VMware Founders Professor of Computer Science” Balaji Prabhakar, with VMware co-founder and Stanford computer science professor Mendel Rosenblum serving as a board member and chief scientist. Given this group’s pedigree, it’s no surprise that the core research behind Clockwork’s system is based on fundamental academic research the team did at Stanford.
The Network Time Synchronization Protocol (NTP), which is the standard format that most computers use for synching clocks today, is ubiquitous but not very accurate. There has been some work on improving that, with Facebook, for example, contributing a hardware solution to the Open Compute Project last year, but the Clockwork team promises far greater accuracy.
“Sometimes, inside data centers, I couldn’t get them to agree on a second. My phone and the base station here probably agree on the second. Then you get finer and finer and finer — down to the microseconds and nanoseconds. That is very hard. It’s very hard for two clocks to know exactly what nanosecond they are in,” Prabhakar explained. He noted that it’s also not good enough to synchronize these clocks once. You also have to keep them in sync. You can put high-accuracy clocks that are immune to temperature variations and vibration into a server, but that clock would quickly become more expensive than the server itself.
To solve this issue, the team built a system and machine learning model that allows it to very accurately measure the time it takes for a timestamp to arrive at a given server. That’s not so different from how NTP works, but the team then takes this a few steps further by looking at a variety of timestamps and then getting both the offset of the clock and the relative frequency difference. All of this then feeds into the machine learning model. In addition, the team also built the system so the different clocks can talk to each other and detect (and correct) when they are not synchronized.
In the absence of trustworthy timestamps, distributed systems have long had to rely on clockless designs, which adds an extra level of complexity to building complex systems. The Clockwork team hopes that its work will allow researchers to experiment with new time-based algorithms across a number of problem areas like database consistency, event ordering, consensus protocols and ledgers.
Clockwork server clocks