18 May, 1998
Creating Scalable Sites Using Homegrown Tools
By Arik Hesseldahl

If there's one thing the operations staff of WhoWhere Inc. knows, it's how to break things.

The challenge of managing three terabytes of data has led WhoWhere's technical managers (Rupesh Kapoor, chief scientist and a cofounder, and Sean White, CTO and vice president of technology) to scrap some off-the-shelf products in favor of more robust, custom-built technology.

"When we first started doing this, we tried to use third-party tools," Kapoor said. "They worked pretty well in the early stages, but we found that a solution that is applicable to a site with 1 million users is not going to work as well for a site with 10 million."

Although the company would not disclose how many hits the sites receive, Kapoor said the WhoWhere sites were visited by 7.6 million unique users in April.

That volume has often forced the staff at WhoWhere to modify the third-party products it buys with its own homebrewed programming, which is specifically geared to its traffic patterns.

"At one point we were seeing a crash on one of our servers every three or four days," Kapoor said. "We went back to the vendors, and they told us that given the number of servers we had, statistically that is what we should expect."

The number of users and the scale of WhoWhere's infrastructure are greater than most vendors are prepared to handle, White added.

"I would rather be able to go to a vendor and buy the solution we need," he said. "But right now, every time we mention [the number of users] they look like deer in the headlights, because they have a completely different model for the number of users their products are developed for. Scalability has become a new game."

BRANCHING OUT

Launched in 1995 as a phone and e-mail directory search engine, WhoWhere has expanded its offerings in the last year to include MailCity, a free Web-based e-mail site, and Angelfire, a free personal Web page site, which the company acquired last October.

WhoWhere has also extended its operations to offer private-label versions of its e-mail and home page products to other sites. Its partners now number more than 20 and include Excite, Deja News, Qualcomm's Eudora division, NetNoir, iVillage, The Globe, and ZDNet. The net result of WhoWhere's branching out is that the company has seen traffic increase exponentially in recent months.

The growth initially put a strain on the site's 40 Sun Enterprise Ultra servers, which include a mix of Ultra 2s through Ultra 4000s. To better manage the load, as one partner site grows, its content may be shifted from one server to another or split across several servers, Kapoor said.
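The move-or-split approach Kapoor describes can be pictured as a placement table that maps each partner to one or more servers. The following is only a sketch of that general idea, not WhoWhere's actual system: the class, the hostnames, and the hash-based choice of server within a split partner are all assumptions made for illustration.

```python
import hashlib


class PartnerPlacement:
    """Tracks which server(s) hold each partner site's content (hypothetical)."""

    def __init__(self):
        self._servers = {}  # partner name -> list of server hostnames

    def assign(self, partner, servers):
        """Place a partner on one server, or split it across several."""
        if not servers:
            raise ValueError("a partner needs at least one server")
        self._servers[partner] = list(servers)

    def server_for(self, partner, user_id):
        """Pick the server for one user with a stable hash of the user ID."""
        servers = self._servers[partner]
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(servers)
        return servers[index]


placement = PartnerPlacement()
# A small partner starts on a single machine (hostnames are made up).
placement.assign("partner-a", ["ultra2-01"])
# As it grows, its content is split across two larger machines.
placement.assign("partner-a", ["ultra4000-01", "ultra4000-02"])
```

In a real deployment, changing a partner's placement would also mean migrating the affected users' data, which this sketch leaves out.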

Load-balancing the traffic was another challenge for which WhoWhere found commercial load-balancing products unsuitable, so the company again implemented a homebrewed solution.
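One reason a generic balancer can fall short for Web-based mail is session state: every request in a user's session has to reach a server that knows about that session. A minimal sketch of session-pinned balancing follows. It is not WhoWhere's proprietary scheme, whose details the company did not describe; the class name, hostnames, and least-loaded rule for new sessions are assumptions.

```python
class StickyBalancer:
    """Pins each session to one server; new sessions go to the least-loaded one."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._load = {server: 0 for server in servers}  # active sessions per server
        self._pinned = {}  # session ID -> server

    def route(self, session_id):
        """Return the server for this session, pinning it on first sight."""
        server = self._pinned.get(session_id)
        if server is None:
            # Ties are broken by the order in which servers were listed.
            server = min(self._load, key=self._load.get)
            self._pinned[session_id] = server
            self._load[server] += 1
        return server

    def end_session(self, session_id):
        """Release the pin when the user logs out or the session expires."""
        server = self._pinned.pop(session_id, None)
        if server is not None:
            self._load[server] -= 1


balancer = StickyBalancer(["mail1", "mail2"])  # hypothetical hostnames
first = balancer.route("alice")
# Every later request in the same session lands on the same server.
assert balancer.route("alice") == first
```

A plain round-robin balancer would instead spread one user's requests across servers, which is the behavior a session-based mail service has to avoid.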

"One of the things you have to do with Web-based mail is constantly maintain a session," Kapoor said. "The standard load-balancing schemes we get from Cisco and other vendors don't work so well, so we've had to develop our own scheme. It seems like we will never be totally out of the technology business."

The WhoWhere sites are equipped with 150 Mbps of Internet bandwidth, and are hosted in three colocation facilities: with Exodus Communications in Santa Clara, Calif., TCG CERFnet in Palo Alto, Calif., and Frontier GlobalCenter in Herndon, Va. WhoWhere's Mountain View, Calif., headquarters are connected to the Internet through a T-1 line (1.5 Mbps) to Exodus, and via two T-1s to UUNET Technologies.

Average bandwidth utilization has been running at about 45 Mbps, but Kapoor said that's growing 20 percent each month as WhoWhere adds partners, each of which may bring in between 100,000 and 1 million new users.
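To put the growth rate in perspective, compounding 20 percent a month closes the gap between 45 Mbps and 150 Mbps quickly. The sketch below is a back-of-the-envelope calculation, not a company projection: it assumes the growth rate stays constant and compares average utilization directly with total capacity, ignoring traffic peaks.

```python
def months_until_capacity(current_mbps, capacity_mbps, monthly_growth):
    """Count the months of compound growth before usage reaches capacity."""
    if current_mbps <= 0 or monthly_growth <= 0:
        raise ValueError("current usage and growth rate must be positive")
    months = 0
    level = current_mbps
    while level < capacity_mbps:
        level *= 1.0 + monthly_growth  # compound one month of growth
        months += 1
    return months


# Figures from the article: 45 Mbps average, 150 Mbps total, 20% per month.
print(months_until_capacity(45.0, 150.0, 0.20))  # prints 7
```

Under those assumptions, average utilization would pass the current 150 Mbps total in about seven months.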

UNUSUAL BANDWIDTH PATTERNS

Providing three different Internet services (directory, e-mail, and personal Web pages) presents some unique bandwidth problems for WhoWhere. Most sites simply handle requests to serve up their content, requiring more upstream than downstream bandwidth. WhoWhere handles millions of e-mail messages a day and logs a daily average of 150 gigabytes of data.

"Our bandwidth needs are really bidirectional. We have to be able to receive a lot of incoming data, which does not apply to most other sites," said Kapoor. "We have a lot of reading and writing going on."

That makes reliable connections a must. To that end, WhoWhere constantly monitors the quality of its connections from its ISPs. WhoWhere works with an outside contractor that monitors the site continuously from 49 different locations around the world. If response times in one area are slow, WhoWhere can then work with its ISPs to install additional lines or to work out new peering agreements with other ISPs.
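The kind of check such monitoring enables can be sketched as follows: collect response-time samples per probe location and flag the locations whose median is too slow. The contractor's actual tooling is not described in the article, so the function name, location names, and the 2,000 ms threshold here are purely illustrative assumptions.

```python
from statistics import median


def slow_locations(samples_ms, threshold_ms=2000.0):
    """Return {location: median response time} for locations over the threshold."""
    flagged = {}
    for location, times in samples_ms.items():
        if not times:
            continue  # no measurements from this probe, so nothing to judge
        typical = median(times)
        if typical > threshold_ms:
            flagged[location] = typical
    return flagged


# Made-up measurements, in milliseconds, from three hypothetical probe sites.
measurements = {
    "london": [900, 1100, 1000],
    "tokyo": [2500, 3100, 2800],
    "sydney": [],
}
print(slow_locations(measurements))  # prints {'tokyo': 2800}
```

The median is used rather than the mean so that a single stalled request does not flag an otherwise healthy region.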

The company also watches its usage patterns carefully. White said that although there are many registered WhoWhere users, the amount of data written to the WhoWhere directory is low, so servers are configured mainly for sending data out. By contrast, e-mail involves much more incoming traffic, and the data flow on the Angelfire site tends to run about 50-50, as users upload images and other files to their sites.

In recent months WhoWhere has mounted a concerted effort to ensure that no single point of failure can bring its sites down. This has meant installing redundant Cisco routers and building a dual local area network.

The redundant elements also help WhoWhere handle traffic spikes, which occur for two main reasons, Kapoor said. In one case, a partner site with many members may announce the availability of free e-mail or personal pages. One such case has been with The Globe, an online community with more than a million members. A recent television ad campaign for the site has spurred a sudden growth in membership.

Sudden jumps in activity also occur when a partner site does what is called batch registration, in which all members of the community are automatically registered for free e-mail once it becomes available.

"We've been learning over time to see what happens when these events occur, and working with our partners so that we can be prepared for them," Kapoor said.

Sidebar: At A Glance
Company: WhoWhere Inc.
Headquarters: Mountain View, Calif.
Web sites: WhoWhere, a directory site; MailCity, a free Web e-mail site; and Angelfire, a free Web page site
Bandwidth: 150 Mbps total connectivity in three colocation facilities: Exodus Communications in Santa Clara, Calif.; TCG CERFnet in Palo Alto, Calif.; Frontier GlobalCenter in Herndon, Va.
Users: 7.6 million unique visitors to WhoWhere sites during April 1998
Hits per day: Not disclosed
Average bandwidth utilization: 45 Mbps
Servers: 40 Sun Enterprise Ultra servers that include a mix ranging from Ultra 2s through Ultra 4000s running Solaris 2.6, and 10 Pentium-based servers running Linux
Load balancing: Proprietary application designed in-house
