per-entity - Re: [Per-Entity] Some thoughts about availability and scalability

Subject: Per-Entity Metadata Working Group

  • From: "Domingues, Michael D" <>
  • To: "Cantor, Scott" <>, Tom Scavo <>, Chris Phillips <>
  • Cc: Per-Entity Metadata Working Group <>
  • Subject: Re: [Per-Entity] Some thoughts about availability and scalability
  • Date: Tue, 2 Aug 2016 16:58:32 +0000

> There are pretty much millions of lines of code written that basically
> assume a DNS server will never fail. And when they do, it all just breaks.

> It's not about whether something can fail, it's about the expense of
> working around it and the economic trade-off of asking everybody to code
> a workaround. The question is to what degree a standard "serve static
> documents" web use case has reached this level of commoditization
> and "assumed robustness".

I agree. I'd argue that serving static documents is (for most cases) a solved
problem, and that this is something I expect InCommon, or any other
organization, to be capable of in a highly available manner, where any
downtime means "something has gone seriously wrong". Especially with the
offerings on Azure or AWS (globally available DNS, mirrored servers located
in redundant regions/zones, et cetera), the tools for building these
solutions are getting more accessible, and more reliable, by the day.

Chris's point about SAMLbits playing the role of this globally available
access layer is intriguing. Shall we pull in Leif (if he isn't here already)
to talk about it in a future call?

That said, I think there is a difference (perhaps only in perspective) between
coding a workaround and coding in redundancy.

> I'm not claiming they are the same (that's an objective matter, and if we
> have constant minutes-long downtime, then no, they're not the same), but I
> am arguing they should be.

+1. Both services (in my book, and likely that of most federation
participants) ought to be designed and operated with the goal of as close to
no downtime as possible (four nines or what you will).
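
To put a number on "four nines": a rough sketch of the arithmetic, assuming
a 365-day year and treating availability as a simple fraction of uptime. The
function name and the comparison targets are only illustrative, not anything
the group has agreed on.

```python
# Downtime allowed by an availability target over a given period.
# Assumption: a 365-day year (8760 hours); leap years are ignored.

def downtime_budget_minutes(availability: float,
                            period_hours: float = 365 * 24) -> float:
    """Return the minutes of downtime permitted at the given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours * 60


if __name__ == "__main__":
    targets = [("three nines", 0.999),
               ("four nines", 0.9999),
               ("five nines", 0.99999)]
    for label, availability in targets:
        minutes = downtime_budget_minutes(availability)
        print(f"{label}: about {minutes:.1f} minutes of downtime per year")
```

At four nines that works out to a bit under an hour of total downtime per
year, so constant minutes-long outages would use up the budget quickly.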

