

Subject: Per-Entity Metadata Working Group




  • From: Chris Phillips
  • Subject: [Per-Entity] About SAMLbits.org
  • Date: Wed, 20 Jul 2016 16:01:54 +0000
  • Accept-language: en-US

Hi..

On the call today I mentioned samlbits.org as a component in a SAML distribution architecture.
It's a content distribution network (CDN) built for distributing and caching SAML metadata on a global scale.

It is an initiative led by Leif J and used by others; high-level details can be seen here: http://samlbits.org

I didn’t realize how much I wrote but I did chunk it up a bit ;)

Chris.



How it works:
  • There are physical nodes/caches on various continents and locations globally.
  • The SAMLbits cache servers are configured to mirror an HTTP url and then cache and redistribute it.
  • Content is still hosted on a federation's server and SAMLbits just mirrors it.
    •  SAMLbits is able to be more clever about cache duration, changes, etc.

How does this help Tom's signing challenge?

If we use the example of how InCommon has been signing things offline and then publishing them, SAMLbits is just a redistribution of these files.

The role of the MDQ server is reduced, and the simple serving of file requests (with the right MIME type) becomes more prominent.
You can point SAMLbits at an MDQ server that is under tighter controls than one in AWS, for instance, but at the risk of not getting the functionality an MDQ server offers beyond that of a correctly configured Apache server.
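
To make "simple serving of file requests" a little more concrete: as I understand the Metadata Query Protocol, a client requests /entities/ followed by the percent-encoded entityID (or a {sha1} hash of it), and the response carries the application/samlmetadata+xml media type. The Python sketch below only computes those request paths, which is what a static file server or cache would have to answer for. The function names and the example entityID are hypothetical and not taken from any MDQ software.

```python
import hashlib
from urllib.parse import quote

# Media type a server should send for SAML metadata documents.
SAML_METADATA_MIME = "application/samlmetadata+xml"


def mdq_request_path(entity_id: str) -> str:
    """Request path for an entityID, percent-encoding every reserved character."""
    return "/entities/" + quote(entity_id, safe="")


def mdq_sha1_request_path(entity_id: str) -> str:
    """Request path using the {sha1} identifier form (hex SHA-1 of the entityID)."""
    digest = hashlib.sha1(entity_id.encode("utf-8")).hexdigest()
    return "/entities/" + quote("{sha1}" + digest, safe="")


# Hypothetical entityID, used only for illustration.
example_entity_id = "https://idp.example.org/shibboleth"
print(mdq_request_path(example_entity_id))
print(mdq_sha1_request_path(example_entity_id))
```

If every entity is signed offline and published as a file under such a path, a cache in front of the origin has nothing to compute; it only needs to pass the media type through unchanged.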

If my SP only refers to SAMLbits for retrieving metadata and not to my origin server, the idea is that the cache can respond and, in turn, not hammer the origin server.

It also means that the signing key stays in its current safe location without changing things.

The trade-off is that signing of MDQ content is much less dynamic. The question is: is this loss of functionality OK?

What do I lose? There's no such thing as a free lunch, right?

It's pretty good, but we have not prescribed it for our production metadata distribution yet.
Injecting another cog into the machine is a big step, and the fact that it is something we do not control is the biggest obstacle: would one delegate metadata availability to a third-party service?
(The trade is that you get higher availability with less effort and cost.)

To answer the above, here are the aspects of the CDN's behaviour we would like to know more about:

Cache refresh
There is the notion of caching and then living by the cache cycle (or the validUntil date in the metadata). How frequently a file gets refreshed (or not, since this is a true cache) will play into distribution. I think of this as an 'eventual consistency' situation. When there are multiple tiers of caching (SP/IdP, distribution points, etc.) and different cache policies (1hr retrieval, retrieve from disk and statically updated), it gets difficult to forecast WHEN an update is fully propagated. Leif would be able to weigh in on this more.
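
A rough way to reason about "WHEN" is to add up the refresh intervals of each tier: in the worst case every tier refreshed just before the tier upstream of it picked up the change. The Python sketch below assumes that simple additive model. The tier names and the 15-minute CDN interval are made-up values for illustration; only the 1hr retrieval figure comes from the text above. A tier that is updated by hand (the "retrieve from disk and statically updated" case) has no bound at all, which the sketch reports as None.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheTier:
    name: str
    # Longest time this tier may keep serving a stale copy, in seconds.
    # None means the tier is only updated manually, so there is no bound.
    refresh_seconds: Optional[int]


def worst_case_propagation(tiers: List[CacheTier]) -> Optional[int]:
    """Upper bound in seconds until all tiers serve the update, or None if unbounded."""
    total = 0
    for tier in tiers:
        if tier.refresh_seconds is None:
            return None
        total += tier.refresh_seconds
    return total


# Hypothetical two-tier chain: origin -> CDN cache -> SP/IdP.
tiers = [
    CacheTier("CDN cache", 15 * 60),                # assumed value
    CacheTier("SP/IdP metadata refresh", 60 * 60),  # the 1hr retrieval policy
]
print(worst_case_propagation(tiers))  # seconds
```

This is only an upper bound under the stated assumptions; real caches that honour conditional requests or purge on change would usually propagate faster.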

Logging of requests.
If files are retrieved from a known location, you can enumerate who retrieves them and infer who retrieved what file when (or, in the case of HEAnet, who use Jagger and a special URL for each IdP/SP, be very precise about it). This mitigates concerns like 'Did service X see update Y for IdP Z, and is it still a problem?'. You can better triangulate things if you know retrieval stats (IPs from AWS aside, you can somewhat guess who's fetching things).

One feature that Leif built into the architecture at our request was the ability to syslog access to the cache for a given endpoint. You will note that CAF is listed as a configured aggregate. When someone accesses a URL there, we have a UDP syslog endpoint that tracks access to the aggregate. This gives us at least equivalent (no better, no worse) information about who is retrieving the aggregate. This is part of the SAMLbits configuration for an endpoint.

What this means is that when I retrieve this file:

It's as if I retrieve this file:

And can see a record of access --  here's an example:
Jul 20 07:52:32 10.189.34.2 Jul 20 11:52:11 nl-surf-utr-1 VARNISH: 178.63.86.11 - - [20/Jul/2016:11:52:10 +0000] "GET http://caf.cdn.samlbits.net/CoreServices/test/caf_metadata_signed_sha256.xml HTTP/1.0" 200 0 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"

I don't know how real-time this is: there is a log buffer to fill before transmission, so with low traffic it may take a while for logs to be transmitted.
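
To answer "who retrieved what file when" from records like the one above, the fields can be pulled out with a regular expression. This is a minimal Python sketch that assumes every forwarded line has the same shape as that single sample (a syslog prefix, then "VARNISH:", then a combined-log-style entry). The pattern and function name are my own and are not part of SAMLbits or Varnish.

```python
import re
from datetime import datetime

# Matches the part of the line after the syslog prefix.
LOG_PATTERN = re.compile(
    r'VARNISH: (?P<client>\S+) \S+ \S+ '
    r'\[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields from one forwarded access record, or None if no match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["when"] = datetime.strptime(record["when"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


# The sample record quoted above.
SAMPLE = (
    'Jul 20 07:52:32 10.189.34.2 Jul 20 11:52:11 nl-surf-utr-1 VARNISH: '
    '178.63.86.11 - - [20/Jul/2016:11:52:10 +0000] '
    '"GET http://caf.cdn.samlbits.net/CoreServices/test/caf_metadata_signed_sha256.xml HTTP/1.0" '
    '200 0 "-" '
    '"Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"'
)

record = parse_access_line(SAMPLE)
print(record["client"], record["when"].isoformat(), record["url"])
```

Grouping the parsed records by client address and URL would give the retrieval stats mentioned earlier, with the caveat that an address only hints at who is fetching.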

About TLS
Aggregates are signed and the trust is in the signature validation NOT the file retrieval location or TLS securing of the endpoints.  There is nothing secret in the SAML aggregate be it a single entity or aggregate of thousands.
I'm sure there's some discussion to be had here given Tom's comments in the presentation today..

Sustainability

As Scott K called out today, sustainability is an interesting question for samlbits.org. It's been around for a few years now and I expect it to be around in the future, but I think the roadmap and operational aspects are a conversation to raise with Leif J.
It's all open source, so there are no limits on taking things over if you wanted to run your own (or leave it). One aspect that Leif and I talked about is that an additional North American node would be beneficial. The requirement is 1U of rack space, power, internet, and infrequent hands-on-keyboard access in case a reachability issue prevents remote maintenance. There is an existing contract outlining this for anyone who wants to run a node.




How does this apply to per entity metadata?

  • It's a tool to help circulate metadata aggregates of any size
    •  a single entity or an aggregate with thousands are all treated the same and cached.
  • It helps with key management by allowing the signing key to live in a controlled environment, less exposed than out in AWS, as Tom was talking about
    • (BTW Azure has signing capability too, but charges per transaction for it ;))
  • Does it replace an MDQ server?
    • I don't think so, but it can cache any and all results of the server ;)

Hope this helps; questions and opinions are welcome, as always.


Chris.



