
metadata-support - Re: [Metadata-Support] Re: optimizing metadata refresh

Subject: InCommon metadata support

  • From: Ian Young <>
  • To:
  • Cc: Tom Scavo <>
  • Subject: Re: [Metadata-Support] Re: optimizing metadata refresh
  • Date: Tue, 24 Mar 2015 15:55:45 +0000


> On 24 Mar 2015, at 15:24, Tom Scavo wrote:
>
> The aggregate changes almost every day whereas an entity descriptor can go
> for months without change. My question is: Will the implementation
> take advantage of this, and if so, how?

The current implementation does not. The architecture is such that it could
be extended to do so if it seems sufficiently valuable. The following is
probably too detailed an explanation of this...

The central component involved is called an ItemCollectionLibrary, which has
the job of establishing the current answer to the question "what corresponds
to identifier X?" in terms of a collection of SAML entities (an
IdentifiedItemCollection). It's configured with an MDA pipeline so that it can
gather metadata from different places and transform it arbitrarily before
indexing it and "putting it in the library".

A separate component called a MetadataService has the job of rendering
IdentifiedItemCollections into Response objects which can be returned to the
client. Rendering is again configured using an MDA pipeline, and again
involves arbitrary transformations as well as applying a signature. The
MetadataService retains rendered results so that it doesn't need to render
things again every time. It only re-renders a response when it observes that
the IdentifiedItemCollection it gets back from the ItemCollectionLibrary has a
different "generation" value from the one whose rendered form it stored. So
whether a conditional GET will succeed is ultimately delegated to the
ItemCollectionLibrary.
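
The generation check described above can be sketched roughly as follows. This
is only an illustration in Python, not the actual implementation: the class
and field names are borrowed from the description, while the `lookup` and
`render` callables, the `render_count` counter and the use of plain strings
for entities are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class IdentifiedItemCollection:
    """Hypothetical stand-in: the answer for one identifier."""
    identifier: str
    generation: int
    items: Tuple[str, ...]  # strings standing in for SAML entities


class MetadataService:
    """Keeps one rendered result per identifier, keyed by generation."""

    def __init__(
        self,
        lookup: Callable[[str], IdentifiedItemCollection],
        render: Callable[[IdentifiedItemCollection], bytes],
    ) -> None:
        self._lookup = lookup    # assumed: asks the library for a collection
        self._render = render    # assumed: runs the render pipeline and signs
        self._cache: Dict[str, Tuple[int, bytes]] = {}
        self.render_count = 0    # only here to make the behaviour observable

    def get(self, identifier: str) -> bytes:
        collection = self._lookup(identifier)
        cached = self._cache.get(identifier)
        # Reuse the stored rendering while the generation is unchanged.
        if cached is not None and cached[0] == collection.generation:
            return cached[1]
        self.render_count += 1
        rendered = self._render(collection)
        self._cache[identifier] = (collection.generation, rendered)
        return rendered
```

With this shape, the service never inspects the items themselves; whether a
re-render happens depends entirely on the generation the library reports.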

At the moment, the ItemCollectionLibrary makes a new generation of results
every time it runs the collection pipeline (at present, once an hour). It
doesn't compare the old generation of indexed results with the new one at
all, so this will result in re-rendering of results even if they haven't
"really" changed. It *could* be more selective, though, with the aim of only
forcing a new generation for those queries whose results have actually
changed. That's a fair amount of work, but because the ItemCollectionLibrary
does all of this work on a separate thread, it doesn't need to impact query
times at all. It could also be done on-demand to trade speed against memory
use.
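
One way the more selective behaviour might look, as a sketch only: compare
the previous index with the new one and advance the generation just for
identifiers whose results differ. The function name `advance_generations`,
the representation of an index as a mapping from identifier to a frozenset of
item fingerprints, and the use of the run number as the new generation value
are all assumptions for illustration.

```python
from typing import Dict, FrozenSet

# Assumed shape: identifier -> set of fingerprints of the items it maps to.
Index = Dict[str, FrozenSet[str]]


def advance_generations(
    old_index: Index,
    old_generations: Dict[str, int],
    new_index: Index,
    run_number: int,
) -> Dict[str, int]:
    """Return a generation per identifier, bumped only where results changed."""
    new_generations: Dict[str, int] = {}
    for identifier, items in new_index.items():
        unchanged = (
            identifier in old_generations
            and old_index.get(identifier) == items
        )
        if unchanged:
            # Same results as last run: keep the old generation so that
            # nothing downstream is forced to re-render.
            new_generations[identifier] = old_generations[identifier]
        else:
            # New or changed results: stamp them with this run's number.
            new_generations[identifier] = run_number
    return new_generations
```

Because this comparison would run on the collection thread, its cost would
land on the hourly refresh rather than on individual queries.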

There are some other wrinkles (e.g.: how do you compare one Set<DOM things>
with another? how do you arrange to advance validUntil while not having that
cause premature re-rendering?) but that's the basic overview.
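
As a rough illustration of one possible answer to the comparison question
(an assumption, not something the implementation does): reduce each entity to
a fingerprint of its canonical XML with volatile attributes such as
validUntil removed, then compare sets of fingerprints. The sketch uses
Python's standard library (`ET.canonicalize` needs Python 3.8 or later), only
strips attributes from the root element, and the helper names are invented
for the example.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import FrozenSet, Iterable

# Assumed list of root attributes that change without the entity "really"
# changing; validUntil is the one mentioned in the discussion.
VOLATILE_ATTRIBUTES = ("validUntil",)


def fingerprint(entity_xml: str) -> str:
    """Hash the canonical form of one entity, ignoring volatile attributes."""
    root = ET.fromstring(entity_xml)
    for name in VOLATILE_ATTRIBUTES:
        root.attrib.pop(name, None)
    # Canonicalization normalises attribute order and similar surface detail.
    canonical = ET.canonicalize(ET.tostring(root, encoding="unicode"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fingerprints(entities: Iterable[str]) -> FrozenSet[str]:
    """Turn a collection of entities into a directly comparable set."""
    return frozenset(fingerprint(entity) for entity in entities)
```

Two collections could then be treated as equal when their fingerprint sets
are equal, which leaves validUntil free to advance without signalling a
change.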

> I understand the per-entity metadata server will refresh its sources
> regularly and often, but why should that force a refresh at the
> client? Can and will we use HTTP Conditional GET to eliminate (what
> seems like) unnecessary metadata refreshes?

Conditional GET is a slightly different consideration. That's actually fairly
simple to implement within the MetadataService because it can be defined in
terms of differences in a strong validator derived from the *rendered*
results (which already exists). The tricky part is to go over and above that
and avoid re-rendering when the *appropriate parts* of the source data have
not changed (even though the aggregates have). In practice, re-rendering
almost always produces different results because of things like validUntil
moving forwards.
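
A minimal sketch of conditional GET driven by a strong validator derived from
the rendered bytes. The function names, the choice of SHA-256 and the
(status, headers, body) return shape are assumptions for illustration; the
only parts taken from HTTP itself are the ETag and If-None-Match headers and
the 304 status.

```python
import hashlib
from typing import Dict, Optional, Tuple


def strong_etag(rendered: bytes) -> str:
    """Derive a strong validator from the exact rendered bytes."""
    return '"' + hashlib.sha256(rendered).hexdigest() + '"'


def respond(
    rendered: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET of the rendered result."""
    etag = strong_etag(rendered)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = []
        for raw in if_none_match.split(","):
            tag = raw.strip()
            # If-None-Match uses weak comparison, so drop any W/ prefix.
            if tag.startswith("W/"):
                tag = tag[2:]
            candidates.append(tag)
        if "*" in candidates or etag in candidates:
            # The client's copy is still current: send no body.
            return 304, headers, b""
    return 200, headers, rendered
```

Since the validator changes whenever the rendered bytes change, a re-render
that only moves validUntil forwards would still produce a new ETag; avoiding
that is the harder problem described above.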

Phew. I'm guessing you may have some follow-on questions...

-- Ian







