

Re: [Metadata-Support] Extending Metadata Query Protocol

  • From: Jaime Perez Crespo <>
  • To: "" <>
  • Subject: Re: [Metadata-Support] Extending Metadata Query Protocol
  • Date: Mon, 23 Mar 2015 17:44:28 +0000
  • Accept-language: en-US, nb-NO

Hi again!

> On 18 Mar 2015, at 23:37, Tom Scavo
> <>
> wrote:
> Thanks for joining us :-)

Thanks for your comments! :-)

> On Wed, Mar 18, 2015 at 9:25 AM, Jaime Perez Crespo
> <>
> wrote:
>> * To be able to query the MDX server for a list of all the entities
>> served. When I say “list of entities”, I mean a list of identifiers used
>> by the MDX implementation that can be used to request the metadata of a
>> particular entity (i.e. the entityID or its SHA-1).
> An issue was recently filed against mdq-server that would satisfy your
> needs:
> The use case we had in mind was to return enough information to be
> able to construct a list of links, basically entityID, DisplayName,
> and role.

Well, that use case alone is actually interesting for us. But in any case, I
was thinking basically of two different, alternative flows:

- Periodically fetch the list of entities served. Prefetch any new entities
in the list. For previously fetched entities, re-fetch by querying the MDX
server with cache-control headers, so that the server can reply with a
simple “Not Modified”. This is extremely easy for the MDX server: it can
keep serving static content while still keeping caching reasonable.
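For what it’s worth, a minimal Python sketch of that conditional re-fetch
(the base URL is made up, and whether the server emits ETag/Last-Modified
headers is an assumption about the deployment, not something the MDQ
protocol guarantees):

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical MDX endpoint; substitute the real base URL.
BASE_URL = "https://mdx.example.org/entities/"

def conditional_headers(cached):
    """Build conditional-GET headers from what we cached last time.

    `cached` is None on a first fetch, or a dict holding the ETag
    and/or Last-Modified values saved from the previous response.
    """
    headers = {}
    if cached:
        if cached.get("etag"):
            headers["If-None-Match"] = cached["etag"]
        if cached.get("last_modified"):
            headers["If-Modified-Since"] = cached["last_modified"]
    return headers

def refetch(entity_id, cached=None):
    """Re-fetch one entity, returning an updated cache entry.

    On 304 Not Modified the server sends no body, so we keep the
    cached entry as-is.
    """
    url = BASE_URL + urllib.parse.quote(entity_id, safe="")
    req = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(req) as resp:
            return {
                "body": resp.read(),
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged: no bytes shipped
            return cached
        raise
```

The nice property is that a 304 answer costs the server almost nothing, so
it can keep serving static files.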

- The other alternative would be fetching the entire metadata feed at
startup, then using that as the base on top of which to build an up-to-date
cache. Basically, fetch the feed once, and then periodically ask the MDX
server only for the entities that have changed since the last run.
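To make the second flow concrete, something like the sketch below would let
a client poll only for changes. The `updatedSince` query parameter is purely
hypothetical (nothing like it exists in the MDQ protocol today):

```python
from datetime import datetime

def modified_since_url(base, since):
    """Build a (hypothetical) modified-since query URL.

    `updatedSince` is NOT part of the MDQ protocol; this is just what
    such an extension might look like. `since` is a naive UTC datetime.
    """
    return base + "entities?updatedSince=" + since.strftime("%Y-%m-%dT%H:%M:%SZ")

def apply_changes(local_cache, changed):
    """Merge a batch of changed entities into the local cache.

    `changed` maps entityID -> new metadata, or None if the entity was
    removed from the federation. Returns the updated cache.
    """
    cache = dict(local_cache)
    for eid, md in changed.items():
        if md is None:
            cache.pop(eid, None)  # entity deregistered
        else:
            cache[eid] = md       # new or updated entity
    return cache
```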

>> I understand the first one could be easily disregarded by using the MDX as
>> a standard metadata feed, that is, fetching the whole metadata set it
>> serves, processing, caching, and then proceeding onwards by leveraging the
>> second one.
> That was my first thought as well. Before you brought it up, I've
> always thought an implementation would fetch the entire aggregate
> while booting up and then use the MDQ protocol to keep up to date.

That’s one of my ideas, yes.

>> * To be able to query the MDX server for a list of all the entities
>> *modified since* a specific date. This would allow us to query the server
>> later only for those entities that have been modified since the last
>> request.
> That's an interesting option, one we haven't discussed.
> In both cases, I guess a more basic question is: Are these
> implementation issues or protocol issues? The work we're doing on the
> first one, for instance, is still very much in the implementation
> phase (i.e., no one has yet proposed a protocol change).

In general, I would say I’d like to see these features regardless of the
implementation. Putting my SSP hat on, I’d like to have an implementation
that can take advantage of these features regardless of the underlying MDX
implementation in use. If I weren’t worried about that, the obvious thing
to do would be to write our own implementation that fits our needs and
avoid bothering you with my feature requests :-)

At this point, I think I should also put my Feide operator hat on, and give
some more background on the scenario I’m thinking about here. As you all
know, in Feide we operate a single-IdP federation. That means we run the hub,
which is also a SAML IdP that performs *all* authentications. Currently, we
see around half a million logins per day, with peaks of around 80 complete
logins per second. Fortunately that’s nothing compared with what we’ve
tested, but we expect the usage of the IdP to keep growing fast.

Our infrastructure consists of 4 production servers and 2 for testing, all
of them load balanced. Sessions are shared by means of a memcache backend,
also load balanced. Given our current agreements, we don’t run the
infrastructure directly. That means we delegate the maintenance of the
servers, and we cannot manage them, with one exception: minor config changes
and metadata updates. Metadata is managed by means of version control and
cron jobs running on every server every 5 minutes, fetching the latest
changes. Metadata handling is done manually, by committing changes or new
files to the version-control repository.

One of the things we would like to achieve is easier, unattended, even
self-service metadata management. We provide a customer portal for both our
institutions and service providers. Our goal is that service providers can
register their own metadata in the customer portal by themselves, and make
that work automatically without our intervention (except for approval when
going live into production, of course). With that in mind, either of the
options I was suggesting before would be extremely helpful in achieving our
goals for metadata handling. MDX servers could provide a snapshot of the
current status of the federation at any point in time, and every physical
server would just need to sync with them. The cron jobs on each server could
fetch metadata from MDX instead of from a central version-control
repository. But what we do today (i.e. fetch only the changes in the
overall state of the metadata) would be impossible with MDX as it is today.

With either option, we could have a periodic cron job keeping every local
cache up to date, and those caches could live in memory for extremely fast
access. The only gap would be for new services, within a window less than
or equal to the interval at which the cron jobs run. So if someone adds new
metadata (let’s say to testing) through our customer portal and tries to
log in immediately (without giving the cron job time to run), their new
metadata won’t be in the local caches, and the server they land on would
need to check with the MDX. The worst-case scenario would be one request
per server before the cron job runs. That’s unlikely in itself, and it’s
even more unlikely that at that precise moment the MDX server has a hiccup
and takes unreasonably long to respond.
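That per-request fallback is essentially a read-through cache. A rough
sketch, with the actual MDX fetch injected as a function since the transport
details don’t matter here:

```python
def get_metadata(entity_id, cache, fetch_from_mdx):
    """Serve metadata from the local (possibly in-memory) cache,
    falling back to the MDX server only on a miss.

    The worst case is one extra request per server for a brand-new
    entity, until the next cron run refreshes the cache wholesale.
    """
    md = cache.get(entity_id)
    if md is None:
        md = fetch_from_mdx(entity_id)  # e.g. an HTTP GET to the MDX server
        cache[entity_id] = md           # cached until the cron job refreshes it
    return md
```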

So what we would achieve here is both an improvement in reliability,
minimizing any possible “damage” noticed by end users if something goes
wrong, and extremely flexible, automated metadata management that scales
well even for an IdP that needs very high performance.
> Btw, fetching the entire aggregate *is* specified in the protocol spec.

I’ve noticed, but I think it’s not implemented yet, is it?

Jaime Pérez


"Two roads diverged in a wood, and I, I took the one less traveled by, and
that has made all the difference."
- Robert Frost

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail
