The network problems have reportedly been resolved.
The network infrastructure on which EZID sits is experiencing performance issues this morning. The network team is investigating.
EZID will be taken offline at 9pm PDT tonight (Nov 1) for approximately 30 seconds for maintenance. We apologize for the inconvenience.
DataCite has been suffering some outages recently. Most of these outages have been brief (on the order of a few minutes), but a 30-minute outage occurred at 2pm today. DataCite is investigating the problem. As always, resolution of DOIs has not been affected.
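Scripted clients can often ride out brief outages of this kind by retrying a failed DOI request with a growing delay. The following is only a minimal sketch, not part of the EZID or DataCite APIs: `send_request` is a hypothetical callable that performs one request and returns its HTTP status code, and the set of retryable codes and the delays are assumptions you should adjust.

```python
import time

# Status codes treated as transient (an assumption; adjust for your client).
RETRYABLE = {502, 503, 504}


def call_with_retry(send_request, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send_request() until it returns a non-retryable status.

    send_request is a hypothetical zero-argument callable that returns an
    HTTP status code. The delay doubles after each failed attempt.
    Returns the last status code seen.
    """
    status = None
    for attempt in range(attempts):
        status = send_request()
        if status not in RETRYABLE:
            return status
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status
```

Retrying only helps with short interruptions. For a longer outage the function returns the last failing status, and the request has to be resubmitted later.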
About a dozen DOI requests failed last night due to a DataCite outage, but all is fine now. Here is the announcement from DataCite:
There was an MDS glitch today (2013-10-11) from 09:37 to 11:06 UTC. Most of
the incoming requests timed out (error 504). In particular no DOI
registration or update was possible.
MDS is now up and running again.
It looks like the connection to the database broke down during this time
down and MDS did not recover automatically but had to be restarted manually.
We’ll try to investigate a bit further to prevent this type of error in the
future.
We apologize for any inconvenience.
Reminder: DataCite maintenance 4-5pm PDT. DOI creations and updates may not work during this period. All other services are unaffected.
N2T and EZID are back up. The outage was approximately 1hr. We apologize for the disruption.
Both EZID and the N2T resolver are down. The source of the problem is unknown at this time. We’re working on restoring the services as soon as possible.
The following has been provided by the DataCite Tech Team:
After everything had been tested for a long time, we switched to the new machine on May 11th. The switch was absolutely smooth. There was no service disruption at all!
But four days later some connection timeouts for MDS occurred, and shortly afterwards MDS became completely unreachable. Unfortunately, due to a configuration error, our monitoring system did not notice this. (This is of course fixed now!) This was the main reason for the long duration of the outage.
After noticing the problem the day after, we immediately switched back to the old machine. Everything was back to normal and we had time to investigate.
So what caused the outage? The connection between two key server components (Tomcat and Apache with proxy_ajp) broke down. The reason for this is unclear. Unfortunately we were also unable to reproduce the error no matter how hard we hit MDS. In this case it is obviously very hard to find a fix that would make us confident enough to try this setup again.
So after some discussion we decided to circumvent any potential roots of the problem. We migrated to a more modern and scalable web server (nginx). This took us a while, but the setup is now in place and we switched to it on Sunday. We are very confident that we now have a modern and reliable system.
However this switch was not as smooth as the one before. Two problems occurred:
1. We had to install a new SSL certificate because the old one was expiring. However, we failed to include the intermediate certificate. This might have broken your API clients. Due to browser caching, this might have affected only a minority of UI users. This was fixed immediately after we learned of it on Tuesday.
2. We also enabled HTTPS on schema.datacite.org and http://www.datacite.org. This caused a problem in MDS when reading the schema needed for validation, and MDS rejected all metadata uploads. This is also fixed now.
Both problems were hard to detect at the time of the switch or beforehand because, due to caching, neither occurred immediately.
We are very sorry for all the inconvenience. We have learned from these issues and have, for example, improved our monitoring system. We are very confident that MDS is now stable again, and that all future server migrations will be smooth.
DataCite Tech Team
The outage affecting EZID service yesterday was part of a CDL-wide outage. In other words, all CDL services hosted in our organization’s data center were unavailable during this time (from 3:00 pm to 4:10 pm Pacific). The root cause is not known at this time. The data center staff are currently working to determine the cause.
We apologize for this inconvenience, and we will provide you with further information when it is available.