ECO platform outage

[resolved][2021-03-10] ECO platform is down: ThingPark Community Portal, POC, Dev1 total outage

2021-03-27T22:00:00Z

  • 🛠 community.thingpark.org:
    • ✔ Subscriber creation is working again, but it is not yet available to customers because the Interop Engine is not working yet.
    • ✔ Single Sign-On (SSO) to the Interop Engine is also working.
    • 🛠 Interop Engine validation is needed.
  • ✔ TPXLE + ADA are working for the dev-ope instance.
    • 🛠 TPXLE is not working for the community instance without the portal.
  • ✔ All needed operator instances are now created.
  • 🚥 Next in line:
    • Migration of subscribers, basestations and devices
    • TPX decoders and drivers
    • Restoration of Partner sites
Please refer to the status page: https://status.thingpark.com/
---
Incident history:

ThingPark Community, PoCs and Dev1
❌ Network Server for PoCs: Major incident
❌ Network Server for Dev1: Major incident
❌ OSS API and GUI: Major incident
❌ ThingPark X Location Engine: Major incident
❌ ThingPark X IoT-Flow: Major incident
❌ DX API: Major incident
❌ Documentation platform: Major incident

2021-03-10T06:51:00Z (morning, Paris time): 🔥 A datacenter hosting some of our services has burnt down 🔥

2021-03-10T09:35:00Z: the hosting company (OVH) has just announced that they will not be able to restore their datacenter today.

2021-03-11T13:50:00Z
        After the fire 🔥 that damaged multiple datacenters in Strasbourg, France, yesterday, the entire ECO environment is still down.
        OVH is in disaster-recovery mode. In the next few days, they will let us know which services / data are gone and what we can restore. We will then assess the situation.
        Our teams are working on restoring our own manual backups to a different location.

2021-03-18, morning, Paris time. Disaster recovery plan is ongoing.
- ✔ The new community platform is up, including DX-API and IPSec.
- 🛠 IoTFlow, ThingPark X Location Engine and Interoperability Engine are not back online yet.
- 🚥 next in line for re-establishment:
  - TPX IoTFlow
  - ThingPark X Location Engine (TPXLE) + Abeeway Device Analyzer (ADA)
  - migration of operator instances, subscribers, basestations (gateways), devices
  - provisioning of Community offers (including Interop)

2021-03-19, morning, Paris time. Disaster recovery plan is ongoing.
- ✔ IoTFlow is installed
- ✔ DX-API was updated to facilitate reprovisioning of subscribers and end-user accounts
- 🛠 ThingPark X Location Engine and Interoperability Engine are not back online yet.
- 🚥 next in line:
  - validate TPXLE and ADA
  - complete creation of all operator instances
  - migrate subscribers, basestations (gateways), devices
  - provision Community and Partner portal offers (including Interop)