[RESOLVED] - SaaS-EU (Paris) GUI Access degradation on 2019/07/01 for TWA applications - #20190701A
 
  
    
   
   
   
  
  
   
   
    Incident Description   
    
     On 2019-07-01 at 04:35 AM (GMT), users of the SaaS-EU platform experienced issues when accessing the GUI and when using the OSS API and the DX API.
     
    
    
     Because the issue occurred randomly, without any server malfunctioning or stopping, no switch-over between sites took place.
     
    
    
    
- Incident Start Time: 2019-07-01 - 04:35 AM (GMT).
 
- Restoration Time: 2019-07-01 - 05:06 AM (GMT).  
 
- Service(s) Impact Duration: 31 minutes.  
 
- Severe service impact: OSS (GUI and API), DX.  
 
- Non-impacted services: Network Server (LRC).
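
For reference, the impact duration reported above follows directly from the start and restoration timestamps. The following is a minimal sketch of that arithmetic in Python; the function and variable names are illustrative assumptions and are not part of any Actility tooling.

```python
from datetime import datetime, timezone

# Timestamps copied from the incident summary (GMT).
incident_start = datetime(2019, 7, 1, 4, 35, tzinfo=timezone.utc)
restoration = datetime(2019, 7, 1, 5, 6, tzinfo=timezone.utc)


def impact_minutes(start: datetime, end: datetime) -> int:
    """Return the whole number of minutes elapsed between two timestamps."""
    # Hypothetical helper, named here for illustration only.
    return int((end - start).total_seconds() // 60)


print(impact_minutes(incident_start, restoration))  # 31
```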
 
 
   
    
     Timeline (GMT)  
     
- [ 2019-07-01 - 04:35 AM ] : Monitoring probes detected a connection issue on the GUI.
 
- [ 2019-07-01 - 04:36 AM ] : On-call engineer acknowledged the alarm. 
 
- [ 2019-07-01 - 04:36 AM ] : New alarms on DX API server (B-DXAPI-E2E Roundtrip). 
 
- [ 2019-07-01 - 04:40 AM ] : New alarms on DX API server (A-DXAPI-E2E Roundtrip).  
 
- [ 2019-07-01 - 04:51 AM ] : New alarm on SMP server (SMP FD COUNT). Investigations pointed to the SMP.
 
- [ 2019-07-01 - 04:56 AM ] : DX servers were restarted. The issue persisted.
 
- [ 2019-07-01 - 05:01 AM ] : Restart of SMP (A then B).  
 
- [ 2019-07-01 - 05:06 AM ] : All alarms cleared.  
 
      
       We apologize for not having posted an article on the status portal this time.
       
      
      
       As part of our quality improvement, we will revise our on-call procedure so that status.thingpark.com is updated before we begin our investigations.
       
      
 
     
    
   Root Cause Analysis 
   
    
     The main root cause is a load issue on the A-SMP server.
     
    
    
     We are currently investigating the cause of this overload by:

- analyzing the logs gathered from both DX servers,

- analyzing the traffic captured by our traffic monitoring probes.

     Our current hypothesis is that extra load from DX-API to SMP caused this overload.
 
 
   
   
    Actility Support