We are looking for a highly experienced Subject Matter Expert (SME) to support the company's monitoring and alerting program. Responsibilities include:
Planning, coordinating, and executing monitoring and alerting for the company's internal- and external-facing infrastructure.
Developing a monitoring and alerting framework based on operational, support, and customer requirements.
Learning how the infrastructure works and identifying its dependencies, then providing visibility through monitoring: collecting metrics, events, anomaly detections, and error data, and creating severity-based alerts.
Understanding which system and network components are critical, high, medium, or low severity to overall health, and creating and executing a monitoring and alerting roadmap.
Working with event management, monitoring, and alerting systems.
Using expert-level Python (or another programming language) to automate the creation of monitors, alerts, and auto-remediation as needed.
Assessing current monitoring and alerting systems, identifying gaps, and providing a strategy to close them.
Working in a fast-paced environment with multiple departments and individuals.
Training the NOC team as required and ensuring that monitoring and alerting standards, policies, and procedures are followed.
What you will bring:
A minimum of 2 years of experience developing monitors and alerts.
A minimum of 2 years of experience with Elasticsearch, Kibana, Elastic Watcher, MapR, and Catchpoint, and the ability to learn new technologies as required.
Excellent communication and writing skills.
Ability to present strategies, roadmaps, and weekly updates to management and stakeholders.
Expertise in one or more programming languages (Python, SQL, C, C++, C#, Perl, PHP, etc.).