Position: IoT Platform Engineer – Site Reliability
Location: Sunnyvale, CA, US
Verizon is on a mission to significantly improve citizens’ lives by enabling the Smart City. Verizon’s product technology stack spans hardware, a modern software stack, APIs, public and private cloud, and worldwide deployment. The offering ranges from Smart Lighting to Smart Video (Traffic, Parking and much more). As part of critical city infrastructure, it must stay up and running 24×7, and the stakes are high; the complete solution therefore needs exhaustive monitoring, security, cloud operations and 24×7 support. We are seeking an experienced lead who brings a combination of support and site reliability experience and, most importantly, a “can-do” attitude: someone who is ready to take on new challenges and quickly come up to speed to drive critical initiatives.
Tech Stack
Cloud Services: AWS, EC2, S3, Docker, Mesos-DCOS
OS: Linux (Ubuntu)
Monitoring tools: Elasticsearch, Nagios/Zabbix/Prometheus, Graphite, Grafana, APM tools (AppDynamics, New Relic), synthetic API monitoring (APICA), PagerDuty
Languages: Python and shell scripting
Product Technology Stack: ZeroMQ, Node4JS, Cassandra, Neo4J, ELK, MQTT, RabbitMQ, Erlang, Python, Clojure etc.
Security: OpenVPN, Internet cloud security, SSL, HTTPS, TLS, DTLS etc.
Database: Cassandra, Neo4J
Responsibilities
In this role, you will be responsible for leading, managing and defining CloudOps Support, ensuring Production is always up and running and meets the desired SLAs/KPIs.
Build best practices and a support model for keeping Smart City up and running 24×7.
Manage the support team, both locally and offshore.
Build a support model involving API Developers/Partners, Support Partner, FieldOps and CloudOps teams, and integrate with the Support Partner tool set.
Provide on-call support.
Debug and perform root cause analysis in a distributed system; build Runbooks and automate them.
Understand the NetSense technology suite and determine what is needed for monitoring and alerting.
Evaluate tool sets, participate in building new monitoring capabilities, and bring requirements to the DevOps/Product Management team.
Build a solid automated monitoring and alerting platform, including but not limited to platform monitoring, API monitoring and log monitoring.
Build best practices around monitoring and define the metrics to be measured, including SLAs.
Collaborate extensively with the CloudOps, engineering, QA, field performance and support teams to debug and root-cause production issues, learn from them, and build Runbooks and SOPs.
Work with the CISO to build secure cloud offerings in production.
Manage databases in production (backup, recovery etc.).
Must have: Bachelor’s degree or six or more years of work experience. Six or more years of relevant work experience.
Ideally, you’ll also have:
A degree in Computer Science.
Eight or more years of related experience, ideally in a fast-paced, growing company.
Experience leading CloudOps support teams and developing best practices.
Excellent working experience supporting applications in cloud environments; AWS, Mesos/DCOS, Docker etc. preferred.
Proven ability to debug and root-cause complex distributed systems in production, and to build and automate Runbooks.
Experience with database management in production (backup, recovery etc.).
Good programming skills in Python, shell scripting etc.
Ability to define and lead 24×7 monitoring for IaaS, PaaS and SaaS (API), from low-level monitors to high-level service monitors (e.g. process/service monitor, log monitor, REST API monitor, storage monitor etc.).
Experience working with tools like Nagios, Zabbix, ELK, Graphite, Grafana and APM tools (New Relic, AppDynamics etc.).
Experience integrating with Zendesk, PagerDuty, JIRA etc.
Experience building analytics from data collected from various monitors, e.g. top N APIs, top N customers etc.
Ability to build a capacity planning map from historical data collected from production systems.
Good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring and storage systems, and load balancing.
Experience in the Linux environment and a good understanding of its fundamentals and internals: filesystems, security, networking etc.
Working knowledge of VPN, internet security, SSL, HTTPS, TLS, DTLS etc.
A demonstrated passion for automation.
A results-oriented, collaborative approach and the ability to work successfully in a matrixed organization.
Clear communication skills, a disposition conducive to working in a team environment, and a knack for lifting team spirit.
Grit, drive and a strong feeling of ownership.
Not to boast, but a little bit about us
Verizon powers America’s fastest and most reliable network. We’re also leading the way in cloud and security solutions, Internet of Things and video entertainment. Technology moves fast and so do we. We believe that bringing great ideas and customer experiences to life should be recognized and rewarded. Whether you think in code, words, pictures or numbers, find your future at Verizon.
Equal Employment Opportunity
We’re proud to be an equal opportunity employer – and celebrate our employees’ differences, regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, or Veteran status. Different makes us better.