Cybereason Inc.
Company
Key | Value |
---|---|
Company | Cybereason Inc. |
Employees | 1,000 |
Founded | 2012 |
Website | https://www.cybereason.com/ |
Description | Cybereason is a global cybersecurity company operating in 40 countries. We provide EDR, XDR, and MDR solutions to stop cyberattacks. The core system runs on 12,000 servers and handles 80M events per second. |
Location | San Diego, US |
Team (Software Engineer - Backend)
Key | Value |
---|---|
Title | Software Engineer (Backend) |
Mission | My mission was to develop new features and refactor legacy microservices for the core system. |
Term | 2025-03 - 2025-08 |
Team Size | 9 |
Type | Permanent |
Team (Site Reliability Engineer)
Key | Value |
---|---|
Title | Site Reliability Engineer |
Mission | My mission was to work with a global team to improve system reliability for the core system. |
Term | 2024-10 - 2025-03 |
Team Size | 5 |
Type | Permanent |
Projects
Key | Value |
---|---|
Summary | CR1. I designed and developed custom metrics to improve observability across 3,000 servers, which cut problem resolution time by 50% for enterprise customers. |
Situation |
|
Task | My mission was to improve system observability, make debugging easier, and reduce problem resolution time. |
Action |
|
Result | As a result, better visibility and useful dashboards made troubleshooting faster. We cut problem resolution time by 50% for enterprise customers. |
Challenge | A key challenge was designing metrics that gave engineers actionable insights rather than raw data points. |
Solution | To solve this, I coordinated communication among the SRE, DevOps, and Product teams, and together we defined the most important metrics. |
Learning | I learned that effective monitoring is not about collecting data but about providing clear, actionable insights that help engineers solve problems faster. |
Skill | Observability / Monitoring / Collaboration / Teamwork |
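
The difference between a raw data point and an actionable metric, as described in CR1, can be sketched as follows. This is only an illustration under my own assumptions: `ServiceCounters`, `error_ratio`, `services_needing_attention`, and the 1% threshold are hypothetical names and values, not the production metrics.

```python
from dataclasses import dataclass


@dataclass
class ServiceCounters:
    """Raw data points for one service (hypothetical fields)."""
    name: str
    requests: int
    errors: int


def error_ratio(c: ServiceCounters) -> float:
    """Derived metric: fraction of requests that failed (0.0 if idle)."""
    return c.errors / c.requests if c.requests else 0.0


def services_needing_attention(counters, threshold=0.01):
    """Return (name, ratio) pairs above the threshold, worst first."""
    flagged = [
        (c.name, error_ratio(c))
        for c in counters
        if error_ratio(c) > threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

A dashboard built on the derived list tells an engineer where to look first, whereas the raw counters alone only say how much traffic each service saw.
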
Key | Value |
---|---|
Summary | CR2. I troubleshot and fixed a critical server issue involving 2,000 concurrent threads, resolving crashes that affected an enterprise customer. |
Situation |
|
Task | My mission was to find what caused the crashes and fix it urgently. |
Action |
|
Result | As a result, the fix made the system reliable for the customer, a company with 100,000 employees, and earned their trust. |
Challenge | A key challenge was that the pipeline automatically restarted the server right after each crash. This stopped me from collecting the dumps I needed. |
Solution | To solve this, I modified the pipeline to collect dumps before restarting the server. |
Learning | I learned that solving complex issues requires analyzing the system's core behavior, not just its surface-level symptoms. |
Skill | Incident Response / Troubleshooting / Automation / Difficult Problem |
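
The pipeline change in CR2 amounts to "capture diagnostics first, then restart". A minimal sketch of that ordering, assuming the dump and restart steps are passed in as callables; `recover_after_crash` and its parameters are hypothetical and do not reflect the actual pipeline code:

```python
from typing import Callable, List


def recover_after_crash(collect_dump: Callable[[], str],
                        restart: Callable[[], None],
                        log: List[str]) -> None:
    """Collect a dump before restarting; restart even if collection fails."""
    try:
        path = collect_dump()
        log.append(f"dump saved: {path}")
    except Exception as exc:
        # Diagnostics must never block recovery of the service.
        log.append(f"dump failed: {exc}")
    finally:
        restart()
        log.append("restarted")
```

The `finally` clause is the design choice that matters here: the server always comes back, but the evidence is captured whenever that is possible.
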
Key | Value |
---|---|
Summary | CR3. I fixed a critical bug in the core API that required thorough testing across various versions and feature flags, which restored the correct Malop status for millions of endpoints. |
Situation |
|
Task | My mission was to identify the root cause of the bug and fix it without impacting millions of endpoints. |
Action |
|
Result | As a result, the bug was fixed successfully and deployed to production without any incidents, restoring correct Malop status for millions of endpoints. |
Challenge | A key challenge was that the investigation took a long time because the bug only appeared under very specific conditions. |
Solution | To solve this, I systematically tested different combinations and documented each test result, which eventually led me to identify the exact reproduction conditions. |
Learning | I learned that reproducing complex bugs is both challenging and crucial: systematic testing and detailed documentation are essential for solving edge-case issues. |
Skill | Difficult Problem / Troubleshooting / Automation |
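
The systematic approach in CR3 can be sketched as an exhaustive walk over versions and feature-flag states, recording every outcome. This is an illustration only: `run_matrix`, the `reproduces` callback, and the version and flag names in the usage note are hypothetical, not the real test harness.

```python
from itertools import product


def run_matrix(versions, flags, reproduces):
    """Try every version x flag-state combination and record the outcome."""
    flag_states = [(False, True)] * len(flags)
    results = []
    for version, *states in product(versions, *flag_states):
        config = dict(zip(flags, states))
        # Keep every result, not just the failures, so the log documents
        # which combinations were already ruled out.
        results.append((version, config, reproduces(version, config)))
    return results
```

For example, with two versions and two flags, `run_matrix(["1.0", "2.0"], ["a", "b"], check)` yields eight recorded results, and filtering for the rows where the third element is true gives the exact reproduction conditions.
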
Technology
Value | Tag |
---|---|
Aerospike | Backend |
Apache Kafka | Backend |
Apache ZooKeeper | Backend |
Consul | Backend |
Elasticsearch | Backend |
GraphQL | Backend |
gRPC | Backend |
Java | Backend |
MongoDB | Backend |
PostgreSQL | Backend |
Python | Backend |
Redis | Backend |
Spring Boot | Backend |
AWS | Infrastructure |
Google Cloud | Infrastructure |
Jenkins | Infrastructure |
Kubernetes | Infrastructure |
Oracle Cloud | Infrastructure |
Terraform | Infrastructure |
Elastic Stack | Monitoring |
Grafana | Monitoring |
Jaeger | Monitoring |
Prometheus | Monitoring |