Cybereason Inc.
Company
| Key | Value |
|---|---|
| Company | Cybereason Inc. |
| Employees | 1000 |
| Founded | 2012 |
| Web Site | https://www.cybereason.com/ |
| Description | Cybereason is a global cybersecurity company operating in 40 countries. We provide EDR, XDR, and MDR solutions to stop cyberattacks. The core system runs on 12,000 servers and handles 80M events per second. |
| Location | San Diego, US |
Team (Software Engineer - Backend)
| Key | Value |
|---|---|
| Title | Software Engineer (Backend) |
| Mission | My mission was to develop new features and refactor legacy microservices for the core system. |
| Term | 2025-03 - 2025-08 |
| Team Size | 9 |
| Type | Permanent |
Team (Site Reliability Engineer)
| Key | Value |
|---|---|
| Title | Site Reliability Engineer |
| Mission | My mission was to work with a global team to improve system reliability for the core system. |
| Term | 2024-10 - 2025-03 |
| Team Size | 5 |
| Type | Permanent |
Projects
| Key | Value |
|---|---|
| Summary | CR1. I designed and developed custom metrics to improve observability across 3,000 servers, which cut problem resolution time by 50% for enterprise customers. |
| Situation | |
| Task | My mission was to improve system observability, make debugging easier, and reduce problem resolution time. |
| Action | |
| Result | As a result, better visibility and useful dashboards made troubleshooting faster. We cut problem resolution time by 50% for enterprise customers. |
| Challenge | A key challenge was designing metrics that gave engineers useful insights rather than just data points. |
| Solution | To solve this, I led communication among the SRE, DevOps, and Product teams, and together we defined the most important metrics. |
| Learning | I learned that effective monitoring is not about collecting data but about providing clear, actionable insights that help engineers solve problems faster. |
| Skill | Observability / Monitoring / Collaboration / Teamwork |
| Key | Value |
|---|---|
| Summary | CR2. I troubleshot and fixed a critical server issue involving 2,000 concurrent threads, which resolved crashes affecting an enterprise customer. |
| Situation | |
| Task | My mission was to find the cause of the crashes and fix it urgently. |
| Action | |
| Result | As a result, the fix made the system reliable for a customer with 100,000 employees, and we earned their trust. |
| Challenge | A key challenge was that the pipeline automatically restarted the server right after each crash, which stopped me from collecting the dumps I needed. |
| Solution | To solve this, I modified the pipeline to collect dumps before restarting the server. |
| Learning | I learned that solving complex issues requires analyzing the system's core behavior, not just its surface-level symptoms. |
| Skill | Incident Response / Troubleshooting / Automation / Difficult Problem |
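
The CR2 solution (collect dumps first, restart second) can be illustrated with a minimal Python sketch. This is not the actual pipeline code: the function name `collect_then_restart`, the `*.dump` file pattern, and the directory layout are hypothetical, and the real pipeline tooling is not described here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def collect_then_restart(
    dump_dir: Path,
    archive_dir: Path,
    restart: Callable[[], None],
    pattern: str = "*.dump",  # hypothetical naming convention for dump files
) -> List[Path]:
    """Copy every dump file to archive_dir, then trigger the restart."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for dump in sorted(dump_dir.glob(pattern)):
        target = archive_dir / dump.name
        shutil.copy2(dump, target)  # preserve timestamps for later analysis
        saved.append(target)
    # The restart runs only after all dumps are safely copied.
    restart()
    return saved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "dumps").mkdir()
        (root / "dumps" / "server.dump").write_text("example")
        saved = collect_then_restart(
            root / "dumps", root / "archive", lambda: print("restarting")
        )
        print([p.name for p in saved])
```

The point of the ordering is that a restart can no longer discard the evidence: the restart callback is invoked only once the copies exist.
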
| Key | Value |
|---|---|
| Summary | CR3. I fixed a critical bug in the core API that required thorough testing across various versions and feature flags, which restored the correct Malop status for millions of endpoints. |
| Situation | |
| Task | My mission was to identify the root cause of the bug and fix it without impacting millions of endpoints. |
| Action | |
| Result | As a result, the bug was fixed successfully and deployed to production without any incidents, restoring correct Malop status for millions of endpoints. |
| Challenge | A key challenge was that the investigation took a long time because the bug only appeared under very specific conditions. |
| Solution | To solve this, I systematically tested different combinations and documented each test result, which eventually led me to identify the exact reproduction conditions. |
| Learning | I learned that reproducing complex bugs is both challenging and crucial: systematic testing and detailed documentation are essential for solving edge-case issues. |
| Skill | Difficult Problem / Troubleshooting / Automation |
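
The systematic testing described in CR3 (versions combined with feature flags, each result recorded) can be sketched as a small test matrix in Python. This is an illustration under assumptions, not the method actually used: `build_matrix`, `failing_combos`, the version strings, the flag names, and the stand-in check are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Combo = Tuple[str, Dict[str, bool]]


def build_matrix(versions: Sequence[str], flags: Sequence[str]) -> List[Combo]:
    """Enumerate every version paired with every on/off flag assignment."""
    matrix: List[Combo] = []
    for version in versions:
        for values in product([False, True], repeat=len(flags)):
            matrix.append((version, dict(zip(flags, values))))
    return matrix


def failing_combos(
    matrix: Sequence[Combo],
    check: Callable[[str, Dict[str, bool]], bool],
) -> List[Combo]:
    """Return the combinations for which check() reports a failure."""
    return [(version, flags) for version, flags in matrix if not check(version, flags)]


if __name__ == "__main__":
    # Stand-in for a real test run: fails only for one specific combination.
    def fake_check(version: str, flags: Dict[str, bool]) -> bool:
        return not (version == "2.0" and flags["flag_a"] and not flags["flag_b"])

    matrix = build_matrix(["1.0", "2.0"], ["flag_a", "flag_b"])
    for version, flags in failing_combos(matrix, fake_check):
        print(version, flags)
```

Enumerating the full matrix grows as the number of versions times 2 to the power of the number of flags, so this exhaustive form only suits a small number of flags; the value is that every tested combination and its outcome is recorded explicitly.
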
Technology
| Value | Tag |
|---|---|
| Aerospike | Backend |
| Apache Kafka | Backend |
| Apache ZooKeeper | Backend |
| Consul | Backend |
| Elasticsearch | Backend |
| GraphQL | Backend |
| gRPC | Backend |
| Java | Backend |
| MongoDB | Backend |
| PostgreSQL | Backend |
| Python | Backend |
| Redis | Backend |
| Spring Boot | Backend |
| AWS | Infrastructure |
| Google Cloud | Infrastructure |
| Jenkins | Infrastructure |
| Kubernetes | Infrastructure |
| Oracle Cloud | Infrastructure |
| Terraform | Infrastructure |
| Elastic Stack | Monitoring |
| Grafana | Monitoring |
| Jaeger | Monitoring |
| Prometheus | Monitoring |