Vision | - Core Belief: I believe that reliability is the most fundamental feature of any product.
- My Mission: My mission is to contribute to the development of distributed systems, with a strong focus on reliability.
- AI/LLM Focus: I am deeply interested in AI/LLM technologies and aim to use them to improve the developer experience.
|
Career Goal | - OSS Ambition: My career goal is to relocate to the US, become deeply involved in the OSS community, and contribute to the creation of new technologies.
- Leadership Goal: I aspire to lead a team in a senior position and take on greater responsibility.
|
Introduction | - Current Role: I am a Software Engineer with over 7 years of experience in site reliability and backend development.
- Key Experience: I have designed and developed distributed systems and enterprise applications, focusing on high performance and scalability.
- Core Skills: My professional skills include Go development and Kubernetes cluster management.
- Current Company: I am currently working as a software engineer at Cybereason, a US-based cybersecurity company operating in 40 countries.
- Primary Responsibilities: I develop new features and refactor legacy microservices for a core system. This system operates on over 12,000 servers and processes more than 80 million events per second.
- Side Projects: I contribute to open-source projects, including Kubernetes, etcd, Raft implementations, and ETL tools.
- Current Studies: I am a part-time student in the Master of Science in Computer Science program at the University of Colorado Boulder, studying online from Japan.
|
Past Job | - University Degree: I earned a Bachelor of Engineering from the University of Tokyo.
- During my studies, I worked with patient data, developed Python models, and authored a paper aimed at improving medical decision-making.
- First Role: I joined Gunosy, one of Japan's largest news app companies with 40 million users.
- I worked as a data engineer for 1.5 years, analyzing large-scale user data and improving advertising logic.
- Second Role (CTO): I co-founded Industry Technology, a company that provides enterprise applications to major real estate companies in Japan.
- In my role as CTO, I led the development team and personally wrote over 100,000 lines of backend code over 5.5 years.
- Side Engagements: I also worked as a freelance software engineer for several AI startups.
|
Why searching for a new position | - Current Status: I am satisfied with my current work environment and am not actively seeking a new position.
- Growth Aspiration: I have a strong desire for growth, which I hope to achieve by working on large-scale products and taking on more responsibilities, such as a senior role.
- Future Plans: I am planning to relocate to the US in the near future and am therefore looking for US-related positions that offer visa support.
|
Why applied | - Alignment with Vision: This role aligns well with my personal vision and career goals.
- Relevant Experience: The job requirements match my experience.
- I gained experience in backend development and leadership at Industry Technology.
- I learned site reliability engineering and teamwork skills at Cybereason.
|
Why SRE → SWE | - Organizational Change: My team recently underwent a significant organizational change, which included reassignments.
- Shift in Focus: Previously, as an SRE, my main focus was incident response. Now, as a software engineer, I primarily work on feature development and refactoring.
|
Why CU Boulder | - Learning Motivation: I developed a strong desire to relearn computer science fundamentals, with a particular interest in AI.
- Reason for Choosing CU Boulder: CU Boulder offers a flexible program with many modern AI courses, which suited my learning goals.
|
Why Cybereason | - Challenging Environment: I was attracted to Cybereason because its system handles 80 million events per second and operates on over 12,000 servers, demanding high reliability.
|
Why left CTO | - Reason for Leaving: We sold the business.
- New Career Path: After the sale, I decided to pursue a career as an individual contributor rather than a business leader.
|
Why SRE | - Successful SRE Implementation: I successfully introduced SRE practices at Industry Technology.
- This led to securing a contract with an enterprise customer.
- Key Learning: I realized the importance of deepening my understanding and application of SRE practices as a software engineer.
|
Why Backend | - Alignment with Goals: Working in backend development aligns with my vision and career goals.
- My Ideal Role: My ideal role is a Backend Engineer who can effectively utilize SRE practices.
|
Why Freelance | - Project Alignment: The freelance projects I undertook aligned with my vision and career goals.
- Nature of Projects: I was involved in projects related to Open Source Software (OSS), AI/LLM, and systems with high-reliability requirements.
|
Why US | - Career Alignment: Working in the US aligns with my vision and career goals.
- OSS Community: The US has large and active Open Source Software communities.
- Technical Challenges: There are opportunities to work on systems with considerable traffic and high-reliability requirements.
|
Career Path | - Preferred Path: I prefer an Individual Contributor (IC) career path.
- Ideal Future Position: My ideal future position is a Senior Backend Engineer who can apply SRE practices.
- Value of Soft Skills: I understand the importance of soft skills, especially in leadership roles.
- I enjoyed teamwork at Cybereason.
- I also enjoyed my leadership responsibilities at Industry Technology.
|
Visa | - PEO/EOR: Using a PEO/EOR (Professional Employer Organization / Employer of Record) arrangement is acceptable to me.
- Relocation Costs: I am proactive about relocation and prepared to cover visa-related expenses myself.
|
Interesting | - Enjoyable Work: I particularly enjoy developing observability features, such as custom metrics and distributed tracing.
- Relevant Project (CR1): At Cybereason, I designed and developed custom metrics, which improved observability across 3,000 servers.
|
Strength / Positive Feedback | - My Key Strength: My key strength is the ability to develop distributed systems and enterprise applications with a strong focus on reliability.
- Supporting Project (IT1): At Industry Technology, I designed SLI/SLO (Service Level Indicators/Objectives) and developed custom metrics and tracing to enhance system reliability.
|
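As a rough illustration of how an SLI relates to an SLO, the sketch below computes an availability SLI and the share of the error budget that is still unspent. The function names, the request counts, and the 99.9% target are hypothetical and are not taken from the Industry Technology project.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that met the success criterion."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Unspent share of the error budget (1.0 = untouched, below 0 = overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


# Hypothetical numbers: 500 failed requests out of one million, 99.9% SLO.
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
budget = error_budget_remaining(sli, slo_target=0.999)
print(f"SLI={sli:.4%}, error budget remaining={budget:.0%}")
```

With these numbers roughly half of the budget is spent, which is the kind of signal that can justify trading some development agility for reliability work.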
Weakness / Negative Feedback | - Area for Development: I sometimes tend to prioritize quality over speed.
- Specific Example (New): At Industry Technology, I occasionally focused too much on details during code reviews, which sometimes caused delays in merging pull requests.
- Impact: My detailed reviews, while thorough, sometimes took longer than necessary, slowing down the overall review process.
- Corrective Actions: To address this, I worked on enhancing our CI processes and introduced AI-powered review tools.
- Outcome: These actions helped improve PR merge speed while maintaining code quality.
- Lesson Learned: I learned the importance of finding the right balance between quality and agility.
|
Failure | - Situation (New): At Industry Technology, I introduced an E2E (End-to-End) test automation tool for the frontend. My primary goal was to improve development efficiency by reducing the time frontend engineers spent writing tests. However, this unintentionally created communication silos between the frontend engineers and the QA team, hindering collaboration.
- Remedial Actions: We discontinued the specific E2E test automation tool, and I reassigned E2E testing responsibilities directly to the frontend engineers, empowering them to own the quality of their code from development through testing.
- Positive Outcome: This change eliminated the organizational silos and fostered better collaboration, ultimately leading to improved product quality.
- Key Takeaways: I learned that technical decisions, especially about tools, must consider the broader team impact beyond isolated benefits.
|
Leadership / Business Impact | - Project Summary (IT1): At Industry Technology, I designed SLI/SLO and developed custom metrics and tracing, which significantly enhanced system reliability.
- The Situation: We had a business opportunity to introduce our system to Japan's largest real estate company, but I realized our existing operations didn't meet their required service level.
- My Objective: My objective was to improve system reliability to meet the expected operational level for this client.
- My Actions: I made the decision to temporarily sacrifice some development agility to lead the introduction of SRE practices.
- Key Business Impact: As a result, we successfully secured the contract with the enterprise customer.
|
Ownership | - Project Summary (EN1): At enechain, I developed a generator tool for creating manifests and Terraform configurations for new microservices.
- Identified Problem: Developers were manually creating manifests and Terraform configurations for each new microservice, which was inefficient and prone to errors.
- Goal: My goal was to automate the generation of these configurations to improve development efficiency.
- Solution: I developed a generator tool to automate this process.
- Positive Impact: This tool significantly reduced the initial deployment time for new microservices across all teams.
|
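A minimal sketch of the generator idea, assuming the tool takes a service name and a container image and emits a Kubernetes Deployment manifest. The function name, the default replica count, and the example registry URL are hypothetical; the original tool's actual inputs and its Terraform output are not described here, so they are not reproduced.

```python
import json


def deployment_manifest(service: str, image: str, replicas: int = 2) -> dict:
    """Build an apps/v1 Deployment manifest for one microservice."""
    labels = {"app": service}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": service, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": service, "image": image}]},
            },
        },
    }


# Hypothetical service and image, for illustration only.
manifest = deployment_manifest("billing", "registry.example.com/billing:1.0.0")
print(json.dumps(manifest, indent=2))
```

Generating the selector and the template labels from one value removes a common class of copy-and-paste mistakes in hand-written manifests.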
Teamwork | - Incident Summary (CR2): At Cybereason, I analyzed thread dumps and heap dumps to troubleshoot an issue on a server that was handling 2,000 concurrent threads.
- The Issue: A UI server was crashing occasionally for a large enterprise customer (100,000 employees). Resolving this incident was challenging due to the need for coordination across multiple teams (SRE, DevOps, Product, TAM).
- My Objective: My objective was to identify the performance bottleneck and ensure system stability.
- My Contribution: As the incident commander, I coordinated the efforts of different teams. I personally identified the root cause by analyzing thread dumps and heap dumps.
- Resolution: We successfully resolved the availability issue.
|
Technically Challenging | - Project Summary (EU2): At Eukarya, I developed a real-time logging system for our workflow engine that combined caching with long-term storage.
- The Core Challenge: Users required a way to monitor the execution status of the workflow engine in real time, but the system lacked an effective logging and retrieval mechanism.
- My Objective: My task was to develop a real-time logging system that allowed users to check workflow execution status via the UI.
- My Technical Approach: I built this feature on a publish/subscribe (Pub/Sub) architecture.
- Technical Hurdle: A key challenge was the high-performance requirement, as users needed to monitor logs in real time.
- How I Overcame It: I overcame this with an optimized data storage strategy, using Redis as a cache and Google Cloud Storage (GCS) for long-term log storage.
- Key Technical Achievement: This enabled users to monitor worker status in real time.
|
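The approach above can be sketched as a broker that writes each log line to a store and then fans it out to subscribers. This is only an illustration under stated assumptions: both tiers are in-memory stand-ins (a bounded deque in place of Redis, a list in place of GCS), no real client library is called, and the class names, the cache size, and the choice to cache the most recent lines are hypothetical.

```python
from collections import defaultdict, deque


class LogStore:
    """Bounded cache of recent lines plus an append-only archive.

    `_cache` stands in for Redis and `_archive` for GCS (assumption).
    """

    def __init__(self, cache_size: int = 100):
        self._cache = defaultdict(lambda: deque(maxlen=cache_size))
        self._archive = defaultdict(list)

    def append(self, workflow_id: str, line: str) -> None:
        self._cache[workflow_id].append(line)    # oldest lines fall out when full
        self._archive[workflow_id].append(line)  # long-term copy keeps everything

    def recent(self, workflow_id: str) -> list:
        return list(self._cache[workflow_id])

    def history(self, workflow_id: str) -> list:
        return list(self._archive[workflow_id])


class LogBroker:
    """Stores each published line, then delivers it to subscribers."""

    def __init__(self, store: LogStore):
        self._store = store
        self._subscribers = defaultdict(list)

    def subscribe(self, workflow_id: str, callback) -> None:
        self._subscribers[workflow_id].append(callback)

    def publish(self, workflow_id: str, line: str) -> None:
        self._store.append(workflow_id, line)
        for callback in self._subscribers[workflow_id]:
            callback(line)


store = LogStore(cache_size=2)
broker = LogBroker(store)
received = []  # stands in for a UI client
broker.subscribe("wf-1", received.append)
for line in ["start", "step 1 done", "finished"]:
    broker.publish("wf-1", line)
print(received, store.recent("wf-1"), store.history("wf-1"))
```

In this sketch, live viewers get lines through the subscription, a newly opened view can load the recent lines from the cache, and the archive serves full history.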
Conflict | - Situation Summary (New): At Cybereason, I was part of a project that successfully migrated 5,000 servers from Google Cloud to Oracle Cloud.
- The Proposal: Management proposed migrating a large number of Google Cloud servers to Oracle Cloud.
- My Initial Stance: I initially opposed this migration due to concerns about maintainability and attempted to persuade stakeholders with supporting data.
- The Decision: However, the final decision to migrate was made based on cost considerations.
- My Commitment: Once the decision was made, I fully committed to the project and played a role in successfully migrating the 5,000 servers.
- Positive Result: The migration resulted in a significant reduction in infrastructure costs.
|
Low Performer | - Mentorship Example (New): At Industry Technology, I mentored and trained a backend engineer who was initially underperforming.
- The Context: There was a backend engineer on my team who had previously specialized in frontend development and was struggling with backend tasks. He aspired to transition to backend engineering as a long-term career goal.
- My Approach: I supported his growth by conducting regular 1-on-1 sessions and creating structured learning documentation, all while ensuring our feature development schedule remained on track.
- Positive Outcomes: He eventually became a lead backend engineer at another company. Additionally, the structured documentation we created improved the onboarding process for all junior engineers on my team.
|
Refactor/Rearchitecture | - Project Summary (IT4): At Industry Technology, I led the migration from a modular monolith to a microservices architecture to improve system scalability.
- The Issue: The system was initially a modular monolith. As traffic to core features grew, the database write load increased, causing performance issues.
- Objective: My objective was to create independent microservices from the primary service to alleviate the increasing load.
- Execution: I carried out the migration to a microservices architecture while minimizing downtime.
- Benefits: This migration simplified the overall architecture and significantly improved database performance.
|
Innovative Idea | - Project Summary (GU1): At Gunosy, I conceived and executed a novel A/B testing strategy that personalized initial user content based on ad engagement, significantly improving user retention.
- The Core Problem: User retention rates were declining, and my goal was to develop an innovative strategy to reverse this trend and improve engagement from each user's very first experience.
- My Innovative Approach: I theorized that tailoring the initial content to a new user's prior ad interactions would be more engaging. Therefore, I designed and implemented A/B tests where new users received customized content based on the specific ads they had clicked before joining.
- Key Outcome: This personalized A/B testing approach significantly outperformed standard methods, leading to a notable increase in user retention rates.
|
Quick Learn | - Project Summary (EU1): At Eukarya, I quickly learned about WebAssembly and developed a processor that compiles user scripts (Python, Go, Rust, etc.) into WebAssembly (WASM) files and executes them efficiently on a WASM runtime.
- The Learning Need: Users needed the ability to use their own scripts (in languages like Python, Go, Rust) to create custom functions within our workflow engine, requiring an extension to the engine for custom script execution.
- My Goal: My task was to extend the workflow engine to support user-defined scripts through WebAssembly execution.
- My Learning & Action: I researched the technology, developed the feature, and created the necessary documentation.
- Key Outcome: This provided users with valuable new functionality and also helped build a knowledge base about WASM within the team.
|
Legacy | - Project Summary (IT5): At Industry Technology, I managed the migration of our database from MySQL 5.7 to MySQL 8.0 to ensure continued reliability and maintainability.
- The Situation: We needed to migrate from MySQL 5.7 to MySQL 8.0. An unexpected incident occurred during this migration process.
- My Actions & Resolution: I successfully resolved the incident and ensured the completion of the migration.
- Key Achievement: I led the project to a successful conclusion and, in doing so, helped establish a valuable knowledge base about incident response for the team.
|
Difficult Decision | - Project Context (EN2): At enechain, we adopted Telepresence via its Helm chart to accelerate microservice development. It connects local services to our Kubernetes cluster, significantly speeding up the develop-test-debug cycle.
- The Conflict: While Telepresence delivered the expected agility boost, its standard Helm chart failed to meet our stringent Kubernetes security requirements. This created a direct conflict: I had to decide between leveraging this newfound development speed and ensuring full security compliance.
- My Solution: To resolve this conflict, I customized the Telepresence Helm chart. I adapted it to be managed with Kustomize, which then allowed me to apply the necessary security patches. This approach ensured we met our security standards while still gaining the development benefits Telepresence offered.
- The Result: We kept the faster development cycles that Telepresence offers while meeting our Kubernetes security requirements.
|
Tight Schedule | - Project Summary (IT2): At Industry Technology, I developed a group chat and notification system for exchanging media and texts under a demanding schedule.
- The Core Challenge: The primary challenge was an extremely tight deadline for delivering a feature-rich group chat system. This schedule didn't realistically allow for implementing a complex real-time solution like WebSockets.
- My Pragmatic Solution: To ensure on-time delivery, I opted for a simpler, polling-based approach for near real-time updates instead of WebSockets. While less efficient, polling was significantly faster to implement and met the immediate functional requirements.
- Key Result: Despite the tight schedule, we successfully delivered the group chat and notification system on time, meeting all critical requirements.
|
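A minimal sketch of cursor-based polling, the general technique behind the trade-off above. The class names, the message fields, and the in-memory room are hypothetical stand-ins for a real backend; the original system's API is not described here.

```python
class ChatRoom:
    """In-memory stand-in for the server side of a polling-based chat."""

    def __init__(self):
        self._messages = []

    def post(self, sender: str, text: str) -> int:
        message_id = len(self._messages) + 1  # IDs increase monotonically
        self._messages.append({"id": message_id, "sender": sender, "text": text})
        return message_id

    def poll(self, after_id: int = 0) -> list:
        """Return only the messages newer than the caller's cursor."""
        return [m for m in self._messages if m["id"] > after_id]


class PollingClient:
    """Remembers the last message seen so each poll fetches only new ones."""

    def __init__(self, room: ChatRoom):
        self._room = room
        self._last_seen_id = 0

    def fetch_new(self) -> list:
        new_messages = self._room.poll(self._last_seen_id)
        if new_messages:
            self._last_seen_id = new_messages[-1]["id"]
        return new_messages


room = ChatRoom()
client = PollingClient(room)
room.post("alice", "hello")
room.post("bob", "hi")
first = client.fetch_new()   # both messages
second = client.fetch_new()  # nothing new
room.post("alice", "lunch?")
third = client.fetch_new()   # only the latest message
```

A real client would call `fetch_new` on a timer. The interval sets the trade-off: shorter intervals feel closer to real time but generate more requests, which is the inefficiency accepted in exchange for faster delivery.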
My Question | - Agility vs Reliability: The trade-off between agility and reliability is challenging. How does your team make decisions regarding this?
- Diversity: I am Asian. How diverse is your team, and how does it support members from different cultural backgrounds?
- Career Path: If I continue to achieve outstanding results, what career path options would be available? For example, would visa support and relocation be possible?
- Expectation: What are the expectations for the ideal candidate for this position?
- Legacy: What bottlenecks does the legacy part of the system have?
- Business Impact: How important is the project that this team is working on for the company?
- LLM: I have been involved in projects related to LLMs and am proactive about leveraging them. How does your team utilize LLMs?
- Interview Process: Please tell me about the remaining interview process and what I should prepare for.
- Learning: Are many people learning and trying out new technologies?
- Number: Can you share some numbers that show the scale of the system I would be responsible for?
- Decision: Could you tell me about a significant decision you have made recently?
- Background: Could you tell me the number of team members and their backgrounds?
- Project: Could you tell me about the project you're currently working on?
- Company Value:
- Ownership: Would it be possible for me to work on improvement tasks for other teams?
- Think Big: Are there opportunities to make bold decisions?
- Earn Trust: How are decisions made within the team when conflicts arise?
|