Vision | - I believe reliability is the most important part of any product.
- My mission is to dramatically grow amazing products in a startup environment by balancing agility and reliability.
- I am also interested in LLMs and want to explore how they can create the next generation of development experiences.
|
Career Goal | - My career goal is to move to the US, become deeply involved in the OSS community, and lead the development of wonderful technologies and products in the startup market.
|
Introduction | - I am a Software Engineer with 7 years of experience specializing in reliability engineering.
- I have developed enterprise-grade distributed systems in startup environments, effectively balancing reliability and agility.
- I am skilled in Golang and Java development and Kubernetes cluster management.
- As side projects, I contribute to OSS projects such as Kubernetes, etcd, Raft, and a workflow engine.
- I also study Computer Science online at CU Boulder while working in Japan.
|
Past Job | - I studied Engineering at the University of Tokyo. I worked with patient data, developed Python models, and wrote a paper.
- In my first role, I joined Gunosy, Japan's largest news app with 40 million users. I worked as a data engineer for 1.5 years and improved advertisement performance using big data.
- In my second role, I co-founded Industry Technology, a startup for real estate companies. As CTO, I led the development team and wrote the entire backend codebase over 5.5 years.
- In my third role, I joined Cybereason as a Software Engineer, helping to operate a distributed cybersecurity platform that processes 80 million events per second across 12,000 servers worldwide.
- In parallel with these roles, I have contributed to several AI/LLM-focused startups as side projects, designing and improving backend systems.
|
Why searching for a new position | - I am satisfied with my current job and am not actively looking for a new one.
- However, I have a clear vision and career goals. I am always open to new opportunities that can help me grow and succeed.
- I am waiting for the right opportunity to move to the US and receive help with visa support.
|
Why applied | - This role fits my vision and career goals perfectly.
- My experience matches what you need.
- I learned backend development and leadership at Industry Technology.
- I learned SRE practices and teamwork at Cybereason.
|
Why CU Boulder | - I wanted to study computer science basics again, especially AI.
- CU Boulder has flexible online classes with modern AI courses.
|
Why Cybereason | - I joined Cybereason for the reliability challenge - their system handles 80 million events per second on 12,000 servers.
|
Why left CTO | - We sold the business.
- After that, I chose to be an IC instead of a business leader.
|
Why SRE | - I successfully introduced SRE practices at Industry Technology and won an enterprise contract.
- I learned that SRE practices are essential for all software engineers.
|
Why Backend | - The responsibility fits my vision and career goals completely.
|
Why Freelance | - The projects matched my vision and career goals.
- I worked on OSS, AI/LLM, and high-reliability systems.
|
Why US | - The US has big OSS and startup communities.
- I can work on large-scale, high-reliability systems to drive the development of amazing products.
|
Career Path | - I prefer the IC path.
- My ideal role is a Senior Backend Engineer who uses SRE practices.
- I understand soft skills matter for leaders.
- I enjoyed teamwork at Cybereason.
- I enjoyed leadership at Industry Technology.
|
Visa | - I can work through PEO/EOR arrangements.
- I am prepared to relocate and will cover my own visa expenses.
|
Interesting | - I really enjoy developing observability features like custom metrics and distributed tracing.
- (CR1) At Cybereason, I built custom metrics that improved observability for 3,000 servers.
|
Strength / Positive Feedback | - My strength is my background in both SRE and backend development.
- On infrastructure teams, I'm the fastest programmer, and on backend teams, I'm the strongest system designer.
- (IT1) At Industry Technology, I designed SLIs/SLOs and developed custom metrics and tracing to make the system more reliable.
|
Weakness / Negative Feedback | - Weakness: I sometimes focus too much on quality instead of agility.
- Summary (New): At Industry Technology, I spent too much time on code reviews, which delayed PR merges.
- Situation: My reviews took too much time and slowed down the team because I focused heavily on quality and detail.
- Action: To solve this, I improved our CI pipelines and added AI review tools to help.
- Result: As a result, PRs were merged faster while maintaining high quality.
- Learning: I learned that thorough automation helps increase speed while maintaining quality.
|
Failure | - Summary (New): I introduced an E2E test automation tool at Industry Technology, which accidentally created silos between teams.
- Situation: I added an automation tool to help frontend engineers save time on writing tests. But it created walls between frontend and QA teams and hurt teamwork.
- Action: We stopped using the tool. I handed E2E testing over to the frontend engineers so that they owned their code quality.
- Result: As a result, teams worked better together and product quality improved.
- Learning: I learned that tool choices must account for team impact, not just technical benefits.
|
Leadership / Business Impact / Difficult Decision / Innovative Idea | - Summary (IT1): I introduced SRE practices at Industry Technology, which helped win an enterprise contract.
- Situation: Japan's largest real estate company wanted our system, but our operations were not good enough for them.
- Task: My mission was to improve reliability to meet their needs and get their trust.
- Action: I paused new feature development and implemented SRE practices. I created SLIs/SLOs and built metrics and distributed tracing. I also set up monitoring and on-call schedules.
- Result: As a result, we won the enterprise contract.
- Learning: I learned that there are times even in a startup environment when reliability must be prioritized.
|
Ownership | - Summary (EN1): I developed a generator tool at enechain, which helped 50 developers deploy faster.
- Situation: All developers manually wrote Terraform and Kubernetes configs for each microservice, which was slow and error-prone.
- Task: My mission was to automate this to save time.
- Action: I developed a generator tool even though it wasn't my main mission.
- Result: As a result, new microservices were deployed much faster for all teams.
- Learning: I learned that taking ownership beyond my main tasks helps accelerate the entire company.
|
Teamwork | - Summary (CR2): I led cross-functional teams as an incident commander at Cybereason, which resolved a critical issue affecting an enterprise customer.
- Situation: An API server kept crashing for a big customer. We needed many teams (SRE, DevOps, Product, TAM) to work together.
- Task: My mission was to find the problem and fix the system urgently.
- Action: As an incident commander, I organized all teams. We checked thread dumps and heap dumps to find the cause and implemented a fix.
- Result: As a result, we resolved the ticket and earned the customer's trust.
- Learning: I learned that fixing incidents needs both tech skills and good team coordination.
|
Technically Challenging | - Summary (EU2): I developed a real-time logging system for a workflow engine at Eukarya, which delivered millisecond-level status updates.
- Situation: Users needed to monitor workflow status immediately after execution, but we had no good logging mechanism.
- Task: My mission was to build a real-time logging system with a cost-efficient architecture.
- Action: I used a Pub/Sub design with a Redis cache and GCS for storage.
- Result: As a result, users saw worker status in real time.
- Learning: I learned that I can balance performance and cost to build an efficient system.
|
Conflict | - Summary (New): I disagreed with an engineering manager about migrating 3,000 servers, which led to a better migration strategy after discussions.
- Situation: Management and engineering managers wanted to migrate some servers from Google Cloud to Oracle Cloud within 6 months to save infrastructure costs.
- Task: My mission was to share my technical concerns while supporting business goals.
- Action:
  - First, I disagreed and presented specific data:
    - I showed that multi-cloud complexity would increase ops burden by 50%
    - I explained that our team had no Oracle Cloud expertise
    - I calculated that API differences would require 3 months of code changes
  - After that, I proposed a compromise solution:
    - We should start with non-critical services as a pilot
    - We could hire Oracle support engineers as consultants
    - We would migrate core services only after pilot success with proper planning
  - After discussions with an engineering manager, we agreed on a phased migration plan with Oracle consultants to guide us.
  - Finally, once the decision was made, I fully committed to the plan and helped with the migration.
- Result: As a result, we migrated successfully and saved significant infrastructure costs.
- Learning: I learned that effective conflict resolution requires "disagree and commit" - voice concerns with data, but once a decision is made, fully support it and work toward success.
|
Low Performer | - Summary (New): I mentored a struggling engineer at Industry Technology, which helped him become a lead backend engineer.
- Situation: A frontend engineer joined the backend team but struggled. He wanted to transition to a backend engineering career.
- Task: My mission was to help him grow without slowing our work.
- Action: I held regular 1-on-1s and wrote learning docs for him.
- Result: As a result, he became a lead backend engineer. Our docs also helped train all new engineers.
- Learning: I learned that helping one person grow benefits the whole team in the long run.
|
Refactor / Rearchitecture / Legacy | - Summary (IT4): I migrated from a monolith to microservices at Industry Technology, which improved system scalability.
- Situation: Our monolithic system had database problems as traffic grew.
- Task: My mission was to split it into microservices to reduce load.
- Action: I migrated to microservices with no downtime using a BFF (Backend for Frontend) layer.
- Result: As a result, the system became simpler and database performance improved a lot.
- Learning: I learned that migrating a system with no downtime requires several times more effort than a typical migration.
|
Quick Learn | - Summary (EU1): I quickly learned WebAssembly at Eukarya, which enabled users to extend the workflow engine.
- Situation: Users wanted to use their own scripts (Python, Go, Rust) to extend the workflow engine.
- Task: My mission was to add WebAssembly support for user custom scripts.
- Action: I learned WebAssembly quickly, developed the feature, and wrote docs for the team.
- Result: As a result, users got the feature they wanted and the team learned WASM.
- Learning: I learned that individual learning can become a valuable asset for the team.
|
Tight Schedule | - Summary (IT2): I delivered a group chat system at Industry Technology on a tight schedule, which increased user engagement.
- Situation: We had a tight deadline for a group chat and notification system.
- Task: My mission was to design and develop the system on time.
- Action: I chose polling instead of WebSockets: less efficient, but a simpler design.
- Result: As a result, we shipped the feature on time and user engagement went up.
- Learning: I learned that good-enough on time beats perfect but late.
|
Missed Deadline | - Summary (CR3): I fixed an API bug at Cybereason after core engineers left, which prevented us from losing an enterprise customer's trust.
- Situation: We had a bug in a core API, but the original developers had left with little documentation.
- Task: My mission was to fix this bug in one sprint.
- Action: When I realized it would be delayed, I sent regular progress reports to the customer and TAM.
- Result: As a result, we finished a week late but with good documentation and testing. The customer liked our honest updates.
- Learning: I learned that it's important to share regular updates with stakeholders to manage their expectations.
|
My Question | - Agility vs Reliability: How does your team balance speed and reliability?
- Diversity: I am Asian. Does your team value diversity?
- Career Path: If I do well, what career paths exist? Can you help with visa support and relocation?
- Expectation: What do you expect from the ideal candidate?
- Legacy: What problems does your legacy system have?
- Business Impact: How important is this team to the company?
- LLM: I've worked with LLMs. How does your team use them?
- Interview Process: What are the remaining steps? What should I prepare?
- Learning: Does the team learn new technologies?
- Number: What's the scale of your system?
- Decision: What important decision did your team make recently?
- Background: How big is the team? What are their backgrounds?
- Project: What is your team working on now?
- Restriction: What slows down day-to-day development productivity?
- Favorite: What are your favorite and least favorite things about working here?
- Company Value:
- Ownership: Can I help other teams improve?
- Think Big: Can I make bold decisions?
- Conflicts: Did you experience any conflicts within your team?
|