Tuesday, October 27 • 2:00pm - 2:40pm
After One year of OpenStack Cloud Operation (NTT DOCOMO)

We presented our cloud design at the OpenStack Paris Summit (http://bit.ly/1DbJPUO) and started operation after the conference. In this talk, we share important lessons and processes learned from one year of OpenStack operation. The talk should help people who are about to start operating OpenStack or who plan to operate it with a small team.

1.   Team Building
A DevOps team is essential for keeping up with active OpenStack development. We built our DevOps team from scratch. We share the team-building process and the skills each member brings to DevOps.

2.   Monitoring System
Monitoring is important for keeping the system stable. We share the items we currently monitor (about 60,000) and highlight those that matter most for preventing service disruption. We also share some custom scripts for OpenStack health checks (e.g. RabbitMQ, MySQL and OpenStack services).
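
As a rough illustration of what a basic health check for the services named above might look like, here is a minimal sketch in Python. It is not one of the scripts from the talk: it only tests whether a TCP port accepts connections, and the service names, hosts and ports in `DEFAULT_ENDPOINTS` are assumptions based on common default ports.

```python
import socket

# Hypothetical endpoints: hosts and ports are common defaults chosen for
# illustration, not values taken from the talk.
DEFAULT_ENDPOINTS = {
    "rabbitmq": ("127.0.0.1", 5672),
    "mysql": ("127.0.0.1", 3306),
    "keystone": ("127.0.0.1", 5000),
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=2.0):
    """Map each service name to True (reachable) or False (unreachable)."""
    return {
        name: tcp_port_open(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }


if __name__ == "__main__":
    for name, ok in check_endpoints(DEFAULT_ENDPOINTS).items():
        print(f"{name}: {'OK' if ok else 'UNREACHABLE'}")
```

An open port only shows that a process is listening. A production check would also need to exercise the service itself, for example by running a query or publishing a test message.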

3.   Log Analytics
Logs (e.g. OpenStack debug logs, syslog, auth logs and operation logs) carry very important information, and analyzing them reveals potential problems and risks. We collect more than 40GB of logs a day, which makes it difficult to find the important information among them. We demonstrate our Elasticsearch-based log analytics and visualization tool, which sorts out the useful information.
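
To make the idea of sorting useful information out of a large log volume concrete, here is a small Python sketch that counts warning and error lines per module. It is not the Elasticsearch-based tool described above. The line layout (`date time pid LEVEL module message`) is an assumption, and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <pid> <LEVEL> <module> <message>".
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)? \d+ "
    r"(?P<level>[A-Z]+) (?P<module>\S+)"
)


def count_by_level_and_module(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Count lines per (level, module), keeping only the given levels."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("module"))] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    "2015-10-27 14:00:01.123 1234 ERROR nova.compute.manager [req-1] Instance failed to spawn",
    "2015-10-27 14:00:02.456 1234 INFO nova.compute.manager [req-2] Instance spawned",
    "2015-10-27 14:00:03.789 4321 ERROR nova.compute.manager [req-3] Instance failed to spawn",
    "2015-10-27 14:00:04.000 5678 WARNING neutron.agent.l3 [req-4] Router sync delayed",
    "Traceback (most recent call last):",
]

if __name__ == "__main__":
    # most_common() lists the noisiest (level, module) pairs first.
    for (level, module), n in count_by_level_and_module(SAMPLE_LINES).most_common():
        print(f"{n:5d} {level:8s} {module}")
```

Lines that do not match the assumed layout, such as traceback continuation lines, are skipped rather than counted.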

4.   Continuous Integration
Once a cloud service is running, it is difficult to stop it, even though many updates are necessary. We have updated the environment more than 100 times without downtime. We demonstrate a Neutron agent update, one of the most difficult parts of operating current OpenStack. We also share the CI/CD tools and our own tools used for system validation after updates.

5.   Daily Operation
We share our daily work:

  1. Tools that help you monitor the system efficiently

  2. Tools that help you check security alerts

  3. Issue tracking and management

  4. Tools and procedures used for emergency operation (remote operation tools)

Thanks to the community, deploying OpenStack has become easier with many tools (e.g. Juju, RDO and Fuel); however, there is still little information about keeping OpenStack running and updating it without downtime. We share our experiences and the tools we developed through operating our private cloud. We also share future challenges in making OpenStack operation more efficient. Today some operations are still manual, but our goal is to help OpenStack operators sleep better by automating most of them.

Speakers

Ken Igarashi

Sr. Research Engineer, NTT DOCOMO
Ken Igarashi was one of the first members to propose OpenStack Bare Metal Provisioning (now called "Ironic"). He leads an OpenStack-based private cloud team as a developer and operator.

Asako Ishigaki

Engineer, NTT Software
Asako Ishigaki worked on public cloud development for 2 years. She currently operates an OpenStack-based private cloud and develops the log collection and analytics tools used for its operation.

Akihiro Motoki

Principal Software Engineer, NEC
Akihiro has worked with the OpenStack community since the Folsom release in 2012 and is a core developer of several projects including Neutron (network), Horizon (dashboard) and OpenStackCLI. His main focus is improving the usability of OpenStack, and he is working on user-facing areas like...


Tuesday October 27, 2015 2:00pm - 2:40pm JST
Wakaba