2018-11-29

Distributed Fault Tolerant Cache System using GlusterFS & tmpfs


1. Summary

In this post I would like to introduce a distributed fault tolerant memory cache system using GlusterFS and tmpfs.

2. Introduction

In past posts, I introduced a use case of a file system built on GlusterFS as part of the distributed fault tolerant system theme. In this post I would like to introduce a distributed fault tolerant memory cache system using GlusterFS and tmpfs. For the meaning of each keyword, please refer to the following.

* Distributed Fault Tolerant Computer System
* GlusterFS
* tmpfs
* Fault Tolerant
* Cache Memory

3. Environment

* CentOS-7
* GlusterFS-4.1.5
* tmpfs

4. Architecture
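
Roughly, the configuration built in the following steps looks like this: two cache servers each hold a tmpfs-backed brick and replicate it through a GlusterFS replica-2 volume, and each web server mounts that volume with the GlusterFS FUSE client.

Web Server 1                              Web Server 2
  /cache_client (GlusterFS FUSE mount)      /cache_client (GlusterFS FUSE mount)
              \                                 /
           GlusterFS replica-2 volume "cache_server"
              /                                 \
Cache Server 1                            Cache Server 2
  /cache_server (tmpfs brick, 512 MB)       /cache_server (tmpfs brick, 512 MB)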

5. Cache Servers Configuration

5-1. Install GlusterFS

# Both Cache Servers 1 and 2
$ sudo yum -y install centos-release-gluster
$ sudo yum -y install glusterfs-server

5-2. Startup GlusterFS

# Both Cache Servers 1 and 2
$ sudo systemctl start glusterd
$ sudo systemctl enable glusterd
$ sudo systemctl status glusterd
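
If firewalld is enabled on the cache servers (an assumption about the environment; the original steps do not cover it), the GlusterFS ports have to be opened so that peers and clients can reach each other. A minimal sketch; the brick port range depends on the GlusterFS version and the number of bricks:

# Both Cache Servers 1 and 2 (only if firewalld is running)
$ sudo firewall-cmd --permanent --add-port=24007-24008/tcp
$ sudo firewall-cmd --permanent --add-port=49152-49251/tcp
$ sudo firewall-cmd --reload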

5-3. Set GlusterFS server hosts

# Both Cache Servers 1 and 2
$ sudo vim /etc/hosts
10.0.0.1 cache1.example.com
10.0.0.2 cache2.example.com

5-4. Create GlusterFS storage pool

# Only Cache Server 1
$ sudo gluster peer probe cache2.example.com

5-5. Confirm GlusterFS storage pool

# Both Cache Servers 1 and 2
$ sudo gluster peer status

5-6. Set tmpfs

# Both Cache Servers 1 and 2
$ sudo mkdir /cache_server
$ sudo mount -t tmpfs -o size=512m tmpfs /cache_server
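
To confirm that the tmpfs mount is active before handing the directory to GlusterFS, a quick check (not in the original steps) is:

# Both Cache Servers 1 and 2
$ sudo df -Th /cache_server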

5-7. Set fstab for tmpfs

# Both Cache Servers 1 and 2
$ sudo vim /etc/fstab
tmpfs    /cache_server    tmpfs    defaults,size=512m    0 0
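
To verify the fstab entry without rebooting, one option (assuming nothing is using /cache_server yet) is to unmount and remount everything from fstab:

# Both Cache Servers 1 and 2
$ sudo umount /cache_server
$ sudo mount -a
$ sudo df -Th /cache_server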

5-8. Create GlusterFS volume

# Only Cache Server 1
# "force" is required because each brick is the root of a tmpfs mount point
$ sudo gluster volume create cache_server replica 2 cache1.example.com:/cache_server/ cache2.example.com:/cache_server/ force

5-9. Confirm GlusterFS volume information

# Both Cache Servers 1 and 2
$ sudo gluster volume info

5-10. Start GlusterFS volume

# Only Cache Server 1
$ sudo gluster volume start cache_server

5-11. Confirm GlusterFS volume status

# Both Cache Servers 1 and 2
$ sudo gluster volume status
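
As an additional check that is not in the original steps, the self-heal status of the replica volume can be inspected at any time; it should list no entries needing heal when both bricks are in sync:

# Either Cache Server
$ sudo gluster volume heal cache_server info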

6. Cache Client Configuration

6-1. Install GlusterFS clients

# Both Web Servers 1 and 2
$ sudo yum -y install glusterfs glusterfs-fuse glusterfs-rdma

6-2. Set GlusterFS server hosts

# Both Web Servers 1 and 2
$ sudo vim /etc/hosts
10.0.0.1 cache1.example.com
10.0.0.2 cache2.example.com

6-3. Mount GlusterFS clients to GlusterFS servers

# Web Server 1
$ sudo mkdir /cache_client
$ sudo mount -t glusterfs cache1.example.com:/cache_server /cache_client
$ sudo df -Th
# Web Server 2
$ sudo mkdir /cache_client
$ sudo mount -t glusterfs cache2.example.com:/cache_server /cache_client
$ sudo df -Th

6-4. Set fstab for GlusterFS auto mount

# Web Server 1
$ sudo vim /etc/fstab
cache1.example.com:/cache_server       /cache_client   glusterfs       defaults,_netdev        0 0
# Web Server 2
$ sudo vim /etc/fstab
cache2.example.com:/cache_server       /cache_client   glusterfs       defaults,_netdev        0 0
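
For client-side fault tolerance, it may also be worth adding a backup volfile server to the mount options so that a web server can still mount the volume when its primary cache server is down. Depending on the glusterfs-fuse version the option name varies (treat the exact spelling below as an assumption to verify against your version); for Web Server 1 the entry would look roughly like this:

cache1.example.com:/cache_server       /cache_client   glusterfs       defaults,_netdev,backup-volfile-servers=cache2.example.com        0 0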

6-5. Test GlusterFS replication

# Web Server 1
$ sudo touch /cache_client/test.txt
$ sudo ls /cache_client
# Web Server 2
$ sudo ls /cache_client
$ sudo rm /cache_client/test.txt
# Web Server 1
$ sudo ls /cache_client

7. Benchmark Test

The benchmark results below are reference values only. The test program is written in Go; it compares the cache system built above (GlusterFS on tmpfs, mounted at /cache_client) with the file system from the previous post (GlusterFS on xfs, mounted at /file_client).

7-1. Program Flow

1 MB Text
↓
# Cache System using GlusterFS and tmpfs
Repeat File Creating, Writing, Reading and Removing 1,000 Times
↓
# File System using GlusterFS and xfs
Repeat File Creating, Writing, Reading and Removing 1,000 Times
↓
Average Value over 10 Benchmark Runs

7-2. Golang Program

# Web Server 1
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"strings"
	"time"
)

func main() {
	// Configure
	file_paths := []string{"/cache_client/test.txt", "/file_client/test.txt"}
	systems := []string{"Cache System", "File System"}
	results := []float64{0, 0}
	benchmark_times := 10
	processing_times := 1000

	// 1 MB of "a" characters as the test payload
	content_byte := []byte(strings.Repeat("a", 1000000))

	for i := 0; i < benchmark_times; i++ {
		for j := range file_paths {
			// Get processing start datetime
			start_datetime := time.Now()
			for k := 0; k < processing_times; k++ {
				// Write file (created if it does not exist)
				err := ioutil.WriteFile(file_paths[j], content_byte, 0644)
				if err != nil {
					fmt.Printf("File Writing Error: %s\n", err)
					os.Exit(1)
				}

				// Read file
				_, err = ioutil.ReadFile(file_paths[j])
				if err != nil {
					fmt.Printf("File Reading Error: %s\n", err)
					os.Exit(1)
				}

				// Remove file
				err = os.Remove(file_paths[j])
				if err != nil {
					fmt.Printf("File Removing Error: %s\n", err)
					os.Exit(1)
				}
			}
			// Get processing end datetime
			end_datetime := time.Now()

			// Accumulate processing total time
			total_time := end_datetime.Sub(start_datetime)
			results[j] += total_time.Seconds()
			fmt.Printf("[%v] %v: %v\n", i, systems[j], total_time)
		}
	}

	// Print the average time per benchmark run for each system
	for i, v := range results {
		average := v / float64(benchmark_times)
		fmt.Printf("%v Average: %vs\n", systems[i], average)
	}

	os.Exit(0)
}

7-3. Run Golang Program

# Web Server 1
$ go build main.go
$ ./main

7-4. Results

[0] Cache System: 16.180571409s
[0] File System: 16.302403193s
[1] Cache System: 15.93305082s
[1] File System: 16.61177919s
[2] Cache System: 16.311321483s
[2] File System: 16.393385347s
[3] Cache System: 16.036057793s
[3] File System: 16.740742882s
[4] Cache System: 16.139074157s
[4] File System: 16.754381782s
[5] Cache System: 16.151769414s
[5] File System: 16.90680323s
[6] Cache System: 16.340969528s
[6] File System: 16.693090068s
[7] Cache System: 16.177776325s
[7] File System: 16.961861504s
[8] Cache System: 16.226036092s
[8] File System: 16.638383153s
[9] Cache System: 16.622041061s
[9] File System: 16.887159942s
Cache System Average: 16.2618668082s
File System Average: 16.638999029100003s

8. Conclusion

In this way, a distributed fault tolerant cache system can be constructed with GlusterFS and tmpfs. Although the benchmark results are only reference values, the cache system was about 2% faster on average than the GlusterFS-on-xfs file system (16.26 s vs. 16.64 s per run). The next theme in the distributed fault-tolerant system series is “LizardFS”, a distributed fault-tolerant file system like GlusterFS. I thank Mark Mulrainey of LizardFS Inc., who gave me direct advice.

2018-11-10

Off-JT for Top Engineer's Way - #0. Introduction


1. About “Off-JT for Top Engineer's Way”

“How can I become a top engineer?” I am often asked questions like this at IT lectures and seminars. (The definition of “top” is not discussed here.) When I answer such questions for an audience, the time for questions and answers is limited, so I can only tell them the simplest things. In fact, however, there are various tricks. For example, when I work as an OJT trainer, I teach various things according to the situation. However, I cannot do OJT with every engineer seeking such answers. So I am starting this “Off-JT for Top Engineer's Way” series to share those tricks and, I hope, help you a little.

2. Target of “Off-JT for Top Engineer's Way”

* People aspiring to be an engineer
* Beginner and intermediate engineers
* Engineers struggling to grow
* Engineers aspiring to be a technical manager or a director
* Engineers aspiring to be a CTO or CIO
* etc...

3. The First Theme of “Off-JT for Top Engineer's Way”

The first theme scheduled for the “Off-JT for Top Engineer's Way” series is “#1. Technical Memo”.

* Off-JT for Top Engineer's Way #0. Introduction
* Off-JT for Top Engineer's Way #1. Technical Memo