Cassandra Failover

Published: 2023-11-21 12:48

Author: Yang Wen

DBA, responsible for client project requirements and maintenance; knows a thing or two about databases, including but not limited to MySQL, Redis, Cassandra, GreenPlum, ClickHouse, Elastic, TDSQL, and so on.

Source: original contribution

* Produced by 爱可生开源社区 (the ActionSky open source community). Original content may not be used without authorization; for reprints, please contact the editor and credit the source.

1. Background:

We know that Cassandra offers partition tolerance and tunable consistency. But when the host holding a piece of data fails, what happens to the data replicas on that host? Do they become unavailable along with the host? Read on to find out.

2. Test Environment:

A cluster spanning two data centers:

Data Center   Node IPs                                                Seed Nodes
DC1           10.186.60.61, 10.186.60.7, 10.186.60.118, 10.186.60.67  10.186.60.61, 10.186.60.7
DC2           10.186.60.53, 10.186.60.65, 10.186.60.94, 10.186.60.68  10.186.60.53, 10.186.60.65
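The article does not show the node configuration, but a layout like this is typically driven by the snitch and seed settings. A minimal sketch, assuming GossipingPropertyFileSnitch and the default file locations (the dc/rack values are assumptions matching the rack1/rack2 labels in the nodetool output below):

# cassandra.yaml (excerpt) -- the same seed list on every node, two seeds per DC
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.186.60.61,10.186.60.7,10.186.60.53,10.186.60.65"
endpoint_snitch: GossipingPropertyFileSnitch

# cassandra-rackdc.properties on each DC1 node:
dc=dc1
rack=rack1

# cassandra-rackdc.properties on each DC2 node:
dc=dc2
rack=rack2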

First, let's look at how the owns values change as nodes join the cluster:

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.7    88.29 KiB  16      46.0%             4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  69.07 KiB  16      37.7%             c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   88.25 KiB  16      34.2%             af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.65   69.04 KiB  16      41.4%             89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   83.18 KiB  16      41.7%             7c91c707-abac-44f2-8110-b18f03f03d13  rack2

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.67   74.01 KiB  16      24.7%             9d6d759b-c00c-488b-938d-3e1ef9b92b02  rack1
UN  10.186.60.7    88.29 KiB  16      27.5%             4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  83.16 KiB  16      28.9%             c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   88.25 KiB  16      30.3%             af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.65   83.17 KiB  16      27.7%             89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   83.18 KiB  16      29.8%             7c91c707-abac-44f2-8110-b18f03f03d13  rack2
UN  10.186.60.94   69.05 KiB  16      31.1%             c8fa86e4-ee9a-4c62-b00b-d15edc967b9f  rack2

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.67   74.01 KiB  16      21.4%             9d6d759b-c00c-488b-938d-3e1ef9b92b02  rack1
UN  10.186.60.7    88.29 KiB  16      25.2%             4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  83.16 KiB  16      27.1%             c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   83.19 KiB  16      28.9%             af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.68   88.55 KiB  16      21.6%             a7307228-62bb-4354-9853-990cac9614ab  rack2
UN  10.186.60.65   83.17 KiB  16      24.0%             89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   83.18 KiB  16      25.4%             7c91c707-abac-44f2-8110-b18f03f03d13  rack2
UN  10.186.60.94   69.05 KiB  16      26.4%             c8fa86e4-ee9a-4c62-b00b-d15edc967b9f  rack2

As you can see, in this freshly built cluster the owns values always sum to 200% across both data centers, but the owns within a single data center do not sum to 100%.
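To verify that total without adding the percentages by hand, you can sum the Owns column directly. A quick sketch, assuming the default nodetool status column layout shown above (Load occupies two fields, so Owns is field 6):

# sum the "Owns (effective)" column across all UN/DN rows
nodetool status | awk '$1 ~ /^[UD][NLJM]$/ { gsub("%", "", $6); sum += $6 } END { print sum "%" }'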

3. Experiments:

3.1 Experiment 1:

[cassandra@data01 ~]$ cqlsh 10.186.60.61 -u cassandra -p cassandra
CREATE KEYSPACE "dcdatabase" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1' : 4, 'dc2' : 4};
use dcdatabase;
create table test (id int, name varchar, primary key (id));
insert into test (id,name) VALUES (1,'test1');
insert into test (id,name) VALUES (2,'test2');
insert into test (id,name) VALUES (3,'test3');
insert into test (id,name) VALUES (4,'test4');
insert into test (id,name) VALUES (5,'test5');

Check the cluster status:

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.67   96.55 KiB  16      100.0%            9d6d759b-c00c-488b-938d-3e1ef9b92b02  rack1
UN  10.186.60.7    88.29 KiB  16      100.0%            4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  88.33 KiB  16      100.0%            c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   88.37 KiB  16      100.0%            af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.68   74.23 KiB  16      100.0%            a7307228-62bb-4354-9853-990cac9614ab  rack2
UN  10.186.60.65   83.17 KiB  16      100.0%            89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   88.36 KiB  16      100.0%            7c91c707-abac-44f2-8110-b18f03f03d13  rack2
UN  10.186.60.94   74.23 KiB  16      100.0%            c8fa86e4-ee9a-4c62-b00b-d15edc967b9f  rack2

As you can see, each data center's owns values now sum to 400%, matching the four-replicas-per-DC setting;
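You can confirm the replication settings behind those numbers from cqlsh; this query against the schema tables works on Cassandra 3.0 and later:

SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'dcdatabase';
-- the replication map should read: {'class': '...NetworkTopologyStrategy', 'dc1': '4', 'dc2': '4'}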

Check how the data is distributed across the nodes:

[cassandra@data01 ~]$ nodetool getendpoints dcdatabase test 1
10.186.60.7
10.186.60.94
10.186.60.65
10.186.60.118
10.186.60.67
10.186.60.61
10.186.60.53
10.186.60.68
[cassandra@data03 ~]$ nodetool getendpoints dcdatabase test 5
10.186.60.67
10.186.60.94
10.186.60.7
10.186.60.53
10.186.60.65
10.186.60.118
10.186.60.61
10.186.60.68

As you can see, the data is replicated to every node in both data centers, which is exactly what four replicas per four-node data center implies.
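To check all five test rows at once rather than one key at a time, a small shell loop over the partition keys works (illustrative, not from the original article):

for id in 1 2 3 4 5; do
    echo "== id=$id =="
    nodetool getendpoints dcdatabase test "$id"
done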

Now simulate a failed node and check the data distribution:

Stop the service on machine 94:

systemctl stop cassandra

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.67   96.55 KiB  16      100.0%            9d6d759b-c00c-488b-938d-3e1ef9b92b02  rack1
UN  10.186.60.7    88.29 KiB  16      100.0%            4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  88.33 KiB  16      100.0%            c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   88.37 KiB  16      100.0%            af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.68   74.23 KiB  16      100.0%            a7307228-62bb-4354-9853-990cac9614ab  rack2
UN  10.186.60.65   83.17 KiB  16      100.0%            89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   88.36 KiB  16      100.0%            7c91c707-abac-44f2-8110-b18f03f03d13  rack2
DN  10.186.60.94   74.23 KiB  16      100.0%            c8fa86e4-ee9a-4c62-b00b-d15edc967b9f  rack2

As you can see, node 94 is now down (DN), but the owns distribution in the dc2 data center has not changed.
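Reads and writes also keep working at this point: with 8 replicas in total, QUORUM needs (8/2)+1 = 5 of them, and 7 are still up. A quick check from cqlsh (illustrative, not from the original article):

CONSISTENCY QUORUM;
SELECT * FROM dcdatabase.test WHERE id = 5;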

Check which nodes hold the data:

[cassandra@data01 ~]$ nodetool getendpoints dcdatabase test 5
10.186.60.67
10.186.60.94
10.186.60.7
10.186.60.53
10.186.60.65
10.186.60.118
10.186.60.61
10.186.60.68

As you can see, getendpoints still reports node 94 as a replica even though the node is down;

Remove the failed node 94 from the cluster:

[cassandra@data02 ~]$ nodetool removenode c8fa86e4-ee9a-4c62-b00b-d15edc967b9f
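removenode streams the dead node's replicas to the remaining nodes, which can take a while on a loaded cluster. You can monitor it, and as a last resort force completion, with these standard subcommands:

nodetool removenode status   # prints "RemovalStatus: ..." while streaming is in progress
nodetool removenode force    # only if the removal is stuck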

Check the cluster status:

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.67   96.55 KiB  16      100.0%            9d6d759b-c00c-488b-938d-3e1ef9b92b02  rack1
UN  10.186.60.7    88.29 KiB  16      100.0%            4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  88.33 KiB  16      100.0%            c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   88.37 KiB  16      100.0%            af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.68   74.23 KiB  16      100.0%            a7307228-62bb-4354-9853-990cac9614ab  rack2
UN  10.186.60.65   83.17 KiB  16      100.0%            89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   88.36 KiB  16      100.0%            7c91c707-abac-44f2-8110-b18f03f03d13  rack2

[cassandra@data02 ~]$ nodetool getendpoints dcdatabase test 5
10.186.60.67
10.186.60.7
10.186.60.53
10.186.60.65
10.186.60.118
10.186.60.61
10.186.60.68

As you can see, the data is no longer on node 94;

Note: whether a Cassandra node is stopped or removed from the cluster, the cluster remains usable; you simply cannot log in to that node's own Cassandra instance, but you can still log in through any other node.
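For example (illustrative; the exact error text varies by cqlsh/driver version), connecting to the stopped node fails while any surviving node still serves the whole cluster:

[cassandra@data01 ~]$ cqlsh 10.186.60.94 -u cassandra -p cassandra
Connection error: ('Unable to connect to any servers', ...)
[cassandra@data01 ~]$ cqlsh 10.186.60.53 -u cassandra -p cassandra
Connected to ... at 10.186.60.53:9042.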

3.2 Experiment 2:

CREATE KEYSPACE "dcdatabase" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3};
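Since dcdatabase already exists from Experiment 1, an alternative to dropping and recreating it is to change the replication factor in place and then repair, so existing data is re-replicated to match the new settings; a sketch:

ALTER KEYSPACE "dcdatabase" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3};

followed by running nodetool repair -full dcdatabase on each node.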

Check the cluster status:

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.67   96.55 KiB  16      73.2%             9d6d759b-c00c-488b-938d-3e1ef9b92b02  rack1
UN  10.186.60.7    89.39 KiB  16      74.7%             4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  88.33 KiB  16      77.4%             c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   88.42 KiB  16      74.7%             af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.68   74.22 KiB  16      100.0%            a7307228-62bb-4354-9853-990cac9614ab  rack2
UN  10.186.60.65   84.14 KiB  16      100.0%            89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   88.30 KiB  16      100.0%            7c91c707-abac-44f2-8110-b18f03f03d13  rack2

As you can see, each data center's owns values sum to 300%, matching the three-replicas-per-DC setting (dc2 has only three nodes here, so each of them owns 100%);

Now simulate a failed node and check the data distribution:

Stop the service on machine 94 and remove it from the cluster:

[cassandra@data02 ~]$ nodetool removenode c8fa86e4-ee9a-4c62-b00b-d15edc967b9f

Check the cluster status:

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.67   96.55 KiB  16      73.2%             9d6d759b-c00c-488b-938d-3e1ef9b92b02  rack1
UN  10.186.60.7    89.39 KiB  16      74.7%             4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  88.33 KiB  16      77.4%             c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   88.42 KiB  16      74.7%             af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.68   74.22 KiB  16      100.0%            a7307228-62bb-4354-9853-990cac9614ab  rack2
UN  10.186.60.65   84.14 KiB  16      100.0%            89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   88.30 KiB  16      100.0%            7c91c707-abac-44f2-8110-b18f03f03d13  rack2

At this point the data is no longer on node 94: the replicas held by the failed node have been moved to other nodes. As a result, in the dc1 data center each row is still placed on (a pseudo-randomly chosen) three of the four nodes, while in dc2 the data is now spread across the only three nodes left; the replicas have been failed over.

If more dc2 nodes were to fail now, their replicas would have nowhere left to move: dc1 is unaffected and its owns still sum to 300%, but every dc2 node already owns 100%, so there is no headroom for failover; the remaining nodes simply keep whatever copies they already hold.
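The practical consequence shows up at the consistency level. With RF 3 in dc2, LOCAL_QUORUM there needs 2 of the 3 local replicas, so it survives one dc2 failure, but a second dc2 failure makes it unsatisfiable for requests coordinated in dc2. A hypothetical session at that point (the error text is illustrative):

CONSISTENCY LOCAL_QUORUM;
SELECT * FROM dcdatabase.test WHERE id = 1;
-- NoHostAvailable: ... Cannot achieve consistency level LOCAL_QUORUM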

Now restart all hosts. The Cassandra service comes up on every one of them, including the node we used to simulate the failure, which starts automatically and rejoins the cluster. This produces the opposite effect: the previously failed-and-removed node is added back, and the data is automatically redistributed again.
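One operational caveat, not covered in the original experiment: a node removed with removenode still has its old system tables and data files on disk, and common practice is to wipe that state before letting it rejoin, so it bootstraps as a fresh node. A minimal sketch, assuming the default data directories:

# on node 94, before starting Cassandra again
systemctl stop cassandra
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
systemctl start cassandra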

Check the cluster status:

[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.67   96.55 KiB  16      73.2%             9d6d759b-c00c-488b-938d-3e1ef9b92b02  rack1
UN  10.186.60.7    89.39 KiB  16      74.7%             4702178e-9878-48dc-97e7-9211b7c9f2e7  rack1
UN  10.186.60.118  88.33 KiB  16      77.4%             c920c611-2e8b-472d-93a4-34f1abd5b207  rack1
UN  10.186.60.61   88.42 KiB  16      74.7%             af2e0c42-3a94-4647-9716-c484b6908991  rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.186.60.68   74.22 KiB  16      73.2%             a7307228-62bb-4354-9853-990cac9614ab  rack2
UN  10.186.60.65   84.14 KiB  16      74.7%             89683bf8-aff8-4fdc-9525-c14764cf2d4f  rack2
UN  10.186.60.53   88.30 KiB  16      74.7%             7c91c707-abac-44f2-8110-b18f03f03d13  rack2
UN  10.186.60.94   90.12 KiB  16      77.4%             c8fa86e4-ee9a-4c62-b00b-d15edc967b9f  rack2