One of the cStorVolumeReplicas (CVRs) has its status as Invalid after the corresponding pool pod gets recreated
When a user deletes a cStor pool pod, there is a high chance that the CVRs belonging to that pool go into the Invalid state.
The following is a sample output of kubectl get cvr -n openebs:
NAME                                                         USED   ALLOCATED   STATUS    AGE
pvc-738f76c0-b553-11e9-858e-54e1ad4a9dd4-cstor-sparse-p8yp   6K     6K          Invalid   6m
Troubleshooting
Sample logs of cstor-pool-mgmt when the issue happens:
rm /usr/local/bin/zrepl
exec /usr/local/bin/cstor-pool-mgmt start
I0802 18:35:13.814623 6 common.go:205] CStorPool CRD found
I0802 18:35:13.822382 6 common.go:223] CStorVolumeReplica CRD found
I0802 18:35:13.824957 6 new_pool_controller.go:103] Setting up event handlers
I0802 18:35:13.827058 6 new_pool_controller.go:105] Setting up event handlers for CSP
I0802 18:35:13.829547 6 new_replica_controller.go:118] will set up informer event handlers for cvr
I0802 18:35:13.830341 6 new_backup_controller.go:104] Setting up event handlers for backup
I0802 18:35:13.837775 6 new_restore_controller.go:103] Setting up event handlers for restore
I0802 18:35:13.845333 6 run_pool_controller.go:38] Starting CStorPool controller
I0802 18:35:13.845388 6 run_pool_controller.go:41] Waiting for informer caches to sync
I0802 18:35:13.847407 6 run_pool_controller.go:38] Starting CStorPool controller
I0802 18:35:13.847458 6 run_pool_controller.go:41] Waiting for informer caches to sync
I0802 18:35:13.856572 6 new_pool_controller.go:124] cStorPool Added event : cstor-sparse-p8yp, 48d3b2ba-b553-11e9-858e-54e1ad4a9dd4
I0802 18:35:13.857226 6 event.go:221] Event(v1.ObjectReference{Kind:"CStorPool", Namespace:"", Name:"cstor-sparse-p8yp", UID:"48d3b2ba-b553-11e9-858e-54e1ad4a9dd4", APIVersion:"openebs.io/v1alpha1", ResourceVersion:"1998", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
I0802 18:35:13.867953 6 common.go:262] CStorPool found
I0802 18:35:13.868007 6 run_restore_controller.go:38] Starting CStorRestore controller
I0802 18:35:13.868019 6 run_restore_controller.go:41] Waiting for informer caches to sync
I0802 18:35:13.868022 6 run_replica_controller.go:39] Starting CStorVolumeReplica controller
I0802 18:35:13.868061 6 run_replica_controller.go:42] Waiting for informer caches to sync
I0802 18:35:13.868098 6 run_backup_controller.go:38] Starting CStorBackup controller
I0802 18:35:13.868117 6 run_backup_controller.go:41] Waiting for informer caches to sync
I0802 18:35:13.946730 6 run_pool_controller.go:45] Starting CStorPool workers
I0802 18:35:13.946931 6 run_pool_controller.go:51] Started CStorPool workers
I0802 18:35:13.968344 6 run_replica_controller.go:47] Starting CStorVolumeReplica workers
I0802 18:35:13.968441 6 run_replica_controller.go:54] Started CStorVolumeReplica workers
I0802 18:35:13.968490 6 run_restore_controller.go:46] Starting CStorRestore workers
I0802 18:35:13.968538 6 run_restore_controller.go:53] Started CStorRestore workers
I0802 18:35:13.968602 6 run_backup_controller.go:46] Starting CStorBackup workers
I0802 18:35:13.968689 6 run_backup_controller.go:53] Started CStorBackup workers
I0802 18:35:43.869876 6 handler.go:456] cStorPool pending: 48d3b2ba-b553-11e9-858e-54e1ad4a9dd4
I0802 18:35:43.869961 6 new_pool_controller.go:160] cStorPool Modify event : cstor-sparse-p8yp, 48d3b2ba-b553-11e9-858e-54e1ad4a9dd4
I0802 18:35:43.870552 6 event.go:221] Event(v1.ObjectReference{Kind:"CStorPool", Namespace:"", Name:"cstor-sparse-p8yp", UID:"48d3b2ba-b553-11e9-858e-54e1ad4a9dd4", APIVersion:"openebs.io/v1alpha1", ResourceVersion:"2070", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource modify event
I0802 18:35:44.905633 6 pool.go:93] Import command successful with true dontimport: false importattr: [import -c /tmp/pool1.cache -o cachefile=/tmp/pool1.cache cstor-48d3b2ba-b553-11e9-858e-54e1ad4a9dd4] out:
From the above logs, we can confirm that cstor-pool-mgmt in the new pod is communicating with cstor-pool in the old pod: the "CStorPool found" entry shows that the pool was detected, and the subsequent "Import command successful" entry shows that the pool was actually imported.
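These logs can be fetched from the cstor-pool-mgmt container of the newly created pool pod, for example (the pod name is a placeholder; use the namespace in which OpenEBS is installed):

kubectl logs <new_pool_pod_name> -n <openebs_installed_namespace> -c cstor-pool-mgmt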
Possible Reason:
When a cStor pool pod is deleted, there is a high chance that two cStor pool pods of the same pool exist at the same time: the old pool pod is in the Terminating state (meaning not all of its containers have terminated yet), while the new pool pod is already in the Running state (some of its containers may be running, but not necessarily all of them). In this scenario, the cstor-pool-mgmt container in the new pool pod communicates with the cstor-pool container in the old pool pod. This can cause the CVR resource to be set to Invalid.
Note: This issue has been observed in all OpenEBS versions up to 1.2.
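To confirm this scenario, list the pool pods and check whether an old pod in Terminating state coexists with the new pod in Running state. The following is a sketch; the app=cstor-pool label selector is an assumption and may differ across OpenEBS versions:

# Assumption: cStor pool pods carry the app=cstor-pool label; adjust the selector if needed.
kubectl get pods -n <openebs_installed_namespace> -l app=cstor-pool -o wide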
Resolution:
Edit the phase of the cStorVolumeReplica (CVR) from Invalid to Offline. After a few seconds, the CVR moves to the Healthy or Degraded state, depending on the rebuilding progress.
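A minimal sketch of this edit, assuming the CVR name from the sample output above and that OpenEBS is installed in the openebs namespace:

# Open the CVR for editing and change the phase under status from Invalid to Offline.
kubectl edit cvr pvc-738f76c0-b553-11e9-858e-54e1ad4a9dd4-cstor-sparse-p8yp -n openebs

# Watch the CVR transition to Healthy or Degraded after a few seconds.
kubectl get cvr -n openebs -w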
cStor volume becomes read-only
The mount point of an application running on a cStor volume went into read-only state.
Possible Reason:
If the cStorVolume is Offline, or the corresponding target pod is unavailable for more than 120 seconds (the iSCSI timeout), then the PV is remounted as a read-only filesystem. More details on the different states of a cStor volume can be found here.
Troubleshooting
Check the status of the corresponding cStor volume using the following command:
kubectl get cstorvolume -n <openebs_installed_namespace> -l openebs.io/persistent-volume=<PV_NAME>
If the cStor volume is in the Healthy or Degraded state, restarting the application pod alone brings the cStor volume back to RW mode. If the cStor volume is Offline, reach out to the OpenEBS community for assistance.
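For example, if the cStor volume is Healthy or Degraded, the application pod can be restarted by deleting it. This sketch assumes the pod is managed by a Deployment or StatefulSet, so it is recreated automatically and remounts the volume in RW mode:

kubectl delete pod <application_pod_name> -n <application_namespace>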