why my cluster queue is not participating in workload balancing

If your cluster queue is not participating in workload balancing, it usually means there’s a misconfiguration or a constraint preventing it from being considered for scheduling jobs. Here are the most common reasons, structured into a checklist format so you can troubleshoot systematically:


✅ 1. Queue is not enabled or not in “RUNNING” state

Check the status of your cluster queue.

  • SLURM: scontrol show partition (queues are called partitions in SLURM)
  • PBS/Torque: qstat -Q
  • Solution: Make sure the queue is enabled, started, and running.

✅ 2. Node resources in the queue are not available

If the nodes assigned to the queue are offline, down, or drained, the scheduler won’t assign jobs to that queue.

  • Check node status:
    • SLURM: sinfo
    • PBS: pbsnodes -a
  • Solution: Reboot or re-enable nodes, or investigate hardware/network issues.
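If you want to script this check, drained and down nodes can be filtered out of `sinfo` output. Below is a small Python sketch; the sample output and column layout are illustrative and assume the default `sinfo` format:

```python
# Sketch: flag unavailable nodes from `sinfo` output. Sample text is
# embedded for illustration; in practice, feed in real `sinfo` output.
SAMPLE_SINFO = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
batch     up     infinite       4   idle node[01-04]
batch     up     infinite       2  drain node[05-06]
debug     up       1:00:00      1   down node07
"""

def unavailable_nodes(sinfo_text):
    """Return NODELIST entries whose STATE is down/drain/draining."""
    bad = []
    for line in sinfo_text.splitlines()[1:]:          # skip the header row
        cols = line.split()
        if len(cols) >= 6 and cols[4].rstrip("*") in {"down", "drain", "drng", "draining"}:
            bad.append(cols[5])
    return bad

print(unavailable_nodes(SAMPLE_SINFO))   # ['node[05-06]', 'node07']
```

Any node list printed here is a candidate for `scontrol update nodename=... state=resume` (after fixing the underlying issue).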

✅ 3. Queue resource limits or policies

The queue might be configured with:

  • Too restrictive limits (e.g. only certain users, job types, or small time/memory limits).
  • Max running jobs limit reached.
  • Allowed node count too low.
  • Check configuration:
    • SLURM: scontrol show partition
    • PBS: qmgr -c 'p q <queue_name>'
  • Solution: Adjust limits or policies as needed.

✅ 4. Fair-share or priority issues

Your scheduler might be favoring other queues due to:

  • Fair-share accounting
  • Priority policies
  • Preemption settings
  • Check priority policies:
    • SLURM: scontrol show config | grep Priority
    • PBS: look into sched_config
  • Solution: Adjust scheduler policies or check usage balance.
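SLURM's multifactor priority plugin computes each job's priority as a weighted sum of normalized factors, so a queue whose jobs score poorly on fair-share can be starved even when they wait longer. A simplified model (the weights below are made up for illustration, not SLURM defaults):

```python
# Simplified model of SLURM's multifactor priority: each factor is
# normalized to the range 0.0-1.0 and multiplied by its PriorityWeight*
# setting. The weights here are illustrative, not SLURM defaults.
WEIGHTS = {"age": 1000, "fairshare": 10000, "partition": 5000, "qos": 2000}

def job_priority(factors, weights=WEIGHTS):
    """factors maps factor name -> normalized value in [0, 1]."""
    return sum(int(weights[name] * value) for name, value in factors.items())

# A long-waiting job with poor fair-share still loses to a fresh job
# with good fair-share, because fair-share carries the largest weight.
old_job = job_priority({"age": 1.0, "fairshare": 0.1, "partition": 0.5, "qos": 0.0})
new_job = job_priority({"age": 0.1, "fairshare": 0.9, "partition": 0.5, "qos": 0.0})
print(old_job, new_job)  # 4500 11600
```

Compare this against the `PriorityWeight*` values from `scontrol show config | grep Priority` to see which factor dominates on your cluster.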

✅ 5. Queue is excluded from default routing

Sometimes a queue is not part of the default routing logic or job submission defaults.

  • In PBS, check: qmgr -c 'p s' and look for default_queue and any routing-queue settings.
  • In SLURM, make sure users are submitting to that queue/partition explicitly or that it’s in the default queue list.

✅ 6. Job submission filters or policies

Jobs might not be targeting the queue, either because:

  • They are not specifying it.
  • The system or user defaults avoid it.
  • Solution: Try explicitly submitting a job to the queue:
    • SLURM: sbatch -p <queue_name> myscript.sh
    • PBS: qsub -q <queue_name> myscript.sh

✅ 7. Scheduler not aware of updated configuration

If you recently modified queue/node settings, the scheduler may need to be reloaded or restarted.

  • SLURM: scontrol reconfigure
  • PBS: qterm -t quick to stop the server, then restart it with pbs_server
  • Solution: Reconfigure/restart the scheduler properly.

✅ 8. Queue weights or priorities are too low

In weighted balancing systems, your queue might not be chosen because it has lower weight or priority.

  • SLURM:
    • Check PriorityWeightPartition, PriorityWeightAge, etc.
  • PBS:
    • Look into queue_sort_method, queue_priority.

✅ 9. Scheduler bugs or logs showing errors

Check the scheduler logs (slurmctld.log, sched_logs, server_logs, etc.) for errors or warnings.

  • You might find:
    • Misconfigured nodes.
    • Failing plugins.
    • Licensing/resource reservation conflicts.

If you can tell me which scheduler (SLURM, PBS Pro, Torque, Grid Engine, etc.) you’re using and share some config info (like scontrol show partition or qmgr -c 'p q <queue>'), I can give more tailored advice.

You said:

why my cluster queue is not participating in workload balancing in IBM MQ cluster

ChatGPT said:

If your IBM MQ cluster queue is not participating in workload balancing, the issue is typically due to misconfiguration in the cluster queue setup or attributes that prevent the queue from being selected for load-balanced message delivery.

Let’s go through the key reasons and fixes step by step, specific to IBM MQ clustering:


Overview of MQ Cluster Workload Balancing

In an IBM MQ cluster:

  • Messages sent to a cluster queue (defined in multiple queue managers) can be load-balanced across those instances.
  • Whether successive messages are spread per message depends on the bind option: MQOO_BIND_NOT_FIXED (or DEFBIND(NOTFIXED)) allows per-message balancing, while BIND_ON_OPEN fixes the destination for the life of the open. Message persistence by itself does not disable balancing.
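To make the bind options concrete, here is a toy Python model (not IBM MQ code): BIND_ON_OPEN picks a destination once per open, while BIND_NOT_FIXED re-selects per message, modeled here as a simple round-robin.

```python
import itertools

# Toy model (not IBM MQ code): with BIND_ON_OPEN the destination is
# picked once per MQOPEN and reused; with BIND_NOT_FIXED it is
# re-evaluated per message, modeled here as simple round-robin.
class ClusterQueue:
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def open(self, bind_on_open):
        if bind_on_open:
            fixed = next(self._cycle)        # chosen once at open time
            return lambda: fixed
        return lambda: next(self._cycle)     # chosen again for every put

q = ClusterQueue(["QM1", "QM2"])
put_fixed = q.open(bind_on_open=True)
print([put_fixed() for _ in range(4)])       # ['QM1', 'QM1', 'QM1', 'QM1']
put_balanced = q.open(bind_on_open=False)
print([put_balanced() for _ in range(4)])    # ['QM2', 'QM1', 'QM2', 'QM1']
```

The real selection algorithm also weighs channel state, rank, priority, and weight, but the open-time-versus-put-time distinction is the key behavior.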

✅ 1. Check BIND Options: Must Be BIND_NOT_FIXED

If the sending application opens the queue with BIND_ON_OPEN (explicitly, or via the queue's DEFBIND(OPEN) default), all messages from that open go to the same instance.

  • The queue default is DEFBIND(OPEN), which fixes the destination at open time.
  • You must use MQOO_BIND_NOT_FIXED on the MQOPEN call, or set DEFBIND(NOTFIXED) on the queue, to get per-message balancing.

✅ Fix:

  • Use MQOO_BIND_NOT_FIXED during MQOPEN.
  • Or configure the application to allow dynamic routing.

✅ 2. Check CLWLUSEQ setting on the queue

This attribute controls whether the queue is eligible for workload balancing.

  • Run: DISPLAY QLOCAL(YOUR.QUEUE.NAME) CLWLUSEQ
  • Possible values:
    • ANY — Queue can participate in workload selection ✅
    • LOCAL — Only use this queue if it’s local ❌

✅ Fix:

ALTER QLOCAL(YOUR.QUEUE.NAME) CLWLUSEQ(ANY)


✅ 3. Check CLWLRANK and CLWLPRTY

These attributes influence queue preference.

  • CLWLRANK: Higher rank preferred (only the highest-ranked available instances are considered).
  • CLWLPRTY: Higher priority preferred.

If one queue instance has a much better combination of rank/priority, others may not get chosen.

✅ Fix:

Make sure these are balanced across all queue managers where the queue is defined.
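As a rough sketch of how rank and priority act as filters (a simplification; real MQ also considers channel state, weights, and availability), only instances at the top rank, and then the top priority, remain candidates:

```python
# Simplified sketch of MQ's rank/priority filtering: only instances at
# the highest CLWLRANK survive, then only those at the highest CLWLPRTY;
# weight-based distribution happens among the survivors. Real MQ also
# factors in channel status and availability.
def eligible(instances):
    """instances: list of dicts with 'name', 'rank' (CLWLRANK), 'prty' (CLWLPRTY)."""
    top_rank = max(i["rank"] for i in instances)
    ranked = [i for i in instances if i["rank"] == top_rank]
    top_prty = max(i["prty"] for i in ranked)
    return [i["name"] for i in ranked if i["prty"] == top_prty]

instances = [
    {"name": "QM1", "rank": 0, "prty": 0},
    {"name": "QM2", "rank": 0, "prty": 0},
    {"name": "QM3", "rank": 0, "prty": 5},   # higher priority: wins outright
]
print(eligible(instances))  # ['QM3'] - QM1 and QM2 are never chosen
```

This is why one instance with a higher priority (or rank) can silently starve all the others even though everything else is configured correctly.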


✅ 4. Queue must be clustered and shared

Check that the queue is:

  • Defined as a cluster queue
  • Advertised/shared in the cluster via the CLUSTER attribute (with DEFBIND controlling message binding)

✅ Fix:

DISPLAY QLOCAL(YOUR.QUEUE.NAME) CLUSTER
DISPLAY QLOCAL(YOUR.QUEUE.NAME) DEFBIND

You should see:

  • CLUSTER(your_cluster_name)
  • DEFBIND(NOTFIXED) or application must override

If not, define the queue as:

DEFINE QLOCAL(YOUR.QUEUE.NAME) CLUSTER(your_cluster_name) CLWLUSEQ(ANY) DEFBIND(NOTFIXED)


✅ 5. Check if queue is PUT-enabled and not full

A queue will be skipped during balancing if it:

  • Is not PUT(ENABLED)
  • Is full (CURDEPTH has reached MAXDEPTH)

✅ Fix:

DISPLAY QLOCAL(YOUR.QUEUE.NAME) CURDEPTH MAXDEPTH PUT

Ensure:

  • PUT(ENABLED)
  • CURDEPTH < MAXDEPTH
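The availability test above boils down to a simple predicate; a minimal sketch:

```python
# Minimal eligibility predicate mirroring the checks above: an instance
# is skipped when puts are inhibited or the queue is already full.
def accepts_puts(put_enabled, curdepth, maxdepth):
    return put_enabled and curdepth < maxdepth

print(accepts_puts(True, 120, 5000))    # True: participates
print(accepts_puts(True, 5000, 5000))   # False: full, skipped
print(accepts_puts(False, 0, 5000))     # False: PUT(DISABLED), skipped
```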

✅ 6. Channel CLWL attributes

Check the cluster receiver (CLUSRCVR) and sender (CLUSSDR) channel definitions on all QMs.

  • Run: DISPLAY CHANNEL(TO.QMGR) CLWLPRTY CLWLRANK CLWLWGHT

✅ Fix:

Make sure channels are not skewed in:

  • CLWLPRTY
  • CLWLWGHT
  • CLWLRANK

Set them equally unless intentionally weighted.
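CLWLWGHT skew is easiest to see statistically: over many messages, each destination receives a share roughly proportional to its weight. A toy simulation, where a weighted random choice stands in for MQ's actual algorithm:

```python
import random

# Toy simulation: a weighted random choice stands in for MQ's actual
# CLWLWGHT handling. Over many messages each channel's share is roughly
# proportional to its weight (valid CLWLWGHT values are 1-99).
def pick(channels, rng):
    names, weights = zip(*channels.items())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)                      # fixed seed: repeatable sketch
counts = {"TO.QM1": 0, "TO.QM2": 0}
for _ in range(10_000):
    counts[pick({"TO.QM1": 75, "TO.QM2": 25}, rng)] += 1
print(counts)                                # roughly a 3:1 split
```

If one channel's weight is much lower than its peers, its queue manager can look like it "never participates" even though everything else is healthy.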


✅ 7. Cluster cache not updated or full

The cluster repository might be out of sync, or the full repository doesn’t reflect the current state.

✅ Fix:

  • Run: DISPLAY CLUSQMGR(*) ALL
  • Check if the queue and QMGR show up in the repository correctly.
  • If not, restart the cluster channels or refresh the cluster membership: REFRESH CLUSTER(your_cluster) REPOS(YES)
    (Use REFRESH CLUSTER with care; REPOS(YES) cannot be issued on a full repository queue manager.)

✅ 8. Messages are bound to one destination at open time

Per-message balancing happens only when the queue is opened with MQOO_BIND_NOT_FIXED and/or DEFBIND(NOTFIXED). With BIND_ON_OPEN, every message from that open (persistent or not) follows the destination chosen at open time. Persistence by itself does not exclude a message from balancing.

✅ Fix:

  • Set DEFBIND(NOTFIXED) on the queue, and/or
  • Use MQOO_BIND_NOT_FIXED on the open

✅ 9. Application logic overrides MQ behavior

Some apps hardcode the queue manager or queue name or reopen MQ connections repeatedly with BIND_ON_OPEN.

✅ Fix:

  • Review the application logic.
  • Ensure it doesn’t override routing policies.

✅ 10. Check logs for workload exit or routing issues

In rare cases, a custom workload exit or misconfigured exit might cause queues to be excluded.

  • Review MQ error logs: AMQERR01.LOG on all QMs.
  • Look for errors related to routing, clustering, or exits.

Example Diagnostic Commands

DISPLAY QLOCAL(CLUSTER.QUEUE.NAME) CLUSTER CLWLUSEQ DEFBIND PUT
DISPLAY CHSTATUS(*) WHERE(STATUS EQ RUNNING)
DISPLAY CLUSQMGR(*) ALL
DISPLAY QCLUSTER(CLUSTER.QUEUE.NAME)


Final Thoughts:

If you’ve confirmed the queue is properly defined in the cluster, eligible via CLWLUSEQ, not full or disabled, and the message binding allows balancing, your queue should participate in workload balancing.

If you’re still stuck, feel free to share:

  • DISPLAY QLOCAL(...) output
  • DISPLAY CHANNEL(...) output
  • Message properties (persistent/non, binding)
  • Cluster setup

If a queue on a queue manager (QMGR) in an IBM MQ cluster is not participating in workload balancing, there are specific conditions and configurations that must be met. Let’s walk through the most likely reasons and how to fix them.


When multiple queue managers in an MQ cluster host the same cluster queue, IBM MQ can distribute (balance) message traffic across those instances, but only if all the rules are satisfied.

If one QMGR’s queue isn’t receiving messages, that usually means it’s being skipped during cluster workload selection.


✅ 1. Queue is not advertised in the cluster

Check if the queue on that QMGR is even visible to the cluster.

Check:

DISPLAY QLOCAL(QUEUE.NAME) CLUSTER

✅ Must show:

CLUSTER(cluster-name)

If not:

ALTER QLOCAL(QUEUE.NAME) CLUSTER(cluster-name)


✅ 2. Queue is not using CLWLUSEQ(ANY)

This attribute controls whether the queue is eligible for workload selection.

Check:

DISPLAY QLOCAL(QUEUE.NAME) CLWLUSEQ

✅ Must be:

CLWLUSEQ(ANY)

If it’s LOCAL, the queue will only be used if it’s local to the sender.

Fix:

ALTER QLOCAL(QUEUE.NAME) CLWLUSEQ(ANY)


✅ 3. DEFBIND must allow dynamic routing

If DEFBIND is set to OPEN and the application opens with the queue default (or explicitly with BIND_ON_OPEN), messages will stick to a single QMGR/queue for the life of the open.

Check:

DISPLAY QLOCAL(QUEUE.NAME) DEFBIND

✅ Should be:

DEFBIND(NOTFIXED)

If not, update it:

ALTER QLOCAL(QUEUE.NAME) DEFBIND(NOTFIXED)

Or change your application to use MQOO_BIND_NOT_FIXED when opening the queue.


✅ 4. Cluster channels and rankings are unbalanced

If other QMGRs have more favorable workload attributes, your QMGR may be consistently skipped.

Check for this QMGR:

DISPLAY CHANNEL(TO.QMGR) CLWLPRTY CLWLWGHT CLWLRANK

✅ Best practice:

  • Equal CLWLPRTY (priority)
  • Equal or adjusted CLWLWGHT (weight)
  • Equal or reasonable CLWLRANK (rank)

If this QMGR has:

  • Low weight (CLWLWGHT), or
  • Low rank (CLWLRANK), or
  • Low priority (CLWLPRTY),

…it may never get picked.


✅ 5. QMGR not known in the cluster repository

If the QMGR or its queue isn’t visible in the full repository, it’s excluded from routing decisions.

Check:

From the full repository queue manager, run:

DISPLAY CLUSQMGR(QMGRNAME) ALL

If it’s not shown, or incomplete:

Fix:

On the non-participating QMGR:

REFRESH CLUSTER(cluster-name) REPOS(NO)

Check channel status:

DISPLAY CHSTATUS(*) WHERE(STATUS EQ RUNNING)

Make sure cluster channels are active and healthy.


✅ 6. The queue is full, paused, or not PUT-enabled

Even if the queue is correctly configured, it can still be excluded if it’s not available for puts.

Check:

DISPLAY QLOCAL(QUEUE.NAME) CURDEPTH MAXDEPTH PUT

Make sure:

  • PUT(ENABLED)
  • CURDEPTH < MAXDEPTH

✅ 7. Messages are routed statically because of the bind option

With the default DEFBIND(OPEN), all messages from a single MQOPEN (persistent or not) stick to the destination chosen at open time; they are only re-balanced per message when NOTFIXED binding is used.

Fix:

To spread messages per put, ensure:

  • Queue uses DEFBIND(NOTFIXED)
  • Application uses MQOO_BIND_NOT_FIXED

Any message + BIND_ON_OPEN = stuck to the destination chosen at open time.


✅ 8. Custom workload exit or MQ exit interferes

If there’s a workload exit or any MQ exit in use, it could override IBM MQ’s default routing behavior.

Fix:

  • Check the queue manager's cluster workload exit: DISPLAY QMGR CLWLEXIT (exit configuration may also appear in qm.ini).
  • Unless you are deliberately customizing routing, remove/disable workload exits.

✅ 9. Application is hardcoding QMGR or queue

Some apps may:

  • Connect directly to a specific QMGR
  • Use fixed queues
  • Avoid dynamic binding

Fix:

  • Use a cluster alias or cluster workload balancing connection
  • Avoid specifying a target QMGR name unless necessary
  • Let the cluster decide based on routing rules

✅ 10. QMGR doesn’t host the queue at all

Double-check that the queue is actually defined on the QMGR in question.

DISPLAY QLOCAL(QUEUE.NAME)

If you get:

AMQ8147: IBM MQ object QUEUE.NAME not found.

…then the queue doesn’t exist on this QMGR — so it can’t participate.


Summary: Diagnostic Commands Cheat Sheet

Run these on the QMGR that isn’t participating:

DISPLAY QLOCAL(QUEUE.NAME) CLUSTER CLWLUSEQ DEFBIND PUT CURDEPTH MAXDEPTH
DISPLAY CHANNEL(*) CLWLPRTY CLWLWGHT CLWLRANK
DISPLAY CLUSQMGR(*) ALL
DISPLAY QCLUSTER(QUEUE.NAME)
DISPLAY CHSTATUS(*) WHERE(STATUS EQ RUNNING)


❓Still not working?

If you’ve checked all of the above and it’s still not participating, send me:

  • DISPLAY QLOCAL(QUEUE.NAME) ALL
  • DISPLAY CHANNEL(*) CLWLPRTY CLWLWGHT CLWLRANK
  • A description of how messages are being sent (from which app/QMGR)

…and I’ll help pinpoint the issue.


Let me know what you find and I’ll help you dig deeper if needed.

You said:

why qmgr suspended is not an option to check

ChatGPT said:

In IBM MQ clustering, queue manager suspension is an important mechanism to control whether a queue manager participates in cluster workload balancing. If you’re asking:

“Why is QMGR SUSPENDED not an option to check?”

Here’s what that might mean — and what you should be checking instead.


✅ What does “QMGR SUSPENDED” mean in MQ?

When a queue manager is suspended from the cluster, it:

  • Remains a cluster member
  • But does not receive new messages as part of cluster workload balancing
  • It’s useful for maintenance or draining load from a QMGR
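Conceptually, suspension acts as a queue-manager-level filter applied before any queue attributes are considered; a tiny illustrative sketch (names are hypothetical):

```python
# Illustrative sketch (hypothetical names): suspension is tracked per
# queue manager, so suspended QMGRs are filtered out before any
# queue-level attribute (CLWLUSEQ, rank, weight) is even looked at.
cluster = [
    {"qmgr": "QM1", "suspended": False},
    {"qmgr": "QM2", "suspended": True},    # SUSPEND QMGR was issued here
    {"qmgr": "QM3", "suspended": False},
]

def workload_candidates(members):
    return [m["qmgr"] for m in members if not m["suspended"]]

print(workload_candidates(cluster))        # ['QM1', 'QM3'] - QM2 skipped
```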

Why you’re not seeing “SUSPENDED” status:

There are a few reasons this could happen:


1. ❌ You’re looking in the wrong place

SUSPEND status is not shown in DISPLAY QLOCAL(...) or DISPLAY QCLUSTER(...).

Instead, you need to run:

DISPLAY CLUSQMGR(QMGRNAME) SUSPEND

This will return something like:

SUSPEND(YES)

or

SUSPEND(NO)


2. ❌ You’re expecting “SUSPENDED” to show up in queue diagnostics

There’s no attribute on a queue that says “this is suspended.”
Suspension is a QMGR-level state, not a queue-level one.

So checking the queue won’t tell you if the QMGR is suspended from the cluster.


3. ✅ The queue manager might not be suspended at all

If the QMGR isn’t suspended, you won’t see any indication of it — because this is a deliberate, manual action.

To suspend a QMGR:

SUSPEND QMGR CLUSTER(cluster-name)

To check if it’s suspended:

DISPLAY CLUSQMGR(QMGRNAME) SUSPEND


Example: Check suspension status

On a full repository QMGR:

DISPLAY CLUSQMGR(YOUR.QMGR.NAME) SUSPEND

Expected result:

AMQ8441I: Display Cluster Queue Manager details.
CLUSQMGR(YOUR.QMGR.NAME) SUSPEND(YES)


✅ How to resume the queue manager

If a QMGR was suspended and you want it back in the workload balancing:

RESUME QMGR CLUSTER(cluster-name)

Then re-check:

DISPLAY CLUSQMGR(QMGRNAME) SUSPEND

Should return:

SUSPEND(NO)


If you don’t see the SUSPEND attribute at all:

Make sure:

  • You’re querying the right queue manager
  • The QMGR is actually in the cluster (DISPLAY CLUSQMGR(*))
  • You’re on a version of MQ that supports cluster suspension (all modern, supported versions do)

Summary

| What you want | How to check |
| --- | --- |
| Is the QMGR suspended from cluster balancing? | DISPLAY CLUSQMGR(QMGRNAME) SUSPEND |
| Suspend a QMGR | SUSPEND QMGR CLUSTER(cluster-name) |
| Resume a QMGR | RESUME QMGR CLUSTER(cluster-name) |
| Why isn’t this shown on a queue? | Suspension is a QMGR-level setting, not a queue-level one. |
