TroubleShootingGlobus


Troubleshooting and FAQs for Globus

globus-job-run mysite:2119/jobmanager-fork -queue betest /bin/hostname fails

  • Read /var/log/globus-gatekeeper.log on the CE
    • Possible cause 1: The LCAS time-slot configuration in /opt/glite/etc/lcas/timeslots.db
    • Possible cause 2: There is a problem with your DNS: read the note at this page.
    • Possible cause 3: Your certificate and DNS name do not match.
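
The checks above can be sketched as a small shell script to run on the CE. This is a rough aid, not a definitive diagnosis: the certificate path /etc/grid-security/hostcert.pem is an assumption (it is the usual location for a grid host certificate, but your site may differ), the helper name subject_matches_host is made up for this example, and the test is a plain substring match on the certificate subject.

```shell
#!/bin/sh
# Rough checks for the possible causes listed above. Paths are assumptions.
LOG="${LOG:-/var/log/globus-gatekeeper.log}"
CERT="${CERT:-/etc/grid-security/hostcert.pem}"   # assumed location

# Hypothetical helper: succeeds if the host name ($2) appears anywhere
# in the certificate subject string ($1). Substring match only.
subject_matches_host() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

fqdn=$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo unknown)

# Show the most recent gatekeeper messages
if [ -r "$LOG" ]; then
    tail -n 50 "$LOG"
fi

# Forward lookup of the host name (DNS or /etc/hosts)
if command -v getent >/dev/null 2>&1; then
    getent hosts "$fqdn" || echo "no DNS/hosts entry for $fqdn"
fi

# Compare the certificate subject with the host name
if [ -r "$CERT" ] && command -v openssl >/dev/null 2>&1; then
    subject=$(openssl x509 -in "$CERT" -noout -subject)
    if subject_matches_host "$subject" "$fqdn"; then
        echo "certificate subject contains $fqdn"
    else
        echo "certificate subject does not contain $fqdn: $subject"
    fi
fi
```

Each step is skipped silently when the file or tool is missing, so the script can be run as-is on a machine that is only partly configured.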

globus-job-run mysite:2119/jobmanager-fork -queue betest /bin/hostname never finishes

When you run "globus-job-run mysite:2119/jobmanager-fork -queue betest /bin/hostname", the command never completes and seems to do nothing.

globus-job-run waits for the job to complete, so either the job is not being run for some reason or the resource is very busy. Check the following things:

    • Run pbsnodes -a to list the available nodes.
    • Use qstat to view the job states.
    • Use checkjob <jobid> to get more information on why the job is not being run.
    • Read the logs: /var/spool/pbs/server_logs
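
As a minimal sketch, the checks above can be run in sequence on the PBS server. The job ID is passed as the first argument. The helper count_free_nodes is a name invented for this example, and it assumes that pbsnodes -a prints a "state = free" line for each free node. The script also assumes that the newest file in /var/spool/pbs/server_logs is the current log.

```shell
#!/bin/sh
# Sketch: run the checks listed above in order. JOBID is optional.
JOBID="${1:-}"
LOGDIR=/var/spool/pbs/server_logs

# Hypothetical helper: count "state = free" lines in `pbsnodes -a` output
# read from stdin. Prints 0 when there are none.
count_free_nodes() {
    grep -c 'state = free' || true
}

# Available nodes
if command -v pbsnodes >/dev/null 2>&1; then
    echo "free nodes: $(pbsnodes -a | count_free_nodes)"
fi

# Job states
if command -v qstat >/dev/null 2>&1; then
    qstat || true
fi

# Why a specific job is not running
if [ -n "$JOBID" ] && command -v checkjob >/dev/null 2>&1; then
    checkjob "$JOBID" || true
fi

# Tail of the newest server log (assumed to be the current one)
if [ -d "$LOGDIR" ]; then
    latest=$(ls -t "$LOGDIR" | head -n 1)
    if [ -n "$latest" ]; then
        tail -n 30 "$LOGDIR/$latest"
    fi
fi
```

If the free-node count is 0, the resource is probably just busy. If nodes are free and the job still stays queued, the checkjob output and the server log are the places to look next.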

PBS_Server;Svr;PBS_Server;Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004

(In the log files under /var/spool/pbs/server_logs)

    • ??? Someone said this error can be ignored, which makes sense as the scheduler isn't configured by default ???
    • Check if pbs_mom is running: /etc/init.d/pbs_mom status
    • ??? Run ncm-ncd --configure pbsserver ???
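
To see how often the message occurs and whether pbs_mom is up, something like the following could be used. It is only a sketch: count_sched_errors is a hypothetical helper name, and the script does not decide whether the error matters on your setup.

```shell
#!/bin/sh
# Sketch: count the scheduler error per log file and check pbs_mom.
LOGDIR="${LOGDIR:-/var/spool/pbs/server_logs}"

# Hypothetical helper: count lines on stdin that report the failed
# scheduler contact. Prints 0 when there are none.
count_sched_errors() {
    grep -c 'Could not contact Scheduler' || true
}

if [ -d "$LOGDIR" ]; then
    for f in "$LOGDIR"/*; do
        [ -f "$f" ] || continue
        echo "$f: $(count_sched_errors < "$f")"
    done
fi

# Same check as in the list above
if [ -x /etc/init.d/pbs_mom ]; then
    /etc/init.d/pbs_mom status || echo "pbs_mom does not appear to be running"
fi
```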

