Hello,
We're trying to populate a Shinken installation with Nagios host and service objects. All was going swimmingly until Puppet started trying to export objects from PuppetDB for nodes that no longer exist. We were not cleaning decommissioned nodes out of the Puppet installation properly, and it has come back to bite us. We'd like to clean everything up, if possible without nuking the DB and re-installing Puppet on every machine.
Longer description:
We're running PE 2015.2.2, and initially stale nodes were never purged from PuppetDB. Configuration sample from database.ini:
# Number of seconds before any SQL query is considered 'slow'; offending
# queries will not be interrupted, but will be logged at the WARN log level.
log-slow-statements = 10
node-ttl = 7d
node-purge-ttl = 0s
report-ttl = 14d
Sanitized puppet run output:
root@shinken:~# puppet agent -t
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid relationship: Nagios_host[test-app01.int.domain.com] { before => Nagios_service[test-qa-app02_check_load] }, because Nagios_service[test-qa-app02_check_load] doesn't seem to be in the catalog
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
root@shinken:~#
The tricky part is that neither test-app01 nor test-qa-app02 exists any longer. So we changed database.ini to the following (by changing the variable in the PE console):
# Number of seconds before any SQL query is considered 'slow'; offending
# queries will not be interrupted, but will be logged at the WARN log level.
log-slow-statements = 10
node-ttl = 7d
node-purge-ttl = 1d
report-ttl = 14d
The change was accepted, but the only difference is that the error now names different non-existent servers. I tried playing whack-a-mole, doing things like:
root@pe-master:~# puppet node deactivate test-app01.int.domain.com
Submitted 'deactivate node' for test-app01.int.domain.com with UUID 8f3a9297-32a7-4fb5-b0a3-67f2f661041b
root@pe-master:~# puppet cert clean test-app01.int.domain.com
Notice: Revoked certificate with serial 373
Notice: Removing file Puppet::SSL::Certificate test-app01.int.domain.com at '/etc/puppetlabs/puppet/ssl/ca/signed/test-app01.int.domain.com.pem'
Notice: Removing file Puppet::SSL::Certificate test-app01.int.domain.com at '/etc/puppetlabs/puppet/ssl/certs/test-app01.int.domain.com.pem'
root@pe-master:~# puppet node deactivate test-qa-app02.int.domain.com
Submitted 'deactivate node' for test-qa-app02.int.domain.com with UUID 550f505c-d38f-4354-8c57-aed7c29b739c
root@pe-master:~# puppet cert clean test-qa-app02.int.domain.com
Notice: Revoked certificate with serial 322
root@pe-master:~#
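For reference, the per-node steps above could be scripted roughly as follows. This is only a sketch of the same two commands run in a loop: the function name decommission_nodes and the PUPPET_BIN variable are made up for illustration (PUPPET_BIN is not a Puppet setting, it just allows a dry run with echo).

```shell
#!/bin/sh
# Sketch: repeat the "deactivate + cert clean" steps for each certname given.
# decommission_nodes and PUPPET_BIN are hypothetical names, not part of Puppet.
decommission_nodes() {
    # Use the real puppet binary unless PUPPET_BIN overrides it (e.g. with echo).
    puppet_bin="${PUPPET_BIN:-puppet}"
    for certname in "$@"; do
        # Mark the node as deactivated in PuppetDB.
        "$puppet_bin" node deactivate "$certname" || return 1
        # Revoke and remove the node's certificate on the CA.
        "$puppet_bin" cert clean "$certname" || return 1
    done
}
```

It would be called as `decommission_nodes test-app01.int.domain.com test-qa-app02.int.domain.com`. Setting PUPPET_BIN=echo first prints the commands instead of running them.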
This has not stopped the behavior, and the service check references don't seem to be expiring from the DB. Any suggestions on how to clean this mess up?
Thanks in advance!