[Blink] DNS error

Bryan Miller bmill at socialserve.com
Mon Dec 10 23:09:49 CET 2012


> Date: Fri, 30 Nov 2012 17:13:22 +0100
> From: Adrian Georgescu <ag at ag-projects.com>
> To: Blink Support Forum <blink at lists.ag-projects.com>
> Subject: Re: [Blink] DNS Error
> Message-ID: <A221BB4F-5AF4-4E7A-950C-FB51A65711D3 at ag-projects.com>
> Content-Type: text/plain; charset=us-ascii
> 
> Hi Bryan,
> 
> This is not a known issue, it may be a software bug if you say that DNS lookups work fine.
> 
> Some logs may be of help when this happens, turn on Notifications trace to file in advanced logs preferences and send them back to us, hopefully we can spot the reason. Also send the result of dig command for the same address on the same computer when the problem occur.
> 
> Regards,
> Adrian

I spent some time Friday trying to dig deeper into this problem, as another user was experiencing this problem.

It seems that there may be a bug in the OS X app firewall. This is occurring on a mac mini's, and their default config is to have the Firewall enabled and signed apps allow connections. 

You can reproduce this error by using dig or nslookup and running repeatedly in quick succession, at least it can be done on multiple machines. They're running 10.7.4. I have yet to try this on 10.7.5. Blink version is 1.9.1.
ProductName:	Mac OS X
ProductVersion:	10.7.4
BuildVersion:	11E53


It will randomly timeout on a query. Looking at a tcpdump, you can see a response packet that isn't received by nslookup or dig. After the timeout, dig or nslookup sends another query and gets a response.

So, the problem is that when you first start up blink, it sends out multiple DNS queries to attempt auto configuration of the sip account:

2012-12-10 16:25:04.155847 [Blink Pro 10449]: DNS lookup NAPTR production-pbx.socialserve.com failed: DNS response contains no answer
2012-12-10 16:25:04.158674 [Blink Pro 10449]: DNS lookup NAPTR production-pbx.socialserve.com failed: DNS response contains no answer
2012-12-10 16:25:04.169987 [Blink Pro 10449]: DNS lookup SRV _stun._udp.production-pbx.socialserve.com failed: DNS record does not exist
2012-12-10 16:25:04.175093 [Blink Pro 10449]: DNS lookup SRV _sips._tcp.production-pbx.socialserve.com failed: DNS record does not exist
2012-12-10 16:25:04.177892 [Blink Pro 10449]: DNS lookup SRV _sips._tcp.production-pbx.socialserve.com failed: DNS record does not exist
2012-12-10 16:25:04.187973 [Blink Pro 10449]: DNS lookup SRV _sip._tcp.production-pbx.socialserve.com failed: DNS record does not exist
2012-12-10 16:25:04.190931 [Blink Pro 10449]: DNS lookup SRV _sip._tcp.production-pbx.socialserve.com failed: DNS record does not exist
2012-12-10 16:25:04.204756 [Blink Pro 10449]: DNS lookup SRV _sip._udp.production-pbx.socialserve.com failed: DNS record does not exist
2012-12-10 16:25:04.206910 [Blink Pro 10449]: DNS lookup SRV _sip._udp.production-pbx.socialserve.com failed: DNS record does not exist
2012-12-10 16:25:04.224022 [Blink Pro 10449]: DNS lookup A production-pbx.socialserve.com succeeded, ttl=86400: 10.0.5.40
2012-12-10 16:25:04.226788 [Blink Pro 10449]: DNS lookup SRV _stun._udp.production-pbx.socialserve.com failed: DNS record does not exist
2012-12-10 16:25:04.230575 [Blink Pro 10449]: DNS lookup A production-pbx.socialserve.com succeeded, ttl=86400: 10.0.5.40


There is a 50/50 chance of one of these queries running into the timeout related bug. For the machines that do experience the timeout, Blink then decides to start using 8.8.8.8 and 8.8.4.4 as the dns servers. Due to external DNS servers not having a resolution for our internal asterisk server, blink fails to register, and it is unable to place or receive calls. Some instances, it can receive calls, but is unable to place calls (due to failed registration). It sends out multiple DNS queries every minute for auto configuration of the sip account.

Here is a log entry for the Blink attempting to use google's DNS servers:

query_type=SRV, nameservers=['8.8.8.8', '8.8.4.4'], uri=sip:production-pbx.socialserve.com, query_name=_sips._tcp.production-pbx.socialserve.com, context=lookup_sip_proxy, error=, answer=None

Blink has to be restarted in order to rectify the problem, but it sometimes will run into the same problem in the future after being restarted.
This problem does not exist when I turn off the OS X Firewall in system preferences.

A temporary fix is adding the NAPTR and SRV entries to our DNS. This reduces the DNS queries down to three lookups. 

The other fix would be to configure all clients to use a primary proxy with the ip address. However, going to each machine and putting this in manually is rather tedious for the amount of workstations we have, so the DNS entries is easier.

Outside of this, I noticed that blink continually attempts to get stun configuration by doing a DNS lookup:
2012-12-10 16:44:04.919934: Notification name=DNSLookupTrace sender=<sipsimple.lookup.DNSLookup object at 0x6914950> data=T
imestampedNotificationData(service='stun', nameservers=['8.8.8.8', '8.8.4.4'], timestamp=datetime.datetime(2012, 12, 10, 16
, 44, 4, 905342), error=NXDOMAIN(), uri=SIPURI('production-pbx.socialserve.com', None, None, None, False, {}, {}), query_na
me='_stun._udp.production-pbx.socialserve.com', context='lookup_service', query_type='SRV', answer=None)
It also does this every minute.

If you put 0.0.0.0 in the stun server settings, it will fail and never attempt to get stun configuration every minute. I am concerned about whether this will cause problems for the app?
Based on the log entries when I do this, it sets the nat_type='Blocked' and then never lookups up stun info:

2012-12-07 16:15:21.212000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN client transaction created
2012-12-07 16:15:21.212000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN sending message (transmit count=1)
2012-12-07 16:15:21.246000 [Blink Pro 7875] (5) stuntsx0x99053: STUN sending message (transmit count=2)
2012-12-07 16:15:21.314000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN sending message (transmit count=2)
2012-12-07 16:15:21.414000 [Blink Pro 7875] (5) stuntsx0x13927: STUN sending message (transmit count=3)
2012-12-07 16:15:21.447000 [Blink Pro 7875] (5) stuntsx0x99053: STUN sending message (transmit count=3)
2012-12-07 16:15:21.516000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN sending message (transmit count=3)
2012-12-07 16:15:21.815000 [Blink Pro 7875] (5) stuntsx0x13927: STUN sending message (transmit count=4)
2012-12-07 16:15:21.849000 [Blink Pro 7875] (5) stuntsx0x99053: STUN sending message (transmit count=4)
2012-12-07 16:15:21.918000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN sending message (transmit count=4)
2012-12-07 16:15:22.616000 [Blink Pro 7875] (5) stuntsx0x13927: STUN sending message (transmit count=5)
2012-12-07 16:15:22.652000 [Blink Pro 7875] (5) stuntsx0x99053: STUN sending message (transmit count=5)
2012-12-07 16:15:22.719000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN sending message (transmit count=5)
2012-12-07 16:15:24.217000 [Blink Pro 7875] (5) stuntsx0x13927: STUN sending message (transmit count=6)
2012-12-07 16:15:24.255000 [Blink Pro 7875] (5) stuntsx0x99053: STUN sending message (transmit count=6)
2012-12-07 16:15:24.321000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN sending message (transmit count=6)
2012-12-07 16:15:27.418000 [Blink Pro 7875] (5) stuntsx0x13927: STUN sending message (transmit count=7)
2012-12-07 16:15:27.457000 [Blink Pro 7875] (5) stuntsx0x99053: STUN sending message (transmit count=7)
2012-12-07 16:15:27.523000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN sending message (transmit count=7)
2012-12-07 16:15:29.019000 [Blink Pro 7875] (4) stuntsx0x13927: STUN timeout waiting for response
2012-12-07 16:15:29.019000 [Blink Pro 7875] (5) natck0x17f2200: Completed Test I: Binding request, status=370004
2012-12-07 16:15:29.060000 [Blink Pro 7875] (4) stuntsx0x99053: STUN timeout waiting for response
2012-12-07 16:15:29.060000 [Blink Pro 7875] (5) natck0x17f2200: Completed Test II: Binding request with change address and port request, status=370004
2012-12-07 16:15:29.125000 [Blink Pro 7875] (4) stuntsx0x17f2f: STUN timeout waiting for response
2012-12-07 16:15:29.125000 [Blink Pro 7875] (5) natck0x17f2200: Completed Test III: Binding request with change port request, status=370004
2012-12-07 16:15:29.130000 [Blink Pro 7875] (5) stuntsx0x17f2f: STUN client transaction destroyed
2012-12-07 16:15:29.130000 [Blink Pro 7875] (5) stuntsx0x99053: STUN client transaction destroyed
2012-12-07 16:15:29.130000 [Blink Pro 7875] (5) stuntsx0x13927: STUN client transaction destroyed
notifications_trace.txt:2012-12-07 16:15:29.146082: Notification name=SIPEngineDetectedNATType sender=<Engine(Thread-1, started daemon -1336856576)> data=NotificationData(nat_type='Blocked', timestamp=datetime.datetime(2012, 12, 7, 16, 15, 29, 125997), user_data=1, succeeded=True)

Can you add in the advanced configuration an option to prevent stun lookup all together, or change the behavior to not continually lookup stun configuration, by default?



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ag-projects.com/pipermail/blink/attachments/20121210/1a126dc5/attachment.html>


More information about the Blink mailing list