Wednesday, October 31, 2012

lessons on interrogation

Today we met up with our vendors to resolve a month-old problem: SMTP setup. The last part of the saga had me going around asking people how they configured their SMTP settings (because I hadn't done it before), and googling and reading up on how to send SMTP commands, only to have people ask me, "Why isn't your vendor doing this for you?", "You are the first person I know who helps their vendor do such things", and "Why are you helping them?"

Simply because I didn't have a choice. The user had been calling me every 2 hours for the past 2 days to ask how to resolve the issue, saying that all the web forms must work, that the vendors say the web forms can't work without SMTP, and that she has promised her boss the site will be live this Friday. After I passed all the information I had gathered to the vendor's guy, he still said he couldn't get it to work. His boss said that this was an environment factor beyond their control, and that if we were unable to provide them with settings that worked, they would be unable to do anything.

The problem sounds simple. The vendor insisted that we needed to give them a username and password to access the SMTP server. But the SMTP server runs in unauthenticated mode, meaning no username or password is required. I had 3 other systems sending emails through the same SMTP server, and none of them needed to authenticate. Eventually, I sent the question to the helpdesk managing the SMTP server, and we are still waiting to solve the mystery.
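For what it's worth, sending mail through a relay that needs no username or password looks roughly like this in Python. This is only a minimal sketch, not what the vendor's system actually runs: the host name, port, subject line, and function names are placeholders I made up. The point is simply that there is no login step anywhere.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender, recipient, body):
    """Build a plain-text notification email (subject line is a placeholder)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "New feedback submitted"
    msg.set_content(body)
    return msg


def send_unauthenticated(msg, host="smtp.example.internal", port=25,
                         smtp_factory=smtplib.SMTP):
    """Send msg through a relay that accepts mail without credentials.

    The host and port are hypothetical. Note that login() is never called:
    the server is assumed to accept unauthenticated connections from this
    machine, as it does for the other systems using the same relay.
    """
    with smtp_factory(host, port, timeout=10) as server:
        server.send_message(msg)
```

If a script like this works from the same machine, the relay itself is fine and the problem is somewhere in the application's settings.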

That aside, I couldn't understand why the web forms couldn't work if we couldn't send emails. The web forms were storing the submitted data in a database, and the email part was only there to notify the webmaster that someone had just submitted feedback. They had been explaining and explaining for the past 2 weeks, but I couldn't understand until today, when I asked them: if I don't have any email available, what will the error message look like?

It turned out that the error message misled us into thinking that the form wasn't working! OMG. The error message was "we encountered a technical difficulty in submitting the form". When I asked what other error messages he had, he said "we encountered a technical difficulty in saving the form". It was then that it suddenly became clear what the problem was. The forms were saving data into the system, but the notification portion wasn't working. So if we just change the "error message" to "thank you", the user who submitted the feedback will not think there was an error, and the feedback is still captured in the system. The only catch is that the webmaster will need to log in to the system to retrieve the data.
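In code terms, the fix amounts to treating "save" and "notify" as two separate steps with separate outcomes. Here is a rough sketch of the idea, assuming a SQLite database with a hypothetical `feedback` table and a `notify` callback standing in for the email step. None of these names come from the vendor's system, and I have not seen their code.

```python
import logging
import sqlite3


def submit_feedback(conn, text, notify):
    """Save the feedback first, then try to notify the webmaster.

    Only a failed save is reported to the user as an error. A failed
    notification is logged, because the data is already safe in the database.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO feedback (body) VALUES (?)", (text,))
    except sqlite3.Error:
        return "We encountered a technical difficulty in saving the form."

    try:
        notify(text)  # e.g. send an email to the webmaster
    except Exception:
        logging.warning("Feedback saved, but the notification failed")

    return "Thank you for your feedback."
```

With this split, a broken mail server means the webmaster has to log in to read the feedback, but the person submitting the form still gets a "thank you".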

You can just imagine the users' reaction when this was revealed to them. What initially was on the critical path turned out to be something not critical at all, and all we needed was someone to take the problem apart.

Monday, October 29, 2012

really serious troubleshooting business

Earlier today I was troubleshooting a 5-month-old intermittent problem. It had dragged on that long because I had been relying on the vendors and the issue had not reached the fire-on-the-backside stage.

The problem was easily resolved in 30 minutes once I figured out the 10 lines of shell script that were fetching the files from the FTP server. I would not have been able to write the shell script from scratch, but I guess I have a good translator in my head.

Before that, 2 external and 2 internal data centre server administrators, 5 application vendors, the infra manager, and I were all unable to resolve the problem. Just 2 weeks ago, the server admin was asking the infra manager why he was taking so long to solve it. The server admin was telling the application team that their code was wrong and that the files needed to reference the root instead of the sub folder. The application team blamed the script for not copying their files properly. I wasn't contributing constructively either, since all I did was chase the application vendor for status updates.

11 people involved. What a joke. All I did was tell the shell script to read from the sub folder instead of the root. But to be able to troubleshoot the problem, a person needs to understand both shell scripting and HTML, which sounds like common sense, but I guess the stars were not aligned.