The Microsoft Interview Demystified

So many people have asked me about my experience and how to prepare for the Microsoft interview that I decided to put up a write-up here to help whoever might need it. This is ancient history (summer of 2005), so I'm guessing that a lot of things have changed since then.

Well, the first two things that you need for the M$ <pronounced madollar> interviews are two books. The first one is Programming Interviews Exposed and the second one is How Would You Move Mount Fuji?. You should read the first one cover to cover because it contains really valuable tips. If you don't have them, I'll email you e-copies. I think that Programming Interviews Exposed was the most useful resource for the Madollar interview.

One big trap is GPA. I was criticized on this point by every interviewer. I'm not sure about yours. I just told them that it was a phase in my life. You'll have around 20 minutes with each interviewer, so it's better to prove your mettle with the rest of the time rather than defending something that can't be defended.

Anyway, they might ask you about your projects, so it is good to review your CV and write a short description of each project and the key points of your implementation. It'll help you recall everything in the moment. Not sure about you, but I have a really weak memory.

Do not PANIC. Think ALOUD. <– It's human nature to correct someone who is thinking aloud, even without intending to.

TAKE YOUR TIME if you think that you can solve a problem. Never be shy about asking questions or for HINTS. Sometimes the stupidest and most obvious answers are what they are looking for.

They might ask questions that don't even have an answer. So be prepared.

An interview is just like a conversation, so you can also mold its direction. Try taking it in a direction that you're strong in:-

(My example)

-> This interviewer had a Rubik's cube on his desk

<ME> Ain't that a Rubik's cube?

<M$ man> Well, yes it is.

<ME> We had an AI assignment on it.

<M$ man> So you're interested in AI?

<ME> Pretty much.

// Rest of the interview was about AI

The opportunities will jump up by themselves.

Your speech should be audible. Most people lower their voice when they're nervous. Thank God it's the opposite for me. An interviewer even told me at the end not to speak so loudly.

In the end they'll ask if you have any questions. I asked them for feedback on my interviews, what life at Microsoft is like, and yada yada yada. On a personal level this is a good chance to make them feel like they know you.

Don't give up. I screwed up one question in an interview when I took too long to answer a simple problem, and after that I was like, thank you very much, but the interviewer told me that it's not over. Had I given up hope, I would not have gotten through.

Get yourself to believe that the real point of your going to Dubai is not the Madollar interviews but actually shopping. It helps with the tension, in case you experience it.

ON THE DAY OF THE INTERVIEW:-

Have a good night's sleep

Don't study anything

Don't drink tea or coffee <– This tip is from my brother. They make you nervous

Half an hour before the interview, don't do anything else but think about the interview. I have a mind that keeps wandering around. <– This tip is from my dad and it really helps.

AFTER THE INTERVIEW:-

Don't forget to check out the Mall of the Emirates

Hope some of this stuff helps.

PS: I no longer work for Microsoft

Elastic Search Benchmark


Introduction

I needed to run a diagnostic test to see if I could use Elastic Search in the products that I'm currently working on, so I'm putting up this brief resource on my experience with Elastic Search for testing purposes. The scripts that I used are available at the end of this article for download.

Machine Configuration:-

Host System: Mac running Elastic Search
Guest System: VirtualBox VM, networked via bridged networking

Test Data Set & Elastic Search Configuration:-

One million indexes written using two concurrent writers: one writing indexes zero to five hundred thousand, the second writing indexes five hundred thousand to one million.
Indexes set with a duplicate limit of 1, i.e. the nodes feed each other their data (a multi-master configuration).
Tool written in Python to insert mock data containing searchable information: ID, numeral text, etc.

Assumptions:

  • Test tool written as a wrapper over curl using the JSON API. Not intended for blazing-fast speed. Things can be faster via a native library, and even more so via the native Java protocol, but explicit is always better than implicit and premature optimization is the root of all evil. (A minimal sketch of such a writer follows this list.)
  • Version of Elastic Search Used is 0.17.8
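
For illustration, here is a minimal sketch of what such a curl-wrapper writer might look like. This is not the actual test tool; the document layout, the index/type names, and the helper names are assumptions chosen to mirror the command-line options listed in the appendix:

import json
import subprocess

def index_doc(host, port, namespace, doc_id):
    # Mock document carrying searchable fields: an ID, numeral text, etc.
    doc = {"id": doc_id, "text": "document number %d" % doc_id}
    url = "http://%s:%s/%s/item/%d" % (host, port, namespace, doc_id)
    # Shell out to curl over the JSON API, as described above:
    # explicit rather than fast.
    subprocess.call(["curl", "-s", "-XPUT", url, "-d", json.dumps(doc)])

def write_range(host, port, namespace, lower, upper):
    # Each writer instance covers [lower, upper); two instances were run
    # concurrently, one for 0-500000 and one for 500000-1000000.
    for doc_id in xrange(lower, upper):
        index_doc(host, port, namespace, doc_id)

if __name__ == "__main__":
    write_range("localhost", "9200", "test", 0, 500000)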

Scenarios Tested:

  • Basic replicating scenario. Stats attached in the Excel file. I would love to run anyone through this.

Everything went smoothly; no issues encountered. Please see the attached xlsx file for results.

  • Virtual machine powered off, restarted, and re-latched into the cluster.

Everything resynced fine and very fast. No issues encountered.

Other Observations:-

Approx. 50 ms query times for an exact string search, with the index concurrently being manipulated by two writer threads.
Approx. 150 ms query times for a search containing a substring and 3 ORs, with the index concurrently being manipulated by two writer threads.
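
To give an idea of the query shapes involved, a timing sketch like the following would do. This is not the actual benchmark code; the field name and query terms are made-up examples against the 0.17-era query-string _search API:

import time
import urllib2

def timed_search(host, port, namespace, query):
    # Run a query-string search against _search and report the wall-clock time.
    url = "http://%s:%s/%s/_search?q=%s" % (host, port, namespace, query)
    start = time.time()
    body = urllib2.urlopen(url).read()
    print "%s -> %.0f ms" % (query, (time.time() - start) * 1000)
    return body

# Exact string search vs. a substring-style search with 3 ORs (URL-encoded).
timed_search("localhost", "9200", "test", "text:%22document+number+42%22")
timed_search("localhost", "9200", "test", "text:*42*+OR+text:*43*+OR+text:*44*")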

Other Information:-

http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/
Highlights SOLR's deficiency when it comes to realtime writes and reads, which decrease the performance of a single index.

http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html
Useful video that highlights Elastic Search internals.

Appendix:

Test Tool Command Line Options:

Usage: writer.py [options]

Options:
  -h, --help            show this help message and exit
  -o HOST, --host=HOST  ip/hostname of es instance e.g.
                        stor.mystormachine.com, 192.168.0.3, localhost, etc
  -p PORT, --port=PORT  port of stor instance use 9200 if unsure
  -n NAME, --name=NAME  specifies namespace e.g. test
  -l LOWER, --lower=LOWER
                        specifies lower number of range
  -u UPPER, --upper=UPPER
                        specifies upper number of range
  -t TEST, --test=TEST  true means to run command, ignore means ignore

Links to Scripts / Utilities from this post

http://178-77-103-161.kundenadmin.hosteurope.de:3000/attachments/9/IndexWriterCurlWrapperScript.zip

http://178-77-103-161.kundenadmin.hosteurope.de:3000/attachments/10/elasticsearch.yml

http://178-77-103-161.kundenadmin.hosteurope.de:3000/attachments/11/SummaryIndex.xlsx

P.S. I’m copying this from my own wiki, so if something is formatted to look overly dramatic, please forgive me 🙂

Linux: Limiting Resident Memory


These are some approaches that I came across while trying to limit resident memory consumption for software on our deployment boxes.

Setting Virtual Memory

# Sets the virtual size of the program; exceeding it is an eventual termination condition
ulimit -v <size_in_kb>

Setting Real Memory / RSS

Via limits.conf

In /etc/security/limits.conf, RSS can in principle be set, but this does not work on Ubuntu:-
Bug Report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/701141
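
For reference, a hypothetical limits.conf entry would look like the following (the value is in KB; the line parses fine, it just is not enforced on Ubuntu):

# hypothetical entry: hard RSS cap of 512 MB for user myuser
myuser    hard    rss    524288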

Via UpStart

http://upstart.ubuntu.com/wiki/Stanzas#limit
http://manpages.ubuntu.com/manpages/maverick/man5/init.5.html

Via cgroup

You can accomplish this using cgroups. The short version (tested on Ubuntu 11.04) is:
Install the cgroup-bin package.
Edit /etc/cgconfig.conf and create a group with limited memory. For instance, I added:

group limited {
  memory {
    memory.limit_in_bytes = 50M;
  }
}

Run

$ sudo restart cgconfig
$ sudo chown -R jlebar /sys/fs/cgroup/memory/limited
$ cgexec -g memory:limited your/program

I observed my process with an RSS of 93M when I asked it to use only 50M, but that wasn’t a problem for me, since my goal was just to get the program to page.

cgclassify theoretically lets you attach restrictions to a running process, but it didn’t appear to restrict RSS at all.

Reference : http://stackoverflow.com/questions/3043709/resident-set-size-rss-limit-has-no-effect
Reference Long : http://jlebar.com/2011/6/15/Limiting_the_amount_of_RAM_a_program_can_use.html

This was originally compiled from various sources on a Redmine wiki. All references have been included.

Kaltura Virtual Image


In my experience, Kaltura's installation can be daunting, especially if you're not on a CentOS-based distro. However, kudos to the Kaltura team, because they have hosted a virtual image on their website with a good tutorial containing the steps to get it up and running: http://blog.kaltura.org/how-to-setup-kaltura-ce-4-0-vmware-image-in-15-minutes This is fine for some, because VMware Player is free, however only on Windows.

In order to get a dev instance up on my Mac, I set up an image from scratch after trying and failing to find one on the internet. I had hosted it here. (Unfortunately I don't have the said image anymore. Sorry about the inconvenience.) Just drag it into VirtualBox and all will be fine. For the uninitiated, VirtualBox itself can be found at http://virtualbox.org/ And a few notes:-

  • System Credentials: kaltura / kaltura
  • MySQL Login: root / toor
  • Admin login for the web interface is admin@kaltura.local / nimda. From within this you can create new users.
  • The instance is currently set to bridged mode networking. You'll have to add kaltura.local and the IP of the VirtualBox machine to your hosts file (an example entry follows this list).
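
For example, a hypothetical hosts file entry would look like this (the IP is a made-up example; substitute whatever bridged IP your VM actually gets):

192.168.1.50    kaltura.local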

Sighup & Python

I wrote a Python processing daemon on a machine that was supposed to run indefinitely, but it used to terminate mysteriously by itself. This was later traced to it dying on logout, because I generally leave a terminal logged into the remote machine.

Later on, I checked if it was the notorious SIGHUP, the hangup signal issued to a user's processes when the user logs off.

In order to ascertain this, I used the following line to send a SIGHUP signal to the process:-


sudo kill -SIGHUP 2025

You can get a list of process details (also containing the process IDs) with:-


ps aux | grep "python mydaemon.py"

Now there are multiple ways to handle this:-

1) Using NOHUP:


sudo nohup python mydaemon.py

The problem with this approach was that I was calling system commands from my code, which were themselves placed in /usr/bin. After using nohup to run the daemon, this access was restricted and my code threw exceptions saying that the commands were not found.

2) Using upstart:

This is an Ubuntu-based daemon manager. It has all these features out of the box; sadly I haven't had the time to use it. It's available on Red Hat and other Linux distros as well.

3) Python:

Implementing a signal handler. This is by far the most convenient.


import signal
signal.signal(signal.SIGHUP, signal.SIG_IGN)

SIG_IGN is a default signal handler that does nothing. You could alternatively implement your own handler for various OS signals in the following way:-


def handler(signum, frame):
    print 'Did anyone HICHUP ?'

signal.signal(signal.SIGHUP, handler)

If anyone has better, more professional ways of doing this, do let me know.

SVN 1.4.3 to 1.4.6 upgrade issue

One of our old faithful deployment machines at work got reinstalled, and we upgraded SVN from version 1.4.3 to 1.4.6.

However the data was incompatible, so we had to run the following steps, in order, for every folder (the equivalent raw commands are shown after the list):-

1. Take a dump of that path, redirecting the output to a temporary path and preserving the relative hierarchy.
2. Delete the files under the path.
3. Recreate the frompath which just got deleted.
4. Load this frompath with the data from the temporary file path.
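
In terms of raw svnadmin commands, the four steps for a single repository folder would look roughly like this (the paths are made-up examples):

svnadmin dump /repos/myproject > /tmp/myproject.dump    # 1. dump
sudo rm -Rf /repos/myproject                            # 2. delete
svnadmin create /repos/myproject                        # 3. recreate
svnadmin load /repos/myproject < /tmp/myproject.dump    # 4. load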

And that's it. It can all be done using svncopy.py. Since I don't have an online CVS or SVN account, I'll paste the code here.

I recommend you look at the code and comment out the lines that you want to comment out. I bear no responsibility in case anything goes wrong. Please back up the folder if you fear data loss.

"""
Disclaimer: The following code is being released to the public with no strings attached.
However I do not take responsibility in any way for the side effects of using this code.

Contributed by : Afrobeard
"""

import os
from optparse import OptionParser

def fixpath(path):
    # Normalize separators; the second pass collapses any remaining doubled slashes.
    path = path.replace('\\', '/').replace('//', '/')
    path = path.replace('\\', '/').replace('//', '/')

    if path == '.':
        pass
    elif path == './':
        path = '.'
    elif path.startswith('./'):
        path = path[2:]
    if path.startswith('/'):
        path = path[1:]
    if path.endswith('/'):
        path = path[:-1]
    return path

def runcmd(cmd):
    print cmd
    os.system(cmd)

parser = OptionParser()
parser.add_option("-f", "--from", dest="frompath",
                  help="From Path. Will be deleted later on")
parser.add_option("-t", "--to", dest="topath",
                  help="To Path.")

(options, args) = parser.parse_args()

options.frompath = fixpath(options.frompath)
options.topath = fixpath(options.topath)

os.chdir(options.frompath)

dumpcmd = 'svnadmin dump "' + options.frompath + '/%s" > "' + options.topath + '/%s"'
rmcmd = 'sudo rm -Rf "%s"'
createcmd = 'svnadmin create "' + options.frompath + '/%s"'
loadcmd = 'svnadmin load "' + options.frompath + '/%s" < "' + options.topath + '/%s"'

# Only the immediate subfolders of the from path are repository roots;
# anything deeper is repository internals and gets skipped.
for root, dirs, files in os.walk('.'):
    for name in dirs:
        if len(root.split('/')) > 2:
            pass
        elif len(os.path.join(root, name).split('/')) > 2:
            pass
        else:
            print os.path.join(root, name)
            print name

            # 1. Taking the dump
            runcmd(dumpcmd % (name, name))

            # 2. Removing the from folder
            runcmd(rmcmd % name)

            # 3. Creating the path in the repository
            runcmd(createcmd % name)

            # 4. Loading contents
            runcmd(loadcmd % (name, name))

PyUnicodeUCS2_DecodeUTF8


The old CentOS 4 installation at work had been running fine, until the day I accidentally deleted python2.3 off of it. I needed to install python2.5 and didn't think this would cause any problems. However, I ended up breaking yum [our friendly package manager].

Quite a few people tried to fix this by installing 2.3, and although I can't give a detailed series of the actions that led to this, we were getting problems with PyUnicodeUCS2_DecodeUTF8, or more specifically:-


undefined symbol: PyUnicodeUCS2_DecodeUTF8

A little research into the problem indicated that it's because of a representation issue for Unicode: the modules in question were using 4 bytes to represent a Unicode character, whereas my version of Python was using 2.
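
As an aside, you can check which representation your interpreter was built with:

import sys
# 65535 -> UCS-2 (2-byte) build, 1114111 -> UCS-4 (4-byte) build
print sys.maxunicode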

http://effbot.org/pyfaq/when-importing-module-x-why-do-i-get-undefined-symbol-pyunicodeucs2.htm

It states that:-

The only way to solve this problem is to use extension modules compiled with a Python binary built using the same size for Unicode characters.

However, I was getting this inside Python's own libraries.

Therefore I went to http://rpm.pbone.net, went to the advanced RPM search, selected CentOS 4, and got rpm-python for the selected platform.

I downloaded the package via wget and then ran the following command to unpack the RPM file:-


rpm2cpio python-2.3.4-14.4.i386.rpm | cpio -idmv

This made a hierarchy inside my working folder starting with the relative path usr.

After that, I copied everything from there to /usr/lib/python2.3 [or the lib folder for Python].

I fired up yum and everything worked like a breeze.

[x for x in list while condition]. Alternatives for Python


Recently I came across the following comp.lang.python usenet thread, where urikaluzhny posted the following question:-

It seems that I rather frequently need a list or iterator of the form [x for x in <list> while <condition>]
And there is no one like this.
May be there is another short way to write it (not as a loop).

I think Paul Rubin gave a reasonable answer, in which he said to use itertools.takewhile(condition, seq).
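
A quick illustration of takewhile (the list and the predicate are my own example):

from itertools import takewhile

l = range(6)
# the [x for x in l while x != 4] that the poster wanted:
print list(takewhile(lambda x: x != 4, l))   # prints [0, 1, 2, 3]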

Other solutions were bigger, i.e. the long way to write it.

I was thinking of the shortest way to do this using pure Python constructs, i.e. without importing anything additional, and I came up with the following:-


l = range(6)
[1 if b!=4 else l.__delslice__(0,len(l)) for b in l][:-1]

I was wondering if anyone could come up with something more concise, no matter how complex. Feel free to respond on the original thread or here [in which case I'll move the answers I consider reasonable onto the original thread, using your name of course]. Or feel free to post in both places.

Disclaimer:-

  1. The above proposed solution is not intended to be a real solution; it goes completely against the Zen of Python. [Type import this into the Python command interpreter.]
  2. It is strongly undesirable for us humans to use anything that starts with __.

Are your PDFs not being Rendered Correctly


It's probably because of a bad PDF definition. A bad PDF definition often comes down to, but is not limited to, font embedding. Other errors can consist of messed-up gamma values, etc., which may result in the pictures looking different when viewed with different PDF readers, but let's not delve into those at the moment.

So how exactly is one supposed to embed fonts into an Acrobat PDF? If you're using Adobe Acrobat Distiller, instructions can be found here. In case you're using Ghostscript, you can find help here. It goes without saying that the types of font one uses should be embeddable, e.g. TrueType or Type 1.

At this point you may be wondering what to do if you have a document that does not have fonts embedded in it. Generally, PDF viewers can be supplied with font file paths so that they can pick up font definitions from there. So you just need a comprehensive collection of fonts on your system.

Where does one get a collection of fonts to use? Printers use PostScript to define documents, and they contain a collection of common fonts to render page output correctly. I extracted some fonts out of some Lexmark printer drivers once. I've uploaded them and they can be downloaded here.

In Ghostscript, the font file directory can be specified as follows [the following extract is copied blatantly from here]:-

1) Symbolic link to the font directory as the expected dir for example:

ln -s /usr/local/share/ghostscript/fonts app/ghostscript-8.00/share/ghostscript/fonts

2) Tell gs where to find the fonts using the -I option to add a search path. For example:

gs -I/usr/local/share/ghostscript/fonts …

3) Set the GS_LIB environment variable to specify the dir:

export GS_LIB=/usr/local/share/ghostscript/fonts

Please let me know in the comments if any of these links, particularly the compressed fonts file, dies.

Unicode & MySQLdb


I don't like giving cursory descriptions of problems, but I got plagued with the following error message when using xmpppy to send messages after database lookups:-


File "D:\Python25\Lib\site-packages\xmpp\protocol.py", line 418, in __init__
if body: self.setBody(body)
File "D:\Python25\Lib\site-packages\xmpp\protocol.py", line 431, in setBody
self.setTagData('body',val)
File "D:\Python25\Lib\site-packages\xmpp\simplexml.py", line 243, in setTagDat
a
except: self.addChild(tag,attrs,payload=[ustr(val)])
File "D:\Python25\Lib\site-packages\xmpp\simplexml.py", line 32, in ustr
if type(r)type(u''): return unicode(r,ENCODING)tBody
self.setTagData('body',val)
File "D:\Python25\Lib\site-packages\xmpp\simplexml.py", line 243, in setTagDat
a
except: self.addChild(tag,attrs,payload=[ustr(val)])
File "D:\Python25\Lib\site-packages\xmpp\simplexml.py", line 32, in ustr
if type(r)type(u''): return unicode(r,ENCODING)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 20-21: invalid data

I was trying to convert the data received by the SQL connector to a Unicode string, and it was giving invalid-data exceptions. The tables contained Unicode fields and I was retrieving Unicode, yet the data coming from the connector was plain Python ASCII strings. Some other errors I saw while playing around with my code included:-


UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
52: ordinal not in range(128)

The solution was simple. Rather than using


self.conn = MySQLdb.connect(host=self.host,
                            user=self.user,
                            db=self.db)

I had to use


self.conn = MySQLdb.connect(host=self.host,
                            user=self.user,
                            db=self.db,
                            use_unicode=True,
                            charset="utf8")

And voilà, things have worked seamlessly ever since, and I didn't need to do any conversion.

I didn't find the time to investigate further what was actually happening in terms of how the data was being stored. I hope this fix was good enough for most, but for the more technically inclined, maybe I'll investigate further and update this article in the future.