Attacking a Hackerrank coding challenge in Python
Challenge number 4
Back to my series of coding challenge questions. I’ll be posting a few of these individually, as they are basically required practice if you are an entry-level data analyst or data scientist looking for your first position.
Think from the hiring manager’s perspective. They see you might have some projects and coding experience, but that doesn’t completely cut it for them. As one respectable senior data scientist once told me, who will the hiring person choose: the one who can code the right answers after scouring through stack overflow and google or the one who can code the answer just from the knowledge in his head? I think you know the answer to that.
These questions might seem mindless and unreasonably challenging at times, but your answers to them demonstrate the skills in your head. And once you start to get the right answers, you will feel good. How you approach these problems is also key to completing them.
Here is a question on a HackerRank that is a decent example of something you might see in a technical interview or coding challenge:
A non-empty array A consisting of N integers is given. Array A represents numbers on a tape.
Any integer P, such that 0 < P < N, splits this tape into two non-empty parts: A, A, …, A[P − 1] and A[P], A[P + 1], …, A[N − 1].
The difference between the two parts is the value of: |(A + A + … + A[P − 1]) − (A[P] + A[P + 1] + … + A[N − 1])|
In other words, it is the absolute difference between the sum of the first part and the sum of the second part.
For example, consider array A such that:
A = 3 A = 1 A = 2 A = 4 A = 3
We can split this tape in four places:
P = 1, difference = |3 − 10| = 7
P = 2, difference = |4 − 9| = 5
P = 3, difference = |6 − 7| = 1
P = 4, difference = |10 − 3| = 7
So this question is undoubtedly a bit confusing. Basically, we have an array A, consisting of N values. P is any value from 1 to N-1. In other words 0 > P > N. P can split the array at A[P] into two separate arrays. We are looking at the minimum difference between the sum of the two split arrays.
Let’s first code for our inputs and use 5 integers.
A = list(map(int, input().split()))#3 1 2 4 3
My first attempt looked like this:
first_half = 0
second_half = 0
diffs = 
for i in range(0, len(A)-1):
first_half += A[i]
second_half = sum(A) - A[i]
My approach is that I iterate through each value that P could be.
For each possible value of P, I can add each corresponding value in A to the first half, and the second_half with be the sum of A, minus that corresponding P value.
Then, I append the difference of the halves for each potential P to “diffs” and return the minimum difference.
Unfortunately, my answer was a bit of a failure. But why? My answer was failing in extreme/edge cases and also in OOP computation time.
So I did a bit of digging and came to a working solution:
mins = 
left_sum = 0
for i in range(len(A)-1):
left_sum += i
mins.append(abs(sum(A) - 2*left_sum))
This gave me 100 % correctness in my answer evaluation, but 0% on performance. So close. :(
I hate to say it, but this is why Codility is quite valuable. They force you to consider the complexity of your answer against edge cases that a professional data scientist might need to take into account when working with extremely large datasets. And this is not uncommon!
So what is giving us horrible performance? Usually the problems lie within the for loops. We have to be careful about what we calculate and how we decide to order those calculations.
One simple for loop could be one level of complexity notation, O(N) but when we are doing calculations within the for loop, complexity grows very quickly. Here, we are appending the absolute value and an addition calculation, and also a multiplication calculation to a list.
Let’s try and minimize the complexity of the calculation that is going on inside the for loop:
m = float('inf')
left_sum = 0
for i in A[:-1]:
left_sum += i
m = min(abs(sum(A) - 2*left_sum), m)
100% correct, now with 53% performance… why?!
So appending my absolute value calculation was a big chunk of complexity there. Here, in the for loop, we get our minimum and reassign itself, always keeping the minimum number it comes across.