Longest Collatz sequence
The following iterative sequence is defined for the set of positive integers:
Which starting number, under one million, produces the longest chain?
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:n → 3n + 1 (n is odd)
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1)
contains 10 terms. Although it has not been proved yet (Collatz
Problem), it is thought that all starting numbers finish at 1.Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million?
Seeing the question one might think that the problem is simple. But the real problem here is that the number is 1 million which is very big and the solution may take long time if you are not optimizing the code.
I wrote two programs for this problem, first one took around 17 seconds while the second one took around 1.5 seconds. Yes, I took some time optimizing the code and you can see the result. Staggering 90% optimization.
Okay let me first share with you the code and I will tell you, where I needed improvement. The code is as follows:
#http://radiusofcircle.blogspot.com #time module for execution time import time #Time at the start of execution start = time.time() #collatz function straight forward def collatz(n): """ This function will take a number and it will return the size of the collatz sequence for input n""" counter = 1 while n>1: if n%2 == 0: n = n/2 counter += 1 else: n = 3*n+1 counter += 1 if n == 1: return counter #let the largest number be 0 largest_number = 0 #let the largest sequence size be 0 large_seq = 0 #for loop for 1 million for i in xrange(1000000,1,-1): #size of collatz for the iteration n = collatz(i) #if size if greater than previous #replace it with the new one if n > large_seq: largest_number = i large_seq = n #printing the largest number print largest_number #time at the end of execution end = time.time() #Printing the total time taken print end-start
As you can see this code is the direct implementation of the question. It took 16 seconds(approximately) to run this code. I was not satisfied with the run time so I searched on internet for the mistake I made and found some of the errors I made. Let me explain the mistakes I made in this code.
If we were to consider first five numbers 1,2,3,4,5 then the collatz sequence would be as follows:
1
2 → 1
3 → 10 → 5 → 16 → 8 → 4 → 2 →1
4 → 2 → 1
5 → 16 → 8 → 4 → 2 → 1
3 → 10 → 5 → 16 → 8 → 4 → 2 →1
4 → 2 → 1
5 → 16 → 8 → 4 → 2 → 1
As you can see, if we consider the collatz sequence of 3 then at some stage we have arrived at a point where we will have to repeat the sequence of 2.
Consider 4 we have again reached the sequence of 2.
Consider 5 we have reached the sequence of 4.
So if we were to stop repeating these sequences again and again then we will have a better and faster code. This is called as Dynamic programming and was suggested by Jesse Bender at Stack Overflow.
Some people also call it memorization or also partial caching
Program
The program is as follows:
See Python dictionary Comprehension used at line 10
One thing I want to explain in this code is the twist in the for loop from line 17.
Lets take for example the iteration of 3.
When it starts the value of
counter
is 0.
Value of
original_number
is 3.
As the condition is satisfied we can enter into the while loop.
Condition for first if is not satisfied and the value of n changes to 10.
So the progress till now is:
3 → 10
Value of
counter
= 1
In a similar way we will progress as follows:
3 → 10 → 5 → 16 → 8 → 4 → 2
At this stage the first if statement will get executed and the size of two is added to the chain. But if you were to convert the sizes to the chain then your chain will be as follows:
3 → 10 → 5 → 16 → 8 → 4 → 2 → 2 →1
But you might have a question on why we are getting the correct solution even if we are finding the length of the above solution.
Here is where you are assuming in a wrong way. if we were to go back to the first iteration then you can see that at the start the counter was 0 but not one even if the value of 3 was in the chain. And so eliminating 3 from the counter at first we have indirectly eliminated the extra 2 from our count.
And the rest of the code is well commented and I don't think it is necessary to explain.
You can download the code from Github Gist
Output
Summary
I have learned a new concept of dynamic programming by solving this problem. To be frank it took me around 1 day to find the solution to this problem. But I am really satisfied because the program is really optimized in time point of view and also in iterations point of view. In terms of performance wise I am really satisfied.
One more way in which we can make optimization is, if we consider the sequence of 3 the next number is 16 for which we can add the value to the dictionary and eliminate the iteration at 16 and so on. But I don't know how fast it will work because we are making new variables. Please try this for yourself if you want to.
If you didn't understand anything in this code then please do comment in the comment box below and I will glad to help you.
Please do comment if I have made any typo or if you want me to add any extra part to this post. Please do share your code with me, if you have a different solution or a better solution.
You can also contact me if you want to.
Thank you. Have a nice day😃.