Mark deGroat's Blog

Dark meGroat's Log

Turing Machine's and the Halting Problem

A turing machine is an abstract model of computation used to formally define what it means for a function to be computable. As you will see shortly the model of a turing machine is simple, but given any computer algorithm, a Turing machine can be created that is capable of simulating that alorithm’s logic.

It consists of:

An infinitely long tape which serves as the memory for the machine. The tape consists of squares which are usually blank at the start and can be written with symbols.

A read/write “head” which is positioned over one of the squares on the tape.
The head is capable of three basic operations:

  1. Reading the symbol on the tape
  2. Editing the symbol on the tape by writing a new symbol or erasing it.
  3. Moving the tape left or right by one square so that the turing machine can read or edit the neighbouring symbols.

That’s it!

A turing machine is given a set of instructions that tell it what actions to perform based on the given input on the tape.
The action that a Turing Machine is going to take is determined by the current state of the machine, the symbol in the cell currently being scanned by the read/write head and a table of transition rules. These transition rules can be thought of as “programs” that the Turing Machine then runs.

Here’s what some simple instructions look like, taken from a rasberry pi overview published by University of Cambridge:

images

We could write these instructions as such:

1
2
3
4
5
6
7
8
  Read Symbol  //it's blank
  Write 1
  Move Head Right
  Read Symbol //it's blank
  Write 1
  Move Head Right
  Write 0
  HALT

This example is pretty boring though, what if something was already written on the tape?

Here’s a program that creates the binary inverse of a binary number written on the tape. (The binary inverse of 110 would be 001).

1
2
3
4
5
6
7
8
9
10
11
12
START
  IF READ SYMBOL == 1
    WRITE 0
    MOVE HEAD LEFT
    GO TO START
  IF READ SYMBOL == 0
    WRITE 1
    MOVE HEAD LEFT
    GO TO START
  IF READ SYMBOL == BLANK
    HALT
  GO TO START

images

We’re writing some pretty sketchy pseudo code to represent the instructions given to a Turing Machine. Usually, instructions are represented using state transition diagrams. Here’s an example of a set of instructions that adds one to a unary representation of a number. It also has a good explanation of how a state transition diagram represents the different instructions: images

Now let’s see that program run in real time:

Why have I never seen one of these Turing Machine’s before?

Well, a Turing Machine is really just a hypothetical machine used as a model to prove certain limits on mechanical computation. But, it is possible to build one if you really wanted.

Now that we understand what a Turing Machine, let’s take a look at…

The Halting Problem

In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running or continue to run forever. https://en.wikipedia.org/wiki/Halting_problem

In other words, can you write a program that given some computer and some other program as parameters, could tell you whether the given program would run forever on that computer, or at some point would stop and give you an answer.

Hm… it’s starting to make sense why all these famous computer scientists look like dead heads.

Well, let’s try and prove this.

First, we’ll assume that we DO have some program that can achieve this feat. We can give it a program and a computer, and it will tell us whether that program will run forever or not. We’ll call it “Does it halt?”, represented by the weird diamond boxy thing below and trust that it returns either a true or false value.

If I gave this weird diamond box thingy this program in BASIC as a parameter:

1
2
10 PRINT "HELLO WORLD"
20 GO TO 10
|
v
|
v
FALSE

In other words, if we represent it as a function:

1
2
3
program_as_string = '10 PRINT "HELLO WORLD" 20 GO TO 10'
doesItHalt(program_as_string)
>false

Alright, this is where things get weird…

Now, imagine we made this little magical “Does It Halt?” program a piece of a bigger program? Let’s make it that if the “Does It Halt?” algorithm returns true, then we make our new program loop forever. But if it returns false, then we’ll halt (or end) our new program, which we will name “Program X”.

Program X

Now, here’s the kicker. What would happen if we fed Program X into itself? In other words, what if we ran Program X and gave it Program X as the parameter?

PROGRAM X
|
v
|
v
???????


So, now if the program doesn’t halt, then it halts. But if it DOES halt, then it doesn’t halt. What the hell?

What we have done is created a paradox, and in doing so proven that the answer to the question is undecidable, or it is neither true nor false. We can then conclude that our assumption that there exists a program which can answer the question of “Does It Halt?” must also be false.

We can then conclude that given a program and a computer, it is not possible to create a program that can tell you whether that program on that given machine will run forever, or eventually give you an answer.

Sources:

  • http://faculty.otterbein.edu/PSanderson/csc150/notes/chapter11.html
  • https://plato.stanford.edu/entries/turing-machine/
  • https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/turing-machine/one.html
  • http://danielschlegel.org/teaching/111/lecture4.html

Protecting Against SQL Injection in Rails

In March of 2008, Heartland Payment System was the victim of a data breach which lead to 134 million credit cards being exposed. The attackers used SQL injection to install spyware on Heartland’s data systems. The vulnerability the attackers exploited was nothing new to security experts, yet the company still fell victim to this attack.

Whether you’re storing pictures of cats or credit card information of millions of different users, eventually once your platform gets big enough someone is going to try and break it. The attacker might be a bored twelve year old who downloaded some scripts off of a deep web forum or a team of trained security experts in China attacking your site for political reasons. Whoever it may, and for whatever reason they’re trying to get in, it’s our job as web developers to make our application as resilient as possible to these attacks.

From: http://guides.rubyonrails.org/security.html#injection

So, what is a SQL injection anyways?

A SQL injection is when an end user manipulates database queries being performed on the backend by manipulating the parameters that are sent to the web server. Usually this is done by sending a string via a form that “tricks” the web server into firing SQL statements it did not originally intend to.

A common goal for a malicious end user would be to bypass authorization so they could get access without knowing a correct username or password.

Let’s pretend we’re checking the parameters sent by a web request in order to authenticate a user, which is handled by the create action of our SessionsController.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class SessionsController < ApplicationController

  def new
  end

  def create
    user = User.find_by("login = '#{params[:name]}' AND password = '#{params[:password]}'")
    if user
      redirect_to :super_secret_documents
    else
      render :login
    end
  end

  def destroy
  end
end

and then let’s pretend we have a simple login form that looks like this:

1
2
3
4
5
6
7
<form action="/create" method="POST">
  First name:<br>
  <input type="text" name="username"><br>
  Last name:<br>
  <input type="text" name="password"><br><br>
  <input type="submit" value="Submit">
</form>

Now, our end user is a big ol' jerk and decides to try and bypass our basic authentication procedure. By entering

‘ OR '1’=‘1

as their username and

‘ OR '2’>‘1

as their password. Now, when these malicious parameters are passed into our User.find_by method in the create action of our SessionsController, the resulting SQL query will be:

1
SELECT * FROM users WHERE login = '' OR '1'='1' AND password = '' OR '2'>'1' LIMIT 1

TODO put in kid giving thumbs up gif

This sql query will ALWAYS return true, as now the query think it needs to check if the login is correct, OR if 1 is equal to one, which will always equate to true. Similarly with the password parameter, the resulting SQL query is now checking whether the correct password was entered OR if 2 is greater than 1, which will also always equate to true.

Bypassing authentication is just one of the many things that can be accomplished with SQL injection. By being creative with what you try and enter into input fields, an end user can do a number of malicious things to your database.

Security Features of Rails

Luckily, Rails has a number of built in security measures to prevent situations like the one we just looked at from occurring. Rails has a built-in filter for special SQL characters, it escapes the ‘, “ , NULL character and line breaks. Whenever you use #Model.find(id) or #Model.find_by_something(something) this security feature is automatically applied. It is important to note though that using #where(”…“), #connection.execute() or Model.find_by_sql() does NOT apply this security measure, as Rails is assuming you are going to sanitize the input yourself. If this security feature was implemented, it would make these methods behave extremely strangely.

One way to easily sanitize this input is to pass an array or hash to the #where method instead of doing string interpolation.

1
2
3
4
5
  Model.where("login = ? AND password = ?", entered_user_name, entered_password).first

  OR

  Model.where(login: entered_user_name, password: entered_password).first

A Historical Example of Another Kind of Injection

“Phreaking is a slang term coined to describe the activity of a culture of people who study, experiment with, or explore telecommunication systems, such as equipment and systems connected to public telephone networks.” - https://en.wikipedia.org/wiki/Phreaking

Telecommunication systems used to use “in-band signaling” in order to send control messages to their systems via a phone line. They set certain frequencies to mean certain things, so if I sent a 2000hz tone it might mean “prepare call for routing” while a 2400hz tone might be interpreted by the system as “disconnect caller”. In band signaling means that these control tones were sent on the same channel that voice data was sent. Just like API’s have endpoints which fire different methods or functions on a webserver in order to generate a result, these control tones are the “endpoints” of the telecommunication system.

Let’s look at a frequency detection script I wrote using Python to visualize how this might work:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
__author__ = 'markdegroat'
import pyaudio
import wave
import numpy as np
import sys
import time
import timeit

chunkToAnalyze = 1024
if len(sys.argv) < 2:
        print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
        sys.exit(-1)
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
wav = wave.open(sys.argv[1], 'rb')
sampleWidth = wav.getsampwidth()
sampleRate = wav.getframerate()
sampleChannels = wav.getnchannels()
#Use a windowing function to eliminate leakage that is not the dom freq
#windowing function is taken from the numpy library
window = np.blackman(chunkToAnalyze)
stream=p.open(format=p.get_format_from_width(wav.getsampwidth()),
        channels=sampleChannels,
         rate=sampleRate,
        output=True)
#begin to analyze chunk
data = wav.readframes(chunkToAnalyze)
#play stream and find the frequency of each chunk

#this step writes data out to the audio stream
# unpack the data and times by the hamming window
indata = np.array(wave.struct.unpack("%dh" % (len(data)/sampleWidth),\
                                     data))*window
# Take the fft and square each value
fftData = abs(np.fft.rfft(indata))**2
# find the maximum, aka find the bin with the most "hits"
which = fftData[1:].argmax() + 1
detectedFrequency = which*sampleRate/chunkToAnalyze


print "The freq is estimated to be about %f Hz."%(detectedFrequency)

if (detectedFrequency > 2550 && detectedFrequency < 2650){
  # END CALL
  #Here we would send a message to the telecommunication system
  #to hang up the line
}
if (detectedFrequency > 4550 && detectedFrequency < 4650){
  # END CALL
  #Here we would send a message to the telecommunication system
  #to prepare the line for a new call
}

https://en.wikipedia.org/wiki/John_Draper#Phreaking

http://mentalfloss.com/article/19484/true-crime-john-draper-original-whistle-blower

sources: http://www.dotnettricks.com/learn/webapi/what-is-web-api-and-why-to-use-it-

Database Normalization: A Surface Glance at 1NF ,2NF and 3NF

Database Normalization

What is database normalization?

Database Normalization is the process of organizing columns and tables in such a way to reduce data redundancy and improve data integrity. It is a way to simplify the design of a database so that everyone and anyone is able to set up a new database to achieve the most optimal structure that we have currently come up with.

(Side Note: Did you know that markdown doesn’t support tables? Hence my super old school ASCII art tables.)

A Non Normalized Database example

images

Why do we do it?

In our example above, you can see that for each row that represents a Sales Rep, we are also storing the Sales Office they work in, and the phone number for that particular office. Employees 1 and 3 both work in the same office, and in order to represent that with our current database design we had to store that information twice. When we duplicate information we are not only are wasting space in our database but we are also setting ourselves up to run into problems down the line when trying to perform certain actions on our database.

In order to understand why we normalize a database, let’s take a look at some of the things that can go wrong if we don’t.

Update Anomaly

What would happen if the Office Number for the New York office changes? With our current database design, we’d have to update the Office Number for every row where the Office Location is New York. It’d be much more efficient to only have to update the phone number one time, rather than have to update multiple columns in your database. images

Insert Anomaly

What would happen if we opened a new office in Chicago and wanted to store that new office location’s phone number? With our current setup, we cannot add a new office location unless we also record a new employee. Every new record needs a “Primary Key” and because we are using “Employee ID” as the primary key, we need a new Employee ID in order to save this new phone number.
images

Deletion Anomaly

Turns out our employee “Mary Martin” has just been found guilty of embezzling millions from our company and has been fired. Let’s go ahead and delete her from the database. images Someone should call our San Francisco office to let them know about Mary. We’ll just look up their number from our database and… Oh no! When we delete the row pertaining to “Mary Martin”, we also deleted our Office Number for the San Francisco office. A normalized database would store these Office Locations and their respective numbers in such a way that when we delete an employee that works at that location, we still have access to that location’s information, even if there are not currently any employees working there.

Searching Difficulties and Horizontal Growth

We can see in our non-normalized database, each employee has to have a new row created in order to store a new customer. This presents a couple problems. First, our current structure only allows each employee to have a max of 3 customers. If an employee had more than 3, we’d have to add an entire new column to our database for each new customer. images As the number of customers each employee has grows, our database has to grow horizontally in order to store these new customers. This is not what we want. Furthermore, how would we query our database for a specific customer using SQL? We’d have to set up a query like this:

  SELECT *
  FROM Employees
  WHERE Customer1 = 'Google' OR
  WHERE Customer2 = 'Google' OR
  WHERE Customer3 = 'Google' OR
  WHERE Customer4 = 'Google' OR
  WHERE Customer5 = 'Google'

But everytime we added a new row for a new customer, we’d also have to modify our search query in order to also search this new row. This isn’t efficient.

1st Normal Form (1NF)

The rules that your database design needs to follow in order to be considered in “First Normal Form” is:

  • The table stores information in rows and columns where the one or more columns uniquely identify each row.
  • Each column contains atomic values, and there are not repeating groups of columns.

Atomic values are values that cannot be further subdivided. “Google” is an atomic value because it refers to one customer, but a column called “Customers” which contained all the customer for an employee delimited by commas, “Google,Apple,Gerber” , would not be atomic because that one value in the table actually represents three different customers. This extends to another requirement which is that a table should not contain repeating groups of columns, such as the Customer1, Customer2, etc. columns we have in our table.

Let’s go ahead and redesign our database in such a way that we satisfy the rules of 1NF. images

We have now created a separate table, Customers, so that there are no longer repeating customer columns. This will make sure our database grows vertically as new customers are added, not horizontally as it did in our non normalized design.

What did this do for us? Well, now every time we want to add a new customer for an employee we simply add a new row to our customer table and use the Employee ID column to associate it to the correct employee. We no longer run into the insertion anomaly we encountered with our original database design when trying to add new customers to the databse.

The deletion anomaly is also no longer an issue for this new database design, at least in terms of Employees being related to customers. We can delete a customer from the database without having to delete the entire employee.

2nd Normal Form (2NF)

The rules that your database design needs to follow in order to be considered in “Second Normal Form” is:

  • The table is in 1NF.
  • All the non-key columns are dependent on the table’s primary key (which in our example is Employee ID).
If we take a look at our table in 1NF, we see that there are columns that are not dependent on the table’s primary key, Office Name and Office Number. Even though these two columns describe where the Employee is located and how to contact that location, the columns themselves don’t describe who the Employee is. We need to put this data into another table that is then referenced by the Employee table. We are also storing the Customer’s name for the table that joins an Employee to a Customer. We should separate this out into it’s own table so that if two employees have the same customer, the name would not be stored twice.

images We have now almost completely eliminated redundancy. Remember our update anomaly we ran into before? Now if we needed to update the office number for our San Francisco office, we only need to make one update to our table, while before we would have had to update it everywhere it appeared in the 1NF of the customer table.

The point of 2NF is that each table serves a single purpose. Before, our Employees table was not only storing information about an employee, but also information about the contact information for a particular location. By making our database design conform to 2NF, we separated this into two tables, one that stores information strictly about the employee, and another which stores information about an office. We then simply add a foreign key to the Employee table that references which row from the Office table they should correspond to.

3rd Normal Form (3NF)

The rules that your database design needs to follow in order to be considered in “Second Normal Form” is:

  • The table is in 2NF.
  • For a relation table R, every non-prime attribute of R is non-transitively dependent on every key of R.

Well, what does non-transitively dependent mean? The best definition I could find was “A column’s value relies upon another column through a second intermediate column.”

Let’s formalize this definition by using a 3 column table as our example. For three columns, A, B and the Primary Key. If the value of A relies on the Primary Key, and the value of B relies on A, then it is true that B relies on the Primary Key through A.

Let’s use this new table to explain this further: images Let PK = Office ID, A = Zip code and B = Office Location. A, the zip code, relies on the primary key because the zip code is a representation of where that Office is located. B, the office location, relies on A because the zip code points to the city in which the office is located. Because of this, A relies on the Primary Key through B.

In order for this table to conform to 3NF, we need to have the zip code point to the office location in a separate table. images

Now, every non-prime attribute of our table is non-transitively dependent on every key for our table.

Let’s look at another example from this site: http://www.1keydata.com/database-normalization/third-normal-form-3nf.php images Our original table, “TABLE_BOOK_DETAIL” has both a “Genre ID” and “Genre Type” column. Book ID, which we will refer to as the primary key, is used to determine the Genre ID. Then the Genre ID is used to determine the Genre Type column. Because of this, we can see that the Primary Key determines the Genre Type through Genre ID. This violates our second rule of 3NF, and thus we need to separate out Genre Type into it’s own table which is referenced by the Genre ID. images

Conclusions…

Work in progress…

To be added: BCNF 4NF 5NF Why Normalization is not always necessary and when to use it.

Eulive and Eulern

Optimizing Algorithm to Determine Primality of Number Using Ruby

A few months ago I stumbled upon an interesting problem on Projecteuler, which hosts “challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems”.

The problem was as follows:

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13. What is the 10,001st prime number?

Because Eulerproject.net is supposed to be used as a learning tool, I am going to follow the original author’s wishes and not publish direct solutions to questions online. But we can instead explore the core question being asked in this problem, how do you determine whether or not a number is prime?

Brute Force

Determining whether a number is prime or not appears to be relatively easy as n stays relatively small. Let’s take 5 for example. We as human beings determine if a number is prime by dividing n by every integer between itself and 1. For 5, we would say

5/5 is an integer.

5/4 is not an integer.

5/3 is not an integer.

5/2 is not an integer.

5/1 is an integer.

By knowing the definition of a prime number we can determine that because 5 is a prime number because it is only evenly divisible by itself and 1. This is a feasible method while n stays relatively small. But how long would it take you to tell me whether or not 472,882,027 is prime using this method? (spoiler:A really long time and yes it’s prime.)

Well luckily my computer is a billion times smarter than me (RIP jobs) so let’s go ahead and make a program in ruby that determines whether a number is prime in the same way a human brain would. We can take advantage of the fact my CPU can perform computations a lot faster than a human brain can.

1
2
3
4
5
6
7
8
9
10
11
12
13
def is_prime?(n)
  #these guys don't play by the rules
  if n == 0 || n == 1
    return true
  end
  array_of_factors = []
  for i in (1..n)
    if n % i == 0
      array_of_factors << i
    end
  end
  return array_of_factors.length == 2 #if there are only 2 factors, it must be prime
end

And let’s go ahead and add a way to track how long this thing takes to run. I’m using a latest generation quad-core I7, it should blaze through this no problem! We’re going to be using Ruby’s standard Benchmark library to time our program’s execution.

1
2
3
4
5
n = 11
time = Benchmark.realtime do
  puts "#{n} is #{is_prime?(11)}."
end
puts "Time elapsed #{time*1000} milliseconds."

Console:

1
2
11 is prime.
Time elapsed 0.03200001083314419 milliseconds.

Cool! This means our program can determine that 11 is a prime number in just .032 Milliseconds. Let’s try larger values of n to see how that affects the execution time.

When n = 113.

Console:

1
2
113 is prime.
Time elapsed 0.04499999340623617 milliseconds.

When n = 1000.

Console:

1
2
1000 is not prime.
Time elapsed 0.11299998732283711 milliseconds.

When n = 10000.

Console:

1
2
10000 is not prime.
Time elapsed 0.7440000190399587 milliseconds.

When n = 100000.

Console:

1
2
100000 is not prime.
Time elapsed 6.9629999925382435 milliseconds.

When n = 1000000.

Console:

1
2
1000000 is not prime.
Time elapsed 68.42699996195734 milliseconds.

Well this isn’t looking good…

It seems that as the size of our n grows, as does the execution time of our program. Determining whether 1,000,000 is prime takes about ten times longer than determining whether or not 100,000 is prime. We can see now that our human method for determining whether a number is prime or not is not going to be feasible when n gets to a certain size. Determining whether 1,000,300,017 was prime or not would take over 60 seconds to execute, which while slow is still a reasonable amount of time to wait for an answer. But if I were to want to check whether 1,000,000,030,117 was prime or not we’d have to wait over 16 hours for an answer. Fine, this is still technically a feasible amount of time to wait thanks to the invention of Netflix, but what happens when we want to see if

1,000,000,003,020,217 is prime or not?

Well, with an execution time of over 666 days, we’re going to need to figure out a faster way to check whether a number is prime or not if we’re trying to have an answer by the end of 2018.

Optimizing our algorithm:

We need to take advantage of certain mathematical properties of prime numbers in order to speed up the execution time of our program. If we look at our method for determining whether a number is prime or not, we can see we are doing some unnecessary calculations. A number n is prime if the only factor from 1 to the ceiling of sqrt(n) + 1 is 1, yet we check whether or not n is divisible by the ceiling of sqrt(n) + 1….n in our method #is_prime? . Let’s change our algorithm so it only checks for factors from 1 to sqrt(n).ceiling + 1 and see how that affects the running time.

We’ll write a new method called #improved_is_prime? :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
def improved_is_prime?(n)
  #these guys don't play by the rules
  if n == 0 || n == 1
    return "prime"
  end
  array_of_factors = []
  for i in (1..Math.sqrt(n).ceil + 1)
    if n % i == 0
      array_of_factors << i
    end
  end
  if array_of_factors.length == 2
    return "prime"
else
    return "not prime"
  end
end

The only thing that changed from the original method we wrote is the range of our for loop. We now only check whether numbers 1 to sqrt(n).ceiling + 1 are factors, let’s take a look at how that affected the running time.

When n = 10.

Console:

1
2
3
4
10 is not prime.
Our original method took 0.037999998312443495 milliseconds.
10 is not prime.
Our improved method took 0.010000017937272787 milliseconds.

When n = 100.

Console:

1
2
3
4
100 is not prime.
Our original method took 0.04499999340623617 milliseconds.
100 is not prime.
Our improved method took 0.010999967344105244 milliseconds.

When n = 1000.

Console:

1
2
3
4
1000 is not prime.
Our original method took 0.10499998461455107 milliseconds.
1000 is not prime.
Our improved method took 0.011999974958598614 milliseconds.

When n = 10000.

Console:

1
2
3
4
10000 is not prime.
Our original method took 0.7489999989047647 milliseconds.
10000 is not prime.
Our improved method took 0.018000020645558834 milliseconds.

When n = 100000.

Console:

1
2
3
4
100000 is not prime.
Our original method took 7.978999987244606 milliseconds.
100000 is not prime.
Our improved method took 0.04600000102072954 milliseconds.

When n = 1000000.

Console:

1
2
3
4
1000000 is not prime.
Our original method took 65.731999988202 milliseconds.
1000000 is not prime.
Our improved method took 0.0819999841041863 milliseconds.

As you can see this slight modification to our method drastically improves the running time of our program, and the execution time is no longer directly related to the size of n. Before we had to perform a million calculations in order to determine whether or not 1,000,000 was prime or not, but our new method only needs to perform sqrt(1,000,000) + 1 or 1,001 calculations in order to determine the exact same thing.

images

Let’s go ahead and see if our improved method can answer our original question, is 472,882,027 a prime number?, in a reasonable amount of time.

1
2
3
4
5
6
7
8
9
10
n = 472882027
time = Benchmark.realtime do
  puts "#{n} is #{is_prime?(n)}."
end
puts "Our original method took #{time*1000} milliseconds."

time = Benchmark.realtime do
  puts "#{n} is #{improved_is_prime?(n)}."
end
puts "Our improved method took #{time*1000} milliseconds."

Console:

1
2
3
4
472882027 is prime.
Our original method took 31142.99600000959 milliseconds.
472882027 is prime.
Our improved method took 1.352999999653548 milliseconds.

As the size of n grows, the more our optimization improves the running time of the program. With this slight change to our original method we have made the process of determining whether or not 472,882,027 is prime over 23,000 times faster, and created a way to determine whether or not even larger values of n are prime in a reasonable amount of time.

Conclusions:

Exploring this topic reveals an important aspect of Computer Science. Brute force alone cannot solve all of the world’s problems, and it is up to us as computer scientists to come up with creative solutions in order to overcome these obstacles. Sadly, our previous solution of MOAR TRANSISTORS isn’t going to cut it unless a serious revolution occurs in the design of the modern day CPU. Plenty of problems like prime numbers still exist today, except their “improved” solutions have yet to be found (or may never be found depending on who you’re asking).

In future blog posts we are going to explore ways we can make this algorithm even faster, talk about some of these problems that have yet to be optimized, and dive into one of the greatest unsolved problems in mathematics, P vs. NP, so stay tuned!