I have just read about the Sieve of Eratosthenese from this Scientific American article from Facebook and was inspired to give it a try to see how it works, especially because the math and programming seemed simple enough. I worked with a dataset of 1-1000. I also tried up to 100,000. My browser to freaks out with a dataset larger than that.
I have also tried to setup my process so there would be no numbers that would be worked on more than once for each round of multiple removal. I am calling this the Square Prime Reduction Method (SPRM).
In order to do so, I would look for a pattern that would identify the remaining numbers. The farther along I worked the harder it became. Below I will list my notes, the pattern used for cumulative reduction, and the amount of dataset reduction, and the dataset size required for pattern recognition. This is the result of about a day of work.
See my notes here and the results of my scripting work here.
Square Prime Reduction Method (SPRM)
Using this cumulative multiple removal method it seems that the beginning for each reduction round is the square of the next prime number (i.e. for 2 it is 4; for 3 it is 9; for 5 it is 25; etc). For each reduction round I:
- start at the square of the next prime
- run a clean script routine as if I were removing all of the multiples which showed zeros for the multiples already removed
- find the pattern for its remaining multiples
- code to remove the remaining multiples
With each successfully processed prime, we are able to further reduce the pool of unprocessed numbers. To reveal all primes under 1000 I would have to follow this process up to the square of the prime of 31 which is 961.
Prime Multiple Reduction (PMR)
I started with a dataset of 1-1000 to see if I could get it to work. Starting with 2 I processed the removal or the multiples of each prime number as I encountered them and then I moved up the list. It also seemed to scale fine for 10,000 and 100,000 and my observations and tenuous rules seem to work as the dataset scales.
Multiples of 2
First, in any given number set we can get rid of half of the numbers because they are even, meaning evenly divisible by 2, so that speeds up things considerably.
Pattern Complexity: Obvious
Pattern: Ummm…well… Every other number! =)
Dataset Reduction: This resulted in a reducing the possible number pool by 49.9% (499).
Multiples of 3
Second, we can get rid of every third number because they will evenly divisible by 3, although I will have to compensate for the missing even numbers.
Pattern Complexity: Obvious – 1 extra step
Pattern: In writing the code for this I noticed the following pattern after removing the even numbers: start with 3 and then add 6 and remove that
Dataset Reduction: This resulted in a reducing the possible number pool by about 16.6% (166).
Multiples of 5
Then, removing multiples of 5 becomes slightly more tricky since we have removed numbers divisible by prime numbers 2 and 3 creating quite a few holes in its multiple list.
Pattern Complexity: Easy – 2 extra steps
Pattern: The pattern for multiples of 5: start at 5, increment by 30, remove that number and then remove the number @ -10.
Dataset Reduction: This resulted in a reducing the possible number pool by 6.6% (66). Not a whole lot, but we are getting closer.
Multiples of 7
Removing Multiples of 7 was much more difficult since we have removed a broad swath of numbers ~ 72.7% of the number pool so there were a lot of holes. Stay on Target!
Pattern Complexity: Moderate – 8 extra steps
Pattern: To find the pattern for multiples of 7 I copied the result of a straight removal by 7 and put it into a text file and looked for how the number of zeros (already removed multiples) stacked up between the number of those numbers remaining which were divisible by 7. There was a visual pattern and from there I had to figure it out. I found a repeating pattern of previously removed multiples: 3/1/3/1/3/5/1/5. Turning that into a computational model was fun!
My second idea, which was used, was the opposite sort of implementation than what I used for the multiples of 5. With 5 I went up and then back. With 7 seven I programmatically worked my way forward to the next iteration.
Dataset Reduction: This resulted in a reducing the possible number pool by 3.2% (32).
Multiples of 11
Because I needed to jump the max number out from 1,000 to 10,000 in order to try to see a pattern, attempting to remove multiples of 11 was too difficult to fully pursue.
Pattern Complexity: Difficult – with the numbers and observations below I expect the pattern would be near 32 and 512 extra steps.
Pattern: I was able to see a vague pattern in about 5,000 items, but was not going to pursue that due to lack of time and quality pattern recognition and visual analysis tools.
Dataset Reduction: ?? – given the numbers which I will display below I expect a reduction of about 1.98%
Larger Dataset Needed for Pattern Recognition
To compensate for the plethora of removed numbers as you go further and further through the primes it seems like we need an increasingly larger data set for analysis in order to see the pattern.
- for 3: I do not remember exactly, but I expect that the pattern was obvious near 20 items.
- for 5: I do not remember exactly, but I expect that the pattern was obvious near 75 items.
- for 7: about 500 to recognize the pattern
- for 11: I bumped it up to 10,000 and a pattern of sort was seen near 5,000
Possible rule: To find the pattern for each successive prime we will need a dataset 10 times larger than the previous one. With this, I am not really taking into account the first two since the pattern is super simple to find because there is minimal processing and with a number reduction which is very easy to compensate for.
Or perhaps the dataset has to do with the number of multiples removed and the remaining number of numbers remaining from the prime’s square value.
That is supremely annoying. I would need better tools to play with larger numbers. =O
Increasing Complexity in Pattern
As each square is processed it will become more and more difficult to process the pattern to prevent processing already removed numbers. As mentioned above the dataset requirements grow in order for us to see the pattern, but also so does the work required to process the pattern to remove subsequent multiples. This is also biased by my limitations, biases, and inexperience with such work. I definitely need to get through processing the pattern for 7 before I can have a clearer picture, but, since I am stopping here, I will throw out the following tenuous rule:
Possible Rule: Based on my very limited work the complexity increase with each prime processed is one of these 3 ideas: the square of the previous number of complexity steps * 2; the cube of the previous number of complexity steps; or the number of previous complexity steps * 4.
This was more evident from processing 5 and 7:
- 2 – 0 extra programmatic steps
- 3 – 1 extra programmatic step
- 5 – 2 extra programmatic steps
- 7 – 8 extra programmatic steps
- 11 – I expect near 32, 128, or 512 extra programmatic steps
Keep in mind, this is probably not close to wholly accurate given my limited data and processing (up to 7), but it might give you a place to start with in understanding the complexity of finding the multiples in each successive prime generation using this technique. I really need to find out how many extra steps are required for 11 to have a better idea about my observations.
Diminishing Returns with each Processing Generation
When we start with 2 we remove half of the numbers and as we process each generation we reduce fewer and fewer numbers which will seem to result in diminishing returns from processing each generation.
- 2 – 49.9%
- 3 – 16.6%
- 5 – 6.6%
- 7 – 3.3%
- 11 – I expect this will be near 1.98%
I expect there is a mathematical relationship to be found here, but I am not sure what yet. I also expect that for any given dataset that proportionally numbers will be found. At some point, depending on the size of your dataset, may not be worth the extra work to try to find a pattern and just working on a brute force technique to process.
Possible Rule: Law of Diminishing Returns – the % of multiples removed as compared to the previous generation is roughly equal to the delta removed by the previous generation + 10%. This average increase may also be increase be a few % points too assuming that the delta from 2-3-5 has any real bearing on further results.
Possible Rule: It appears that once you get done removing a prime’s multiples from a dataset (and all previous primes’ multiples) then all numbers up until the square of the next prime will be prime numbers.
Working through the primes through 7 gives us to a total reduction of 76.5 of the dataset up to 1000. =O My primes are correct up the the next square which is for 11 @ 121. I would need to process that and then the prime numbers list would be correct until the next square – 13 @ 169. I am expecting that this will be the patttern if we were to continue. Since I need a really large dataset to see the pattern for 11 I think I will stop now. Hopefully, someone will find this useful. =) I am sure there are much more complex mathematical relations here than I able to discern with my limited knowledge (not a mathematician), time, tools, and experience.
Here is the collection of the potential observed rules from above so we can have them all in one place:
- Increasing Dataset Size: To find the pattern for each successive prime we will need a dataset 10 times larger than the previous one.
- Increasing Pattern Complexity: The increase with each prime processed is the one of these 3 ideas: the square of the previous number of complexity steps * 2; the cube of the previous number of complexity steps; or number of complexity steps * 4.
- Law of Diminishing Returns: the % of multiples removed as compared to the previous generation is roughly equal to the delta removed by the previous generation + 10%
- Square Prime Rule: Once you get done removing a prime’s multiples from a dataset (and all previous prime’s multiples) then all numbers up until the square of the next prime will be prime numbers.
The info for primes 11, 13, 17 are predicted based on my limited observational projections here:
|prime||# of non-repeating numbers reduced||% of total numbers reduced||% delta in reduction from previous gen||dataset required to see pattern||extra programmatic steps|
|prime||# of non-repeating
|% of total numbers
|% delta in reduction
from previous gen
|dataset required to see pattern||extra programmatic
|11||19.8||1.98%||60.00%||5000||512 or 128 or 32|
|13||13.86||1.39%||70.00%||50000||134,217,728 or 524,288.00 or 128|
|17||11.088||1.11%||80.00%||500000||2.41785E+24 or 36,028,797,018,964,000 or 512|