Good points, thanks, especially about limonene becoming irrelevant. They are indeed doing something really good, and hopefully they will make the public dataset available as XML or something else that can be grepped. Two problems remain: there is no standardized testing protocol, and environmental factors affect the numbers. Here is an algorithm to find similar strains/matches...
SQL Tables:
src
dst
Simplified SQL Fields:
src.SampleName as nvarchar
dst.THC as number(4,2)
dst.Limolene as number(4,2)
dst.CBD as number(4,2)
Example Records:
OhNoNotTheATF#1, 01.11%, 2.28%, 04.21%
NortLight#OOps , 21.12%, 2.21%, 04.21%
BlueDream #Two , 11.12%, 2.21%, 04.21%
FishStickMaint , 02.10%, 3.11%, 04.21%
I<3GeneraMotor, 12.11%, 3.21%, 05.21%
WheresMyCarKey , 10.11%, 3.21%, 05.21%
Task:
-We need to find all samples whose attributes (THC, Limolene, CBD) are "similar": every attribute must differ by less than a hardcoded threshold of three percentage points.
-OhNoNotTheATF#1 should match FishStickMaint because each of their attributes differs by less than three percentage points.
-BlueDream #Two, I<3GeneraMotor and WheresMyCarKey should match each other for the same reason.
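To make the matching rule concrete, here is a minimal Python sketch of it run against the example records. The names `SAMPLES`, `is_similar` and `similar_pairs` are made up for illustration, and it assumes the percentages are plain numbers with a strict "less than 3 points" test on every attribute.

```python
from itertools import combinations

# Example records from the post: name -> (THC, Limolene, CBD) in percent.
SAMPLES = {
    "OhNoNotTheATF#1": (1.11, 2.28, 4.21),
    "NortLight#OOps": (21.12, 2.21, 4.21),
    "BlueDream #Two": (11.12, 2.21, 4.21),
    "FishStickMaint": (2.10, 3.11, 4.21),
    "I<3GeneraMotor": (12.11, 3.21, 5.21),
    "WheresMyCarKey": (10.11, 3.21, 5.21),
}

THRESHOLD = 3.0  # percentage points (assumed strict: difference must be < 3)


def is_similar(a, b, threshold=THRESHOLD):
    """True when every attribute differs by less than `threshold` points."""
    return all(abs(x - y) < threshold for x, y in zip(a, b))


def similar_pairs(samples, threshold=THRESHOLD):
    """Return each unordered pair of sample names that match."""
    return [
        (name1, name2)
        for (name1, vals1), (name2, vals2) in combinations(samples.items(), 2)
        if is_similar(vals1, vals2, threshold)
    ]


pairs = similar_pairs(SAMPLES)
for name1, name2 in pairs:
    print(name1, "matches", name2)
```

NortLight#OOps ends up with no partner because its THC is too far from everything else.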
Here is resulting set:
Group1
OhNoNotTheATF#1, 01.11%, 2.28%, 04.21%
FishStickMaint , 02.10%, 3.11%, 04.21%
Group2
BlueDream #Two, 11.12%, 2.21%, 04.21%
I<3GeneraMotor, 12.11%, 3.21%, 05.21%
WheresMyCarKey, 10.11%, 3.21%, 05.21%
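A join only gives you matching pairs, so getting from pairs to Group1/Group2 needs one more step. Below is a rough sketch that merges pairs into groups with a small union-find. The function name `group_matches` is hypothetical, and chaining matches together is an assumption on my part: if A matches B and B matches C, all three land in one group even when A and C differ by more than three points.

```python
def group_matches(names, pairs):
    """Merge matching pairs into groups (connected components)."""
    parent = {name: name for name in names}

    def find(name):
        # Walk up to the group's root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in pairs:
        parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Samples with no match form single-member groups; drop those.
    return [members for members in groups.values() if len(members) > 1]


names = [
    "OhNoNotTheATF#1", "NortLight#OOps", "BlueDream #Two",
    "FishStickMaint", "I<3GeneraMotor", "WheresMyCarKey",
]
# Matching pairs for the example records at a 3-point threshold.
pairs = [
    ("OhNoNotTheATF#1", "FishStickMaint"),
    ("BlueDream #Two", "I<3GeneraMotor"),
    ("BlueDream #Two", "WheresMyCarKey"),
    ("I<3GeneraMotor", "WheresMyCarKey"),
]

groups = group_matches(names, pairs)
for number, members in enumerate(groups, start=1):
    print(f"Group{number}: {', '.join(members)}")
```

If chaining is not what you want, you would need a stricter rule, such as requiring every member to match every other member.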
!In reality we have 100+ attributes, but to keep the example data simple I only illustrated three (THC, Limolene, CBD)!
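With 100+ attributes nobody wants to type the join condition by hand, so one option is to generate it. This is only a sketch: `build_on_clause` is a made-up helper, the `src`/`dst` aliases follow the post, and it assumes the attribute names are trusted column names (it rejects anything that is not a plain identifier, since the names are pasted straight into SQL text).

```python
def build_on_clause(attributes, threshold=3, left="src", right="dst"):
    """Build 'ABS(src.X - dst.X) < threshold' terms joined with AND."""
    terms = []
    for attr in attributes:
        # Column names go straight into the SQL text, so allow only
        # plain identifiers here.
        if not attr.isidentifier():
            raise ValueError(f"not a plain column name: {attr!r}")
        terms.append(f"ABS({left}.{attr} - {right}.{attr}) < {threshold}")
    return "\n  AND ".join(terms)


clause = build_on_clause(["THC", "Limolene", "CBD"])
print(clause)
```

The same list of column names can then drive both the query and any checking done outside the database.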
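Algorithm (Oracle SQL):
Change the number 3 to 0.3 or whatever for less variance.
-- Assumes src and dst both carry SampleName plus the attribute columns.
SELECT src.SampleName, dst.*
FROM src
INNER JOIN dst
  ON ABS(src.THC - dst.THC) < 3
  AND ABS(src.Limolene - dst.Limolene) < 3
  AND ABS(src.CBD - dst.CBD) < 3
  AND src.SampleName <> dst.SampleName -- skip pairing a sample with itself
ORDER BY 1 ASC;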
More Info:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions002.htm