scala-glm

scala-glm - Quickstart Guide

PCA

This library contains code for principal components analysis based on a thin SVD of the centred data matrix. This is more numerically stable than a construction from the spectral decomposition of the covariance matrix. It is analogous to the R function prcomp rather than the R function princomp. First, create some synthetic data: 100 observations of three variables, with column j drawn from a Gaussian with mean j and standard deviation j+1.

import breeze.linalg.*
import breeze.numerics.*
import breeze.stats.distributions.*
import breeze.stats.distributions.Rand.VariableSeed.randBasis

val X = DenseMatrix.tabulate(100, 3)((i, j) => 
	Gaussian(j, j+1).sample())

Now we can do PCA.

import scalaglm.Pca
val pca = Pca(X, List("V1", "V2", "V3"))
pca.sdev
// res0: DenseVector[Double] = DenseVector(3.0998095614527834, 1.7701697229565552, 1.0666162387707565)
pca.loadings
// res1: DenseMatrix[Double] = 0.0036729504332969265  0.032876332326816665   -0.999452678323417     
// 0.03580463173570253    -0.9988228901391757    -0.032724035208013966  
// -0.9993520589769079    -0.035664841325335095  -0.00484575193890582   
pca.scores
// res2: DenseMatrix[Double] = -1.394158595267834    1.0216260370363757     -0.4268247733930781    
// 2.2377327694724145    -1.8225392896374022    1.2088712200616831     
// -0.9366800149124246   0.6719039094420931     -0.9143906089964495    
// 1.4805271191858658    1.2873004790149345     0.4508707093984419     
// -3.2490389126046137   -1.9374031160406242    0.7762515717284948     
// 2.2418256470898954    -0.21281283475638613   0.0884558623095737     
// 2.0433008483045945    1.9751555580415563     0.8719419482928384     
// -1.0615144843583622   -4.06045126525286      -1.0675701980049457    
// 1.3987918355626507    0.7415629769753357     -1.3292744827519665    
// -0.5700141751052159   2.3478467358022996     -1.592664798091452     
// 1.2626051529390434    -1.5884420358884574    -2.474445518059979     
// 0.3547741016230765    -0.6152754842521497    -0.3662615886016668    
// 0.46937668186709824   0.08751435016010929    1.5050276544956893     
// -0.9148239880487022   2.021083516899193      -1.5575963732531108    
// 4.120022309539249     1.3289332968476546     -0.4794814489682083    
// 0.21068291528858804   -0.3989358474309738    0.7241476464476504     
// -4.515632007957123    1.7117734126905892     0.7795731480526701     
// -1.8995041580203795   -3.6347929268228616    -0.090287149987935     
// -0.18424340842366593  1.18219237671139       -0.6537333163272695    
// -0.8601347962834204   -0.8416047866510425    -2.203728680313584     
// -2.570707842957018    1.333531781726853      0.3040227631127551     
// 0.445422044657819     1.42870840792089       1.2299295458493613     
// 0.31391165566153956   0.5524906616376477     -0.37292509711070276   
// -5.364293004275806    -1.0244351579189876    -1.3312048981662126    
// -3.814590140832471    -1.6494146477859761    1.068398266816663      
// 2.3107026881049615    -1.3294541304865202    -0.6237597872420391    
// -2.770956372220559    -0.19294086854496006   0.8796740918574693     
// -3.1234445913762454   -0.2704425542156068    -0.9805948697304888    
// -6.2523917179887585   -1.7664387278403324    -1.6165869559794115    
// -0.8968886399068178   0.13531843500077656    -0.913675247577599     
// 3.4764910260368036    1.4144602642348516     1.3281260243613429     
// 3.378689072210409     1.3640936524619756     0.5755469999324306     
// 7.636551221025751     -0.5094281309388236    -0.28490931836938593   
// -0.09368823267638687  -0.7319791770348332    0.4364142331548334     
// -0.2965491772533513   0.8785235796990836     0.6838899088009555     
// -0.5360183233868241   -2.0206281758047844    0.5084932924990656     
// -2.0202767503559103   2.328093190197525      2.7531507238101307     
// 0.33582200784236443   -0.39769176349175994   -0.5217517769841028    
// 1.0448855779317112    -1.4272045011310617    0.9364475650000684     
// -1.4801461572117416   -0.08288179447765764   -0.004048338823481208  
// 5.125900744048375     1.2368829404513098     -0.8649696111126226    
// 1.2137010650572853    1.4876098859220899     0.14776259032805414    
// 3.45825733561804      -3.1867956318079873    0.16133642928175662    
// -3.0145090240413306   1.126239788620701      0.9670283642077558     
// 2.1000549748030273    -1.897350652497234     0.8314547863796222     
// -4.514818371047823    3.578708743437634      0.5860633366533579     
// 3.001700715621771     1.059936468267261      0.5219958383555291     
// 2.8508807243737326    -0.2173842804319599    -1.1288313573579734    
// 2.6176004281690997    -0.90243302013686      -0.7671497755140982    
// ...
pca.plots
// res3: Figure = breeze.plot.Figure@578fd16
pca.summary
// Standard deviations:
// V1	V2	V3
//  3.100	 1.770	 1.067
// Cumulative proportion of variance explained:
// V1	V2	V3
//  0.692	 0.918	 1.000
// Loadings:
// PC01	PC02	PC03	
//  0.004	 0.033	-0.999	V1
//  0.036	-0.999	-0.033	V2
// -0.999	-0.036	-0.005	V3

The final call, pca.summary, prints a readable summary of the PCA to the console. The plots method produces some diagnostic plots, including a “scree plot”.

[Figure: PCA plots]
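
As a small cross-check on the summary output, the "cumulative proportion of variance explained" row can be recomputed by hand from the standard deviations. The sketch below uses plain Scala and the values printed by pca.sdev above; the names sdev, variances and cumulative are illustrative only and are not part of the library.

```scala
// Standard deviations copied from the pca.sdev output above.
val sdev = Vector(3.0998095614527834, 1.7701697229565552, 1.0666162387707565)

// The variance of each principal component is its squared standard deviation.
val variances = sdev.map(s => s * s)
val total = variances.sum

// Running total of the variances, expressed as a fraction of the total variance.
val cumulative = variances.scanLeft(0.0)(_ + _).tail.map(_ / total)
```

Rounded to three decimal places, these fractions should agree with the 0.692, 0.918, 1.000 row of the summary.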

Note that there is also a utility function pairs for producing a “scatterplot matrix”:

import scalaglm.Utils.pairs
pairs(X, List("V1", "V2", "V3"))
// res6: Figure = breeze.plot.Figure@38f1023d

[Figure: Pairs plots]
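
To make the thin-SVD construction mentioned at the start of this section more concrete, here is a minimal sketch written directly against Breeze. It is an illustration under stated assumptions, not the library's own code: pcaSketch is a hypothetical helper, and it assumes the usual convention that the standard deviations are the singular values divided by sqrt(n - 1).

```scala
import breeze.linalg.*
import breeze.stats.mean

// Hypothetical helper (not part of scala-glm): PCA from a thin SVD.
// Returns (standard deviations, loadings, scores).
def pcaSketch(x: DenseMatrix[Double]): (DenseVector[Double], DenseMatrix[Double], DenseMatrix[Double]) =
  val n = x.rows
  // Centre each column by subtracting its mean.
  val colMeans = mean(x(::, *)).t
  val xc = x(*, ::) - colMeans
  // Thin SVD: xc = U * diag(s) * Vt, keeping only min(n, p) components.
  val svd.SVD(_, s, vt) = svd.reduced(xc)
  // Singular values rescaled to component standard deviations.
  val sdev = s / math.sqrt(n - 1.0)
  // Columns of V are the principal directions (loadings).
  val loadings = vt.t
  // Scores are the centred data projected onto the loadings.
  val scores = xc * loadings
  (sdev, loadings, scores)
```

Calling pcaSketch(X) on the synthetic data should give standard deviations comparable to pca.sdev. The sign of each loading column is arbitrary, so individual columns of the loadings and scores may come out negated.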

Linear regression

This code computes regression coefficients and associated diagnostics via the QR decomposition of the covariate matrix. The diagnostics are analogous to those produced by the R function lm. We start by creating a synthetic response variable.

val y = DenseVector.tabulate(100)(i => 
	Gaussian(2.0 + 1.5*X(i,0) + 0.5*X(i,1), 3.0).sample())

We can now fit the linear regression and generate all of the usual diagnostics.

import scalaglm.Lm
val lm = Lm(y,X,List("V1", "V2", "V3"))
lm.coefficients
// res8: DenseVector[Double] = DenseVector(1.1509926285799386, 1.3805593781823824, 0.5367433610974435, 0.11470313757973068)
lm.se
// res9: DenseVector[Double] = DenseVector(0.41337757976234585, 0.29063792501861696, 0.17527417249595054, 0.10018102724718353)
lm.fitted
// res10: DenseVector[Double] = DenseVector(2.1716898820169566, 0.9643402327493423, 2.984988810252156, 0.5535790242277332, 2.1216219393037385, 1.7337096940595398, -0.4262484545066114, 5.552038688628419, 3.3195096284105876, 3.0707587758711026, 6.085595554425664, 2.7389911643683726, -0.23503293597399483, 3.2145732105753257, 1.5950876017955498, 1.120651119139777, 0.4257367188487635, 4.050953408340172, 2.300136224969994, 5.529376678687506, 1.1019747096583101, -0.5119451503674066, 2.174169914183654, 4.807028415841929, 1.621814143220369, 3.2756412011221547, 1.0706694919627826, 3.7413769457242276, 5.653356122499356, 3.245904407027335, -0.9159935277282627, 0.16981233916614458, 1.9150549032172528, 1.7151722448158513, 0.590646124056689, 2.292013845186252, -2.863569696101713, 2.85040314369131, 1.2573123797662729, 2.1349753967992564, 2.0886473732615554, 0.9022902172056919, 2.9935059199710623, 0.3178074763743847, 1.5413953234363724, -0.22761181289971472, 0.4292314382004901, 3.3826245787480413, 3.237067985551164, 2.0378611501724704, 2.850096320623325, -0.23191942649109096, 4.292601868894431, 2.5698877206410304, 2.1692589171874186, 4.803832645519485, 1.3324809467335739, 3.771128736772944, 0.002452262163630311, 3.883539570822381, 1.5230480986649801, -0.30828699610078314, 1.3730955963031175, -1.0273364897236905, -0.015007381827157162, 2.7001746788461722, 3.6921485701764265, 4.426743041450812, 1.8191269004289696, -0.602004295900731, -0.8002156957846001, 2.798318964232702, 0.8990847086176876, 2.9843411827995165, -1.4666067551704314, 3.3656388958958177, 2.613383461027096, 3.4000832689954974, 1.0133943274675508, -1.4103039338715762, 1.5788161451933869, 0.6637985333131495, 1.1604831787142822, 1.2313263533561225, -0.8856188218926145, 2.835986053608344, 5.267019666477463, 2.588412965825187, 0.7568902558106058, 3.2534765144592193, 3.0043635477342834, 1.967546695717985, 2.5714812460067034, 3.434398206781984, 1.451893452676097, 2.6987501166259436, 4.024045455599482, 3.021470137203842, 
// 0.7984572379044801, 0.7129602424914279)
lm.residuals
// res11: DenseVector[Double] = DenseVector(3.7775346324246954, 0.009739608201971617, 2.3364441553630013, -0.15843034028448844, 4.615526329434065, 1.0538795547621058, -1.403778138701498, 2.676954062683115, -3.5490029895923527, -0.3348272864587032, 5.722130293119109, -2.0265279317531055, 4.4193762369307885, -1.8910854495571443, -6.899356017783015, -1.125251130762212, -3.8085739624642265, -1.549955056584675, 2.9178231779742, -5.909433336646672, -2.2722084532878783, 0.056714412054489016, 3.1364233707264546, 1.0586194631732235, -1.2655155671275187, -0.873915304150775, 1.244132335088552, 4.4822532559000505, -1.5536618915070823, -2.1564388351395065, -1.6521864208859056, 0.8023248157275525, 0.41674886184647675, -3.080414940041366, 0.8555008325363277, 0.7721789596562485, 2.9091902138339, -1.6208864550061133, 6.604140622513894, 3.2423017355917585, -0.446973659636718, 0.725110050144997, -3.562171269821119, -0.07893329367749319, -6.668274174804595, 1.09547823545997, -3.482006835151823, -1.1913204292505277, 5.863911734906857E-4, -0.09447706447037763, -4.092437146131757, 0.6171336366245932, -5.1340758200745595, -3.8169492535544745, 1.7313595932628498, -1.0856508550057224, 4.313578633634023, -6.064537222325039, 1.8041187675070052, 3.0476839836179073, 3.0783243561979754, -2.018525879922457, 0.4764021991433911, 2.1034613705601473, -0.63606388743859, 0.6671682512321366, -0.7643669913920697, 4.479538875792371, 3.0043760557362758, 4.231407808274642, -5.520885953559005, 0.9642244872657493, -5.763286244411921, -0.6155376425954526, -3.7695531757010494, 1.6069060890931732, 4.373097123400111, -3.8674740677741135, -0.8441379241174642, 2.4572477229767618, -1.6420921219512308, -2.74407735309696, 0.8505029481238742, -0.4030873494134275, -3.806718034432282, 0.7737809209359146, 0.37822520656520986, 2.100451539844813, 3.2876883860846364, -3.937324175005609, -0.6714681101017135, 1.3197803222240458, 1.791531859497633, -1.1609959750074696, 4.89346024570986, 4.238328834619306, 5.836368480073772, 
// -3.002683556686333, -2.2124213539095576, 2.8726970238365324)
lm.studentised
// res12: DenseVector[Double] = DenseVector(1.234829713707691, 0.0032198638150146243, 0.7648356314766611, -0.05185295490768288, 1.5255517044733597, 0.3442331002335563, -0.4628237950699016, 0.9018017219474239, -1.1675685793342325, -0.11137134602307189, 1.9273584364180916, -0.6609421499647522, 1.4545450799264736, -0.627354453321821, -2.2769047851642137, -0.36749266351776083, -1.2639242069437777, -0.5171390914476713, 0.9544368214936073, -1.9713827447910461, -0.7452155108654478, 0.018664936479007917, 1.0228144820419516, 0.3537524650099577, -0.4195213208526894, -0.28680350820857803, 0.40836441591192585, 1.4742224690576642, -0.5262148363434304, -0.7053768747947756, -0.5478797643343057, 0.2641463441741174, 0.1402722876311207, -1.0051120873516288, 0.27961116291176635, 0.2535447576381101, 0.9933316896784545, -0.528778516586975, 2.168179090802801, 1.057343923701877, -0.1485614193434034, 0.23725304035824987, -1.1879357406603213, -0.026000390619226214, -2.197002457641283, 0.3693019494959984, -1.1431712497537347, -0.3920258836902488, 1.9247165704011364E-4, -0.030977901622442915, -1.3460732568308265, 0.2031263452700876, -1.6946052206998135, -1.2444221585462003, 0.5740893819471526, -0.35953589371213635, 1.4086936200913445, -1.9897453687538669, 0.5916108953059012, 1.003174166499884, 1.0110642760986093, -0.6691198735704144, 0.165216389819568, 0.7027901993464688, -0.21002606290768933, 0.22397095012601545, -0.251563055308219, 1.5607367946333188, 0.9791885463220625, 1.3943549413200504, -1.8241946777028535, 0.3245643980021667, -1.9229440532685282, -0.2015403147180517, -1.2883841212923952, 0.5329287979970578, 1.427189281191174, -1.2667084069505827, -0.27623540910688454, 0.8215024397590897, -0.5387368666286141, -0.9034022261194692, 0.2778109234979469, -0.13387645763239675, -1.2629585926744953, 0.25423892519416924, 0.1262653705103085, 0.7207394942611322, 1.0772639194996498, -1.3158800907662758, -0.23051807173427136, 0.43112217256484436, 0.5861446004533151, -0.38633806555363837, 
// 1.6022313520417917, 1.4372370351745352, 1.9366465759909843, -0.9855194370704972, -0.736154497294662, 0.9465374024135108)
val pred = lm.predict()
// pred: PredictLm = PredictLm(
//   mod = Lm(
//     y = DenseVector(5.949224514441652, 0.974079840951314, 5.321432965615157, 0.39514868394324476, 6.737148268737804, 2.7875892488216456, -1.8300265932081095, 8.228992751311534, -0.2294933611817651, 2.7359314894123994, 11.807725847544774, 0.7124632326152671, 4.184343300956794, 1.3234877610181814, -5.304268415987465, -0.004600011622434952, -3.382837243615463, 2.5009983517554972, 5.217959402944194, -0.380056657959166, -1.1702337436295682, -0.4552307383129176, 5.310593284910109, 5.865647879015152, 0.3562985760928503, 2.4017258969713797, 2.3148018270513346, 8.223630201624278, 4.099694230992274, 1.0894655718878283, -2.5681799486141683, 0.9721371548936971, 2.3318037650637295, -1.3652426952255148, 1.4461469565930167, 3.0641928048425005, 0.045620517732186894, 1.2295166886851967, 7.861453002280167, 5.377277132391015, 1.6416737136248374, 1.6274002673506889, -0.5686653498500567, 0.23887418269689153, -5.126878851368223, 0.8678664225602551, -3.0527753969513327, 2.1913041494975136, 3.2376543767246546, 1.9433840857020928, -1.2423408255084318, 0.38521421013350227, -0.8414739511801281, -1.247061532913444, 3.9006185104502684, 3.7181817905137624, 5.646059580367597, -2.293408485552095, 1.8065710296706354, 6.9312235544402885, 4.601372454862956, -2.32681287602324, 1.8494977954465086, 1.0761248808364565, -0.6510712692657471, 3.367342930078309, 2.927781578784357, 8.906281917243183, 4.823502956165245, 3.629403512373911, -6.321101649343605, 3.7625434514984515, -4.864201535794233, 2.368803540204064, -5.236159930871481, 4.972544984988991, 6.986480584427207, -0.4673907987786161, 0.16925640335008652, 1.0469437891051854, -0.06327597675784391, -2.0802788197838105, 2.0109861268381564, 0.828239003942695, -4.692336856324896, 3.6097669745442587, 5.645244873042673, 4.68886450567, 4.044578641895242, -0.6838476605463897, 2.33289543763257, 3.287327017942031, 4.363013105504336, 2.273402231774514, 6.345353698385956, 6.937078951245249, 9.860413935673254, 0.018786580517509144, -1.4139641160050775, 
// 3.5856572663279604),
//     Xmat = 0.49574380875458324    -0.11786816547759194   3.483413381888676      
// -1.2192229003369857    2.7994612256835243     -0.05261420339784717   
// 0.973225504268477      0.2637772744188578     3.0410666279544927     
// -0.3621783128880418    -0.30902455396961304   0.5968619628329994     
// -0.8107689758712942    2.731895065853119      5.436795341864435      
// -0.046483835374395654  1.228440651276292      -0.10868581277033584   
// -0.7583378981830228    -0.9896991959249324    0.007880228879485474   
// 0.970280262537055      4.9911049492301816     3.3353411787176426     
// 1.3987505100855768     0.29139758728827714    0.7066340702448444     
// 1.7075740579724783     -1.3748685793052706    2.618152879481108      
// 2.466192542401559      2.651258431535629      0.931380985765726      
// 0.38782219621371056    1.5777445622403103     1.7937002490905336     
// -1.458916767968931     0.8186491567783283     1.6450392331522399     
// 1.66051558131452       -1.0619833653496875    2.974223301467104      
// 0.5787281129333423     -0.22565733535692534   -2.03789956237296      
// -0.6954070202146031    1.3208178588769597     1.924698501349781      
// -0.6987693276053462    -0.958444575920893     6.572404362190651      
// 0.004448293173311423   4.503963100892397      4.152871176169467      
// 0.732250950578053      -0.2274995700864917    2.269655118213705      
// 2.2123804248323804     1.820437422306333      3.024797866062426      
// -0.22877080920425052   -0.49544895588908244   4.64453472029468       
// ...
pred.fitted
// res13: DenseVector[Double] = DenseVector(2.1716898820169512, 0.9643402327493412, 2.984988810252155, 0.5535790242277324, 2.1216219393037377, 1.7337096940595396, -0.42624845450661214, 5.552038688628418, 3.3195096284105876, 3.070758775871102, 6.085595554425664, 2.738991164368372, -0.23503293597399577, 3.2145732105753257, 1.5950876017955493, 1.120651119139776, 0.42573671884876285, 4.050953408340172, 2.3001362249699935, 5.529376678687506, 1.1019747096583095, -0.511945150367408, 2.1741699141836532, 4.807028415841928, 1.6218141432203679, 3.2756412011221547, 1.0706694919627817, 3.7413769457242276, 5.653356122499356, 3.245904407027334, -0.9159935277282635, 0.16981233916614388, 1.9150549032172517, 1.7151722448158506, 0.5906461240566883, 2.292013845186251, -2.8635696961017145, 2.8504031436913095, 1.257312379766272, 2.134975396799256, 2.0886473732615554, 0.9022902172056912, 2.993505919971062, 0.3178074763743836, 1.5413953234363718, -0.22761181289971533, 0.4292314382004892, 3.3826245787480405, 3.237067985551164, 2.0378611501724695, 2.850096320623325, -0.2319194264910919, 4.292601868894432, 2.5698877206410295, 2.1692589171874177, 4.803832645519485, 1.3324809467335732, 3.7711287367729436, 0.002452262163629343, 3.8835395708223808, 1.5230480986649795, -0.3082869961007842, 1.3730955963031168, -1.0273364897236918, -0.015007381827157634, 2.7001746788461713, 3.6921485701764265, 4.42674304145081, 1.819126900428969, -0.6020042959007322, -0.8002156957846012, 2.7983189642327013, 0.8990847086176871, 2.984341182799516, -1.466606755170433, 3.3656388958958168, 2.6133834610270954, 3.4000832689954965, 1.0133943274675499, -1.4103039338715775, 1.5788161451933862, 0.6637985333131491, 1.1604831787142815, 1.2313263533561216, -0.8856188218926153, 2.8359860536083437, 5.267019666477463, 2.5884129658251878, 0.7568902558106052, 3.253476514459219, 3.0043635477342825, 1.9675466957179846, 2.571481246006703, 3.4343982067819834, 1.4518934526760963, 2.698750116625943, 4.024045455599482, 3.021470137203842, 
// 0.7984572379044795, 0.712960242491427)
pred.se
// res14: DenseVector[Double] = DenseVector(0.4026028127966089, 0.6089258177156032, 0.434185647366702, 0.43032982405260417, 0.605792222079641, 0.3841294088664351, 0.5665554766818338, 0.8419107992244121, 0.5301183954302761, 0.6942965109971094, 0.8403330939855574, 0.34557537940901406, 0.5376930646497862, 0.6588021339081644, 0.5819997679636411, 0.3806049522681284, 0.6637615426350798, 0.7331242472192665, 0.4178057010227749, 0.7313401246741887, 0.47300497748599074, 0.5363817713417605, 0.34251293936626004, 0.7517981539219954, 0.6487045174460812, 0.4855651935235649, 0.4884707431534488, 0.5257085415584906, 0.8961677164183379, 0.41759226372679, 0.6531986488867257, 0.5427347427706173, 0.8328730462190123, 0.3575412148511521, 0.3991325585404311, 0.49521916546924244, 0.9711406971771711, 0.35241845347868866, 0.49271843046315267, 0.3425545183531268, 0.6843706964066608, 0.4239218271875283, 0.7271691762000482, 0.5514771327942714, 0.5552135174438537, 0.849288779517511, 0.4928413120452795, 0.5345183208116874, 0.48838527611488225, 0.46808552644073353, 0.5265194088204727, 0.5385157279893215, 0.5845347841229727, 0.33543143821659577, 0.6521085278382467, 0.6345018229333117, 0.3794467663447645, 0.48046146150072333, 0.4701595346989966, 0.5392796669515202, 0.5006952050463485, 0.6481571613203818, 1.0981420795765253, 0.7499182036380357, 0.5905108778844244, 0.8044683182124039, 0.5368514047742808, 1.1326028549133653, 0.32630713644730847, 0.5579328153270332, 0.6007845084368, 0.8334893412404312, 0.7333554455265463, 0.43884180212064333, 0.9799047759135809, 0.6548786386027352, 0.36277517361377226, 0.4457318212529634, 0.4268576778589269, 0.7572699861981953, 0.47954083347950627, 0.5423599211404304, 0.38479436681472134, 0.6745759406493114, 0.6599624244396496, 0.5074525103071329, 0.740015166330906, 1.0135906937400525, 0.45441855085438265, 0.7533198385314737, 1.017707898571309, 0.38620588845394055, 0.422517626506005, 0.6997927361007219, 0.43893212662989606, 0.9078845892614668, 0.6621539857073222, 
// 0.4873455480049849, 0.6987322534495665, 0.5563912209047827)
val predNew = lm.predict(DenseMatrix((1.1, 1.6, 1.0), (1.4, 2.2, 3.0)))
// predNew: PredictLm = PredictLm(
//   mod = Lm(
//     y = DenseVector(5.949224514441652, 0.974079840951314, 5.321432965615157, 0.39514868394324476, 6.737148268737804, 2.7875892488216456, -1.8300265932081095, 8.228992751311534, -0.2294933611817651, 2.7359314894123994, 11.807725847544774, 0.7124632326152671, 4.184343300956794, 1.3234877610181814, -5.304268415987465, -0.004600011622434952, -3.382837243615463, 2.5009983517554972, 5.217959402944194, -0.380056657959166, -1.1702337436295682, -0.4552307383129176, 5.310593284910109, 5.865647879015152, 0.3562985760928503, 2.4017258969713797, 2.3148018270513346, 8.223630201624278, 4.099694230992274, 1.0894655718878283, -2.5681799486141683, 0.9721371548936971, 2.3318037650637295, -1.3652426952255148, 1.4461469565930167, 3.0641928048425005, 0.045620517732186894, 1.2295166886851967, 7.861453002280167, 5.377277132391015, 1.6416737136248374, 1.6274002673506889, -0.5686653498500567, 0.23887418269689153, -5.126878851368223, 0.8678664225602551, -3.0527753969513327, 2.1913041494975136, 3.2376543767246546, 1.9433840857020928, -1.2423408255084318, 0.38521421013350227, -0.8414739511801281, -1.247061532913444, 3.9006185104502684, 3.7181817905137624, 5.646059580367597, -2.293408485552095, 1.8065710296706354, 6.9312235544402885, 4.601372454862956, -2.32681287602324, 1.8494977954465086, 1.0761248808364565, -0.6510712692657471, 3.367342930078309, 2.927781578784357, 8.906281917243183, 4.823502956165245, 3.629403512373911, -6.321101649343605, 3.7625434514984515, -4.864201535794233, 2.368803540204064, -5.236159930871481, 4.972544984988991, 6.986480584427207, -0.4673907987786161, 0.16925640335008652, 1.0469437891051854, -0.06327597675784391, -2.0802788197838105, 2.0109861268381564, 0.828239003942695, -4.692336856324896, 3.6097669745442587, 5.645244873042673, 4.68886450567, 4.044578641895242, -0.6838476605463897, 2.33289543763257, 3.287327017942031, 4.363013105504336, 2.273402231774514, 6.345353698385956, 6.937078951245249, 9.860413935673254, 0.018786580517509144, -1.4139641160050775, 
// 3.5856572663279604),
//     Xmat = 0.49574380875458324    -0.11786816547759194   3.483413381888676      
// -1.2192229003369857    2.7994612256835243     -0.05261420339784717   
// 0.973225504268477      0.2637772744188578     3.0410666279544927     
// -0.3621783128880418    -0.30902455396961304   0.5968619628329994     
// -0.8107689758712942    2.731895065853119      5.436795341864435      
// -0.046483835374395654  1.228440651276292      -0.10868581277033584   
// -0.7583378981830228    -0.9896991959249324    0.007880228879485474   
// 0.970280262537055      4.9911049492301816     3.3353411787176426     
// 1.3987505100855768     0.29139758728827714    0.7066340702448444     
// 1.7075740579724783     -1.3748685793052706    2.618152879481108      
// 2.466192542401559      2.651258431535629      0.931380985765726      
// 0.38782219621371056    1.5777445622403103     1.7937002490905336     
// -1.458916767968931     0.8186491567783283     1.6450392331522399     
// 1.66051558131452       -1.0619833653496875    2.974223301467104      
// 0.5787281129333423     -0.22565733535692534   -2.03789956237296      
// -0.6954070202146031    1.3208178588769597     1.924698501349781      
// -0.6987693276053462    -0.958444575920893     6.572404362190651      
// 0.004448293173311423   4.503963100892397      4.152871176169467      
// 0.732250950578053      -0.2274995700864917    2.269655118213705      
// 2.2123804248323804     1.820437422306333      3.024797866062426      
// -0.22877080920425052   -0.49544895588908244   4.64453472029468       
// ...
predNew.fitted
// res15: DenseVector[Double] = DenseVector(3.6431004599162, 4.608720565188841)
predNew.se
// res16: DenseVector[Double] = DenseVector(0.46548438996152974, 0.5624493486140578)
lm.plots
// res17: Figure = breeze.plot.Figure@50c5b440
lm.summary
// Estimate	 S.E.	 t-stat	p-value		Variable
// ---------------------------------------------------------
//   1.1510	 0.413	 2.784	0.0065 *	(Intercept)
//   1.3806	 0.291	 4.750	0.0000 *	V1
//   0.5367	 0.175	 3.062	0.0028 *	V2
//   0.1147	 0.100	 1.145	0.2551  	V3
// 
// Residual standard error:   3.0855 on 96 degrees of freedom
// Multiple R-squared: 0.2495, Adjusted R-squared: 0.2260
// F-statistic: 10.6373 on 3 and 96 DF, p-value: 0.00000
//

The plots include a plot of studentised residuals against fitted values and a normal Q-Q plot for the studentised residuals.

[Figure: Linear model plots]
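
For readers curious about what "via the QR decomposition" amounts to, the following is a minimal sketch of least squares by thin QR, again written directly against Breeze. The helper lsQrSketch is hypothetical and is not the library's implementation. It assumes an intercept column is prepended to the covariate matrix, which is consistent with the four coefficients reported above for three covariates.

```scala
import breeze.linalg.*

// Hypothetical helper (not part of scala-glm): least squares via a thin QR.
// Returns (coefficients, fitted values, residuals).
def lsQrSketch(y: DenseVector[Double], x: DenseMatrix[Double]) =
  // Prepend a column of ones so that the first coefficient is the intercept.
  val x1 = DenseMatrix.horzcat(DenseMatrix.ones[Double](x.rows, 1), x)
  // Thin QR: x1 = Q * R, with orthonormal columns in Q and upper-triangular R.
  val dec = qr.reduced(x1)
  // The least squares problem reduces to the triangular system R * beta = Q^T * y.
  // A general solver is used here for brevity; back-substitution would exploit the structure of R.
  val beta = dec.r \ (dec.q.t * y)
  val fitted = x1 * beta
  (beta, fitted, y - fitted)
```

With the synthetic data above, lsQrSketch(y, X) should return coefficients close to lm.coefficients. Working with Q and R avoids forming X^T X explicitly, which is the usual numerical argument for this approach.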

Generalised linear models

The current implementation supports only simple one-parameter exponential family observation models. This includes the most commonly used cases of logistic regression (LogisticGlm) and Poisson regression (PoissonGlm).

Logistic regression

Again, we start by creating an appropriate response variable.

val ylb = (0 until 100) map (i => Bernoulli(sigmoid(1.0 + X(i,0))).sample())
val yl = DenseVector(ylb.toArray map {b => if (b) 1.0 else 0.0})

Then we can fit the logistic regression in the usual way.

import scalaglm.{Glm, LogisticGlm}
val glm = Glm(yl, X, List("V1","V2","V3"), LogisticGlm)
glm.coefficients
// res20: DenseVector[Double] = DenseVector(0.9618483582350331, 1.2685356167853103, -0.028586417617382308, 0.14194742280758021)
glm.fitted
// res21: DenseVector[Double] = DenseVector(0.8897851137853944, 0.3379877665924792, 0.9321725335543807, 0.64473358016894, 0.6518095979758368, 0.701057270895869, 0.5073195414524666, 0.9257693994574865, 0.9441801830656538, 0.9717748163408412, 0.9844300272576861, 0.8406836644811354, 0.33654421150764097, 0.9712742001053758, 0.8042641519073768, 0.578141409105371, 0.7380327947895273, 0.8066251247490215, 0.9019790322806647, 0.9844137044490681, 0.7933252984937843, 0.4404662885788357, 0.8524001097825549, 0.9746745141523351, 0.5874964671839794, 0.8419511730042314, 0.6326390461580954, 0.9500403344052327, 0.9834782632436141, 0.9295083317184997, 0.31353903901624014, 0.5450210975707473, 0.636595449067648, 0.6682398077046907, 0.627047741681821, 0.6424452186338697, 0.1454612841973224, 0.8672922895957598, 0.46597691578572487, 0.8164556134544145, 0.8528047444877301, 0.7369405032275012, 0.5981904302136242, 0.634713843763399, 0.45519273042340813, 0.8026061590737774, 0.5697257362489737, 0.9097742581776394, 0.862948540933531, 0.6329109606456778, 0.9230765134572578, 0.5478872621642056, 0.9694785698854201, 0.8511469124327676, 0.9301880014360996, 0.9776013353019051, 0.7959647290017665, 0.8840170911822269, 0.5386598758514638, 0.9609617349520923, 0.7472249277419054, 0.23726090811341496, 0.2680401451507748, 0.34592526695373926, 0.5745454290351908, 0.9746978838630863, 0.8882528082932001, 0.5887115449928366, 0.7306343856246509, 0.3400812541763753, 0.3057182145024496, 0.9072187014696569, 0.25088129296218004, 0.9204500901662942, 0.058545218010322836, 0.9685161646008571, 0.8926120998715271, 0.9213833673615152, 0.787221842969656, 0.22340114641598854, 0.827439840063854, 0.7909733985356759, 0.6817354937353646, 0.7473780413844479, 0.39515018792634066, 0.861170807418572, 0.9330996480404391, 0.9771758395849184, 0.6087566327227096, 0.8190591606212057, 0.9005883040166134, 0.8683988600364458, 0.7295480467971781, 0.7601833960926963, 0.8418125042853798, 0.5156548194823548, 0.8443133940302021, 
// 0.9075408038762346, 0.8578256617997633, 0.5990501270618993)
glm.predict(response=true).fitted
// res22: DenseVector[Double] = DenseVector(0.8897851137853944, 0.3379877665924792, 0.9321725335543807, 0.64473358016894, 0.6518095979758368, 0.701057270895869, 0.5073195414524666, 0.9257693994574865, 0.9441801830656538, 0.9717748163408412, 0.9844300272576861, 0.8406836644811354, 0.33654421150764097, 0.9712742001053758, 0.8042641519073768, 0.578141409105371, 0.7380327947895273, 0.8066251247490215, 0.9019790322806647, 0.9844137044490681, 0.7933252984937843, 0.4404662885788357, 0.8524001097825549, 0.9746745141523351, 0.5874964671839794, 0.8419511730042314, 0.6326390461580954, 0.9500403344052327, 0.9834782632436141, 0.9295083317184997, 0.31353903901624014, 0.5450210975707473, 0.636595449067648, 0.6682398077046907, 0.627047741681821, 0.6424452186338697, 0.1454612841973224, 0.8672922895957598, 0.46597691578572487, 0.8164556134544145, 0.8528047444877301, 0.7369405032275012, 0.5981904302136242, 0.634713843763399, 0.45519273042340813, 0.8026061590737774, 0.5697257362489737, 0.9097742581776394, 0.862948540933531, 0.6329109606456778, 0.9230765134572578, 0.5478872621642056, 0.9694785698854201, 0.8511469124327676, 0.9301880014360996, 0.9776013353019051, 0.7959647290017665, 0.8840170911822269, 0.5386598758514638, 0.9609617349520923, 0.7472249277419054, 0.23726090811341496, 0.2680401451507748, 0.34592526695373926, 0.5745454290351908, 0.9746978838630863, 0.8882528082932001, 0.5887115449928366, 0.7306343856246509, 0.3400812541763753, 0.3057182145024496, 0.9072187014696569, 0.25088129296218004, 0.9204500901662942, 0.058545218010322836, 0.9685161646008571, 0.8926120998715271, 0.9213833673615152, 0.787221842969656, 0.22340114641598854, 0.827439840063854, 0.7909733985356759, 0.6817354937353646, 0.7473780413844479, 0.39515018792634066, 0.861170807418572, 0.9330996480404391, 0.9771758395849184, 0.6087566327227096, 0.8190591606212057, 0.9005883040166134, 0.8683988600364458, 0.7295480467971781, 0.7601833960926963, 0.8418125042853798, 0.5156548194823548, 0.8443133940302021, 
// 0.9075408038762346, 0.8578256617997633, 0.5990501270618993)
glm.summary
// Estimate	 S.E.	 z-stat	p-value		Variable
// ---------------------------------------------------------
//   0.9618	 0.354	 2.714	0.0067 *	(Intercept)
//   1.2685	 0.319	 3.975	0.0001 *	V1
//  -0.0286	 0.145	-0.198	0.8433  	V2
//   0.1419	 0.084	 1.683	0.0924  	V3
glm.plots
// res24: Figure = breeze.plot.Figure@38c90a80

[Figure: Logistic regression plots]
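
One-parameter exponential family GLMs are commonly fitted by iteratively reweighted least squares (IRLS). The sketch below shows the idea for logistic regression using Breeze. It is an illustration only: irlsLogisticSketch is a hypothetical helper, it runs a fixed number of iterations with no convergence check, and it makes no claim about how the library itself carries out the fit.

```scala
import breeze.linalg.*
import breeze.numerics.sigmoid

// Hypothetical helper (not part of scala-glm): logistic regression by IRLS.
// x1 is the design matrix, including a leading column of ones for the intercept.
def irlsLogisticSketch(y: DenseVector[Double], x1: DenseMatrix[Double], its: Int = 25): DenseVector[Double] =
  var beta = DenseVector.zeros[Double](x1.cols)
  for _ <- 0 until its do
    val eta = x1 * beta                      // linear predictor
    val mu = sigmoid(eta)                    // fitted probabilities
    val w = mu.map(m => m * (1.0 - m))       // working weights
    val z = eta + ((y - mu) /:/ w)           // working response
    // Weighted least squares step: solve (X^T W X) beta = X^T W z.
    val xtw = x1.t * diag(w)
    beta = (xtw * x1) \ (xtw * z)
  beta

// Possible usage with the data above (the design matrix name is illustrative):
// val X1 = DenseMatrix.horzcat(DenseMatrix.ones[Double](X.rows, 1), X)
// irlsLogisticSketch(yl, X1)
```

On well-behaved data the result should be close to glm.coefficients. If the classes are perfectly separable the weights collapse towards zero and this simple version will break down.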

Poisson regression

We first create an appropriate response, and then do Poisson regression.

val yp = DenseVector.tabulate(100)(i => Poisson(math.exp(-0.5 + X(i,0))).sample().toDouble)

import scalaglm.PoissonGlm
val pglm = Glm(yp, X, List("V1","V2","V3"), PoissonGlm)
pglm.coefficients
// res26: DenseVector[Double] = DenseVector(-0.4244648170070786, 0.9181623064023307, -0.04480872222598681, 0.019279001302784465)
pglm.summary
// Estimate	 S.E.	 z-stat	p-value		Variable
// ---------------------------------------------------------
//  -0.4245	 0.166	-2.553	0.0107 *	(Intercept)
//   0.9182	 0.103	 8.950	0.0000 *	V1
//  -0.0448	 0.055	-0.815	0.4149  	V2
//   0.0193	 0.034	 0.570	0.5684  	V3
pglm.plots
// res28: Figure = breeze.plot.Figure@4e08c94c

[Figure: Poisson regression plots]
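
To help interpret the fitted coefficients, the sketch below converts them into expected counts. It assumes the canonical log link, which matches the way yp was simulated above. The names beta, expectedCount and rateRatioV1 are illustrative and are not part of the library.

```scala
// Coefficients copied from the pglm.coefficients output above: (Intercept), V1, V2, V3.
val beta = Vector(-0.4244648170070786, 0.9181623064023307, -0.04480872222598681, 0.019279001302784465)

// Expected count under a log link: exp(intercept + sum of coefficient * covariate).
def expectedCount(x: Vector[Double]): Double =
  math.exp(beta.head + beta.tail.zip(x).map((b, v) => b * v).sum)

// Multiplicative change in the expected count per unit increase in V1.
val rateRatioV1 = math.exp(beta(1))

// Expected counts with all covariates at zero, and with V1 raised by one unit.
val atOrigin = expectedCount(Vector(0.0, 0.0, 0.0))
val atUnitV1 = expectedCount(Vector(1.0, 0.0, 0.0))
```

The ratio atUnitV1 / atOrigin equals rateRatioV1, roughly 2.5, so each unit increase in V1 multiplies the expected count by about that factor. The simulation used an intercept of -0.5 and a V1 coefficient of 1.0, and the estimates are reasonably close to both.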

Non-linear response

The above covers the main functionality of the library, which is based on a linear response to variation in covariate values. For flexible modelling of a nonlinear response, see the documentation on flexible regression modelling.