This library contains code for principal components analysis based on a thin SVD of the centred data matrix. This is more numerically stable than a construction from the spectral decomposition of the covariance matrix. It is analogous to the R function prcomp
rather than the R function princomp
. First create some synthetic data.
import breeze.linalg.*
import breeze.numerics.*
import breeze.stats.distributions.*
import breeze.stats.distributions.Rand.VariableSeed.randBasis
val X = DenseMatrix.tabulate(100, 3)((i, j) =>
Gaussian(j, j+1).sample())
Now we can do PCA.
import scalaglm.Pca
val pca = Pca(X, List("V1", "V2", "V3"))
pca.sdev
// res0: DenseVector[Double] = DenseVector(2.728547759675747, 1.987936154156487, 1.1097023337895622)
pca.loadings
// res1: DenseMatrix[Double] = 0.06193835756279623 -0.08456605327086371 0.9944909363571962
// 0.010260954092400465 0.9964053503404088 0.084089777227204
// 0.9980272304318838 -0.0049960431539950445 -0.06258343925725532
pca.scores
// res2: DenseMatrix[Double] = 0.9047629357598381 -0.5299200464436641 -0.5240807582660614
// 0.8637073266617016 0.16515348680621914 -1.74011631757785
// -0.44121606665156804 -0.4869517736206971 0.35433506978365203
// 1.8144085999968782 -1.3098127412060183 0.16098458084108686
// 0.8780259813616295 1.027313490196844 -0.9388309770283811
// -1.0661667685568763 -4.488251124151808 0.41864813974349785
// 4.441272626343873 1.89957496073913 -1.0856672777067522
// -8.311963178238658 -0.9060343849234397 -0.7523621022639457
// 1.6594466915605122 -0.9072676720439277 -1.261911940286863
// -3.5504190619535283 -2.471506431080369 0.9284488826895289
// 1.6082304068074746 -1.7255566918045189 -0.2236308210479331
// -0.5363208290252273 -0.005275270037166707 0.9277491773608549
// 0.8953017259224131 -2.083411748267078 1.2841349461946265
// 1.119383812610313 -0.5054368000340514 1.27696155763164
// -2.576041118180525 1.2069079066065775 -0.4658797357255164
// -8.230198496072484 -1.0222899343234502 -0.5779856678040343
// 3.5013637793348074 2.079348935936572 -0.0386316995930519
// -0.27025023646780116 2.11940527409945 0.5276851399357351
// 1.8811466939289188 -2.5352841047214905 -0.46601852646434744
// -3.0577590548982894 4.175960414683598 -0.653704289348318
// 3.3434019019388574 -2.7966355228912088 1.6450034296661697
// -3.9229037718254953 -1.472184258878601 -0.48345704275900925
// -1.716990953657229 4.460781230765585 1.3185743547127724
// -1.5388420121738158 -3.646120488703724 -0.46119309345297366
// -0.2906608321345525 -0.9945095007859311 -0.45019022564784983
// 2.94815303898324 3.2646203799860025 -1.0697451416779722
// -1.451988061904455 -1.2036486546890444 1.2242264145782484
// -1.2608847696446934 0.3746231319875526 -1.268104242282745
// -1.0640017560956658 -2.0084754411812007 0.9085133389701702
// 0.649637973330807 2.352374345728234 -0.31617940604601247
// -0.255731856382579 -4.2001007990748525 -0.44440592812369234
// -1.2351764190358305 -0.8926194048603674 -0.10891603480908163
// -2.428046396169288 -0.21443036153345293 -0.39509636566042095
// 1.2990827055885315 2.0406283890935693 1.656897534494497
// 1.7112294566052515 0.06797696259970899 1.5098875035866741
// -1.854540535122585 0.4003177095357688 1.0446664677073771
// 2.490456471760174 -1.0555784155192791 0.7438562375683083
// -3.8637234993433327 -3.507991739212198 1.6393795638822783
// -0.2839546761357564 2.059617192354716 1.339022247531786
// -1.236483376898599 0.6386560955931905 -0.4417390504656854
// 2.636427698770329 0.8605598082075407 -0.3041066972883545
// 1.59414273332252 0.612371759536824 -0.4549712295927924
// -2.7419303610572703 0.6037448376994872 1.924769568427478
// 2.2972064781380004 -0.49126509553857695 1.8321783933197486
// 2.7175946347634876 1.3862835847569384 -1.0144925710672226
// -2.5970279155872897 -0.3192396601662638 0.4585738208850819
// 2.8565519359499985 1.5875112823119792 1.69459808475378
// 0.24654369295018244 -0.9868842744523179 -0.6137801844153655
// -1.734599991665737 1.00497254812563 -0.5919235192096912
// ...
pca.plots
// res3: Figure = breeze.plot.Figure@10915231
pca.summary
// Standard deviations:
// V1 V2 V3
// 2.729 1.988 1.110
// Cumulative proportion of variance explained:
// V1 V2 V3
// 0.590 0.902 1.000
// Loadings:
// PC01 PC02 PC03
// 0.062 -0.085 0.994 V1
// 0.010 0.996 0.084 V2
// 0.998 -0.005 -0.063 V3
The final line prints a readable summary of the PCA to the console. plots
produces some diagnostic plots, including a “scree plot”.
Note that there is also a utility function pairs
for producing a “scatterplot matrix”:
import scalaglm.Utils.pairs
pairs(X, List("V1", "V2", "V3"))
// res6: Figure = breeze.plot.Figure@387de232
This code computes regression coefficients and associated diagnostics via the QR decomposition of the covariate matrix. The diagnostics are analogous to those produced by the R function lm
. We start by creating a synthetic response variable.
val y = DenseVector.tabulate(100)(i =>
Gaussian(2.0 + 1.5*X(i,0) + 0.5*X(i,1), 3.0).sample())
So we can now do linear regression and generate all of the usual diagnostics.
import scalaglm.Lm
val lm = Lm(y,X,List("V1", "V2", "V3"))
lm.coefficients
// res8: DenseVector[Double] = DenseVector(1.5792802682038938, 1.8127878679794611, 0.503811112284449, 0.11142242099004708)
lm.se
// res9: DenseVector[Double] = DenseVector(0.41834603513896607, 0.2743905598184643, 0.1549886980043713, 0.11315397850464481)
lm.fitted
// res10: DenseVector[Double] = DenseVector(1.6554717052080132, -0.3472391210623895, 2.977366341104072, 2.8512334191770483, 1.4291072642931333, 1.5596643908654513, 2.2776136680978345, -1.0025299422210254, 0.34038534757890626, 2.6308599622701694, 1.9523542641180731, 4.177358628993007, 4.436320888497484, 5.02373301373174, 1.5712237830648244, -0.7037701573490855, 4.049941234729452, 4.242494399462532, 1.287301011199877, 2.149474706425229, 5.411129143833158, 0.2982412555452778, 6.180633484568918, 0.12744937584418148, 1.3562155868346815, 2.4407060618496663, 4.095765333750398, 0.10754072540792814, 3.323943137150212, 2.9827503164154114, 0.2588303732673095, 1.8030492395396476, 1.2403476660128787, 6.649612544473472, 5.786853422888146, 4.232064322253925, 4.165754676680281, 3.5052046541527533, 5.709940436953216, 1.724060373321517, 2.9398633682210775, 2.337818350492675, 5.717781061677693, 6.318575680371227, 1.83562302924167, 2.734431642174817, 6.917284320534324, 1.1809938808387686, 1.461626833609729, 5.626608263241267, -0.24413756134941123, 4.0740580431382805, 2.127276012757594, -0.036874787661630215, 3.0569561038006077, -0.8731738121972168, 2.473761655148068, 5.103386763120108, 1.6656829071866293, 2.59309936011982, 1.088218944027998, 1.539491990437942, -0.54911746383784, -1.8902647178238485, 2.5229433496991684, 2.222028243821093, 0.7273044287122752, -0.8540526959086718, 3.1104325375452513, 3.2530179703925057, 1.1305401210687274, 5.272322223009763, 1.6294668090356534, 2.691798509036161, 5.614670490895967, 3.617682047490712, 2.5993882781062845, 4.290398665567245, -0.4387619447309252, -1.1468988126579722, 3.2021918812804886, 0.7116363712855269, 0.7637668747888391, 4.127537275047586, 5.691929486195276, 2.194034105411175, 1.8405629711707876, 2.2313727602205042, 0.629004371892159, 5.331140765178667, -1.8867280873334824, 7.061023238028042, 8.126797372219892, 3.964543432016112, 4.15876276215086, 1.02954857751401, 8.733127126616218, 0.6471755775492716, 3.0662740729677713, 0.621032937092082)
lm.residuals
// res11: DenseVector[Double] = DenseVector(3.3450200612851724, 5.144792996901886, -2.7237147687010395, 2.175353866512631, -1.6069532042562968, -1.3300568198697151, -0.9685155286533862, 0.829840120328134, -1.6487311907176176, 0.2553606582076289, -3.7303337436871207, 3.4532282853836636, -0.6352570523620775, -3.962700530796691, -1.130457660806722, 2.566556667945865, -4.4209842169262945, 1.5343844957153099, -3.3843225151515597, -1.0592144307610747, 0.09389241656548464, -2.764265141278632, 0.7390517938033359, 1.6825142388521561, -1.8210179869758087, 1.1115673520202884, 3.1375018932564913, -1.4459050650363339, -2.5579635746379332, 1.0327817701195041, 3.188313441506085, 0.6820173882167919, 0.6155552547945877, 1.0588283510407237, -0.16509342418268513, 2.716108205713888, -3.508485465605867, 2.906222547160671, 2.0921308411426374, -8.127372311637524, -2.8461253484968827, 4.312882024391994, 1.4161219303901476, 2.9507265385560544, 3.7567075364217026, -2.5844395794988064, -1.3795766157829776, 0.9510606826454522, 0.7970313867170395, -1.8839519129500175, -1.278692723804792, 0.760481632035753, 2.8816430172804695, -2.079748233448844, -3.348645233713416, 4.033632434125538, -3.955094148094644, 6.427906163526229, 1.8729222561101624, -2.476000541641662, -3.4984635811695677, 1.6412124110620059, -0.20537178728060712, 3.9693486226613466, 4.432907311650274, -0.80634653329551, 1.9891324795522318, 5.028808890654734, -0.6562006739767963, 0.5110634809442924, -3.6938615729309254, -1.5149517059350996, 2.6547494959686615, 0.5719101904154247, -2.196309485092824, -7.695390628773531, 6.272889088868265, 4.867766041420812, -1.2662688552829011, 1.1021528769054638, 1.6097807176143446, -1.2550026878004519, 2.5031451615378235, -6.698590107896009, 2.4632061108825543, 1.0123982266129703, 1.653881100620683, -0.419767242740845, -2.316431010282464, -0.43750286037502395, -5.600439860804184, 1.1814401544285893, -2.276720124334897, 5.365134487736995, -2.6539224475879166, 0.21278302762180346, 0.6069936899889008, -1.0785617833191792, -0.3694499475438153, -6.709669949922711)
lm.studentised
// res12: DenseVector[Double] = DenseVector(1.1074227416129598, 1.7224494430857598, -0.9007337745144801, 0.7220127740291677, -0.533877007714103, -0.4520394370921568, -0.32762426605767037, 0.28923529906524176, -0.5499884980978588, 0.08611495653291731, -1.239678794343544, 1.1452052182865025, -0.2126574080847821, -1.3197342736184103, -0.37625081583647946, 0.8929691959718866, -1.4814883931991132, 0.5105218896524327, -1.1313484214818537, -0.3612069425681617, 0.031960037850714634, -0.926623532886654, 0.253168130538559, 0.5671389278777326, -0.6029457363721966, 0.3765671894843748, 1.046592575085059, -0.48153745713100515, -0.8531677045673766, 0.3439228714845027, 1.0791804131319105, 0.22580686029067384, 0.20434104050875743, 0.35625578168010674, -0.05517991426037153, 0.9037817539810024, -1.168506885127565, 0.998062879316707, 0.700339793482024, -2.6915659273590196, -0.9461021828611944, 1.4293161860111512, 0.47799343843362024, 0.9927352593538413, 1.2560688819040209, -0.8586980770524425, -0.4655469262057627, 0.3151166544587161, 0.2646032859190882, -0.6301403152613686, -0.42893434699632976, 0.2579359140421508, 0.9613430165886196, -0.6974658733111004, -1.1348692529570181, 1.399736262988049, -1.342697321587783, 2.1482409029343583, 0.6334012634790327, -0.8186600215054601, -1.1635902484786875, 0.5441019995879712, -0.07093326758766129, 1.38048931946562, 1.4669424511885378, -0.26817680526097026, 0.6616783003179829, 1.696623946498382, -0.21974588092737724, 0.1697565395684572, -1.2322639195826026, -0.5075400905267684, 0.8813772045075212, 0.19154826090939658, -0.7334078489380953, -2.558074804262729, 2.0834115079931763, 1.630853218401606, -0.4297593302865966, 0.38298040994030713, 0.5409183883848705, -0.42188032981777784, 0.8304212275144066, -2.2204472283962122, 0.8268293716454271, 0.33455482141331416, 0.5523380739095956, -0.14077911747960956, -0.7753017609750988, -0.14566355394344047, -1.912808055396636, 0.39857777959863544, -0.7778234516109511, 1.7927216746639754, -0.88772282511335, 0.0715702366834724, 0.21030617681751976, -0.36479173222368305, -0.12299223876273077, -2.228352919941074)
val pred = lm.predict()
// pred: PredictLm = PredictLm(
// mod = Lm(
// y = DenseVector(5.000491766493186, 4.797553875839497, 0.2536515724030326, 5.026587285689679, -0.17784593996316356, 0.22960757099573614, 1.3090981394444483, -0.17268982189289134, -1.3083458431387114, 2.8862206204777983, -1.7779794795690478, 7.6305869143766705, 3.8010638361354063, 1.0610324829350493, 0.4407661222581023, 1.8627865105967794, -0.3710429821968422, 5.776878895177842, -2.097021503951683, 1.0902602756641544, 5.505021560398642, -2.466023885733354, 6.919685278372254, 1.8099636146963376, -0.4648024001411273, 3.5522734138699548, 7.233267227006889, -1.3383643396284057, 0.7659795625122787, 4.015532086534916, 3.4471438147733946, 2.4850666277564395, 1.8559029208074664, 7.708440895514196, 5.6217599987054605, 6.948172527967813, 0.6572692110744143, 6.411427201313424, 7.802071278095854, -6.4033119383160075, 0.09373801972419482, 6.650700374884669, 7.13390299206784, 9.269302218927281, 5.592330565663373, 0.14999206267601073, 5.537707704751346, 2.1320545634842207, 2.2586582203267684, 3.7426563502912495, -1.5228302851542033, 4.8345396751740335, 5.008919030038063, -2.1166230211104744, -0.2916891299128084, 3.1604586219283215, -1.481332492946576, 11.531292926646337, 3.5386051632967916, 0.11709881847815762, -2.41024463714157, 3.180704401499948, -0.7544892511184471, 2.079083904837498, 6.955850661349443, 1.415681710525583, 2.716436908264507, 4.174756194746061, 2.454231863568455, 3.764081451336798, -2.563321451862198, 3.757370517074663, 4.284216305004315, 3.263708699451586, 3.418361005803143, -4.077708581282819, 8.872277366974549, 9.158164706988057, -1.7050308000138263, -0.04474593575250835, 4.811972598894833, -0.543366316514925, 3.2669120363266626, -2.571052832848423, 8.15513559707783, 3.2064323320241455, 3.4944440717914707, 1.8116055174796593, -1.687426638390305, 4.893637904803643, -7.4871679481376665, 8.242463392456632, 5.850077247884995, 9.329677919753108, 1.5048403145629434, 1.2423316051358135, 9.340120816605118, -0.4313862057699076, 2.696824125423956, -6.088637012830629),
// Xmat = -0.31553648449044747 0.6471905807785279 2.8910728986943655
// -1.5861953689503303 1.2370881391856454 2.9227293530459937
// 0.4710387088858163 0.7500591602546905 1.4925602477356024
// 0.48804919164187227 -0.06295784882139488 3.7599466741684853
// -0.8613469551581119 2.163675808254339 2.8825651793190783
// 0.834663148046144 -3.2378614396646537 0.8848081606766118
// -0.8604363905136284 3.0570166828296172 6.443614024512279
// -1.0816225844591665 0.15865971067766071 -6.291305023417237
// -0.970648235850089 0.2168991033373131 3.6923290953317087
// 1.0172367202815271 -1.2109880050495354 -1.6365241027427473
// 0.12794014690179714 -0.5116651540075692 3.5803228524039437
// 0.994669728011502 1.2772466045229995 1.3593504015739568
// 1.6135042947888054 -0.7487614879007183 2.7762273075213377
// 1.4868065877562768 0.8252372805800071 2.992432636277965
// -0.6201280663861155 2.3469529862362757 -0.5951840184991903
// -0.8933111577708884 0.05832431842622143 -6.220033903380013
// 0.10741181615821853 3.3145450605506492 5.43913415503228
// 0.43361379624949636 3.363378508816771 1.639318485861662
// -0.027732791107190843 -1.336063827213195 3.871915627916051
// -1.0878357529601979 5.284595780219742 -1.0790304446451389
// 2.1843305545887834 -1.4039562790230415 5.200476851952116
// -0.49447074876195707 -0.33780694919874854 -1.924904714089287
// ...
pred.fitted
// res13: DenseVector[Double] = DenseVector(1.6554717052080092, -0.3472391210623922, 2.977366341104071, 2.8512334191770483, 1.4291072642931333, 1.5596643908654508, 2.277613668097835, -1.0025299422210256, 0.3403853475789065, 2.630859962270169, 1.952354264118073, 4.177358628993006, 4.436320888497484, 5.023733013731739, 1.5712237830648241, -0.7037701573490858, 4.049941234729452, 4.242494399462532, 1.2873010111998766, 2.14947470642523, 5.411129143833157, 0.29824125554527736, 6.180633484568918, 0.12744937584418067, 1.356215586834681, 2.440706061849667, 4.095765333750398, 0.10754072540792814, 3.323943137150212, 2.9827503164154114, 0.258830373267309, 1.8030492395396471, 1.2403476660128785, 6.649612544473472, 5.786853422888146, 4.232064322253924, 4.165754676680281, 3.505204654152753, 5.709940436953216, 1.724060373321517, 2.939863368221078, 2.3378183504926753, 5.717781061677692, 6.318575680371227, 1.83562302924167, 2.734431642174817, 6.917284320534323, 1.180993880838768, 1.461626833609729, 5.626608263241267, -0.24413756134941078, 4.0740580431382805, 2.127276012757594, -0.03687478766162977, 3.0569561038006077, -0.8731738121972177, 2.473761655148067, 5.103386763120108, 1.6656829071866297, 2.59309936011982, 1.0882189440279977, 1.5394919904379418, -0.5491174638378391, -1.8902647178238492, 2.522943349699168, 2.2220282438210934, 0.7273044287122752, -0.854052695908671, 3.1104325375452504, 3.253017970392505, 1.1305401210687274, 5.272322223009763, 1.6294668090356537, 2.691798509036161, 5.614670490895967, 3.617682047490712, 2.599388278106284, 4.290398665567244, -0.4387619447309258, -1.1468988126579718, 3.2021918812804886, 0.7116363712855271, 0.7637668747888386, 4.127537275047585, 5.691929486195277, 2.194034105411175, 1.8405629711707874, 2.231372760220504, 0.629004371892159, 5.331140765178667, -1.886728087333482, 7.061023238028042, 8.126797372219892, 3.964543432016112, 4.15876276215086, 1.0295485775140099, 8.733127126616218, 0.6471755775492709, 3.066274072967771, 0.6210329370920813)
pred.se
// res14: DenseVector[Double] = DenseVector(0.36098285910275374, 0.5765395500952828, 0.3318482126435473, 0.420024088277935, 0.44055293033595466, 0.7723954709118539, 0.717644484925566, 1.011119062339259, 0.5171655470731374, 0.6787649114395843, 0.44636080333754513, 0.40185392734870723, 0.5748423624971433, 0.4879456130209306, 0.4762211427802778, 0.9965306279203408, 0.5906319278958592, 0.46993677764212094, 0.5527111895940164, 0.8092270016781467, 0.7894878256100364, 0.5956202105611448, 0.85568281275937, 0.6729538557888599, 0.36383928482938205, 0.7352691329991025, 0.5167612542651514, 0.4877351151518091, 0.5146101023747385, 0.48612330452157887, 0.7249927475334715, 0.362549160056118, 0.42366975609806456, 0.6485463511677751, 0.5499681405164563, 0.47154211119734313, 0.48864077565240616, 0.8803707533709783, 0.5744469186897239, 0.3690467885059458, 0.45204666754281586, 0.3860434305464959, 0.6904859716669771, 0.6475458339231344, 0.5557382975662377, 0.4422598110262731, 0.6874448241800958, 0.38070469410956503, 0.4252081406373542, 0.5616854588311953, 0.6058800910683789, 0.7492110194996285, 0.5185417414352489, 0.6020698937532067, 0.7398914704838077, 0.9745530748061415, 0.7597653500193584, 0.5485497901630456, 0.7145443286095792, 0.3266048073051459, 0.46291291046133254, 0.39435746484129713, 0.9334581167775906, 0.9932443925161901, 0.3497285708301052, 0.4618673248875804, 0.4656235043873754, 0.6845798893272754, 0.5802853004338321, 0.43644538560912366, 0.5179396711218133, 0.5868792221703605, 0.4261140313954504, 0.5826264344646999, 0.5347824176340206, 0.45197976813422114, 0.434325788174219, 0.5873549099818024, 0.7565519430546226, 0.9859456639928137, 0.6303493235709393, 0.6361307571799265, 0.40981517004500223, 0.39123814365751297, 0.6156071565697164, 0.31094144238202875, 0.5366586165720003, 0.6026645009318472, 0.571988772475587, 0.48258635493890545, 0.8256017791905883, 0.6840150897516762, 0.8285157865971651, 0.5454891936218659, 0.5624802336429504, 0.6441111639366153, 0.9610565740006528, 0.715692398354323, 0.48051963074552256, 0.43314022194780544)
val predNew = lm.predict(DenseMatrix((1.1, 1.6, 1.0), (1.4, 2.2, 3.0)))
// predNew: PredictLm = PredictLm(
// mod = Lm(
// y = DenseVector(5.000491766493186, 4.797553875839497, 0.2536515724030326, 5.026587285689679, -0.17784593996316356, 0.22960757099573614, 1.3090981394444483, -0.17268982189289134, -1.3083458431387114, 2.8862206204777983, -1.7779794795690478, 7.6305869143766705, 3.8010638361354063, 1.0610324829350493, 0.4407661222581023, 1.8627865105967794, -0.3710429821968422, 5.776878895177842, -2.097021503951683, 1.0902602756641544, 5.505021560398642, -2.466023885733354, 6.919685278372254, 1.8099636146963376, -0.4648024001411273, 3.5522734138699548, 7.233267227006889, -1.3383643396284057, 0.7659795625122787, 4.015532086534916, 3.4471438147733946, 2.4850666277564395, 1.8559029208074664, 7.708440895514196, 5.6217599987054605, 6.948172527967813, 0.6572692110744143, 6.411427201313424, 7.802071278095854, -6.4033119383160075, 0.09373801972419482, 6.650700374884669, 7.13390299206784, 9.269302218927281, 5.592330565663373, 0.14999206267601073, 5.537707704751346, 2.1320545634842207, 2.2586582203267684, 3.7426563502912495, -1.5228302851542033, 4.8345396751740335, 5.008919030038063, -2.1166230211104744, -0.2916891299128084, 3.1604586219283215, -1.481332492946576, 11.531292926646337, 3.5386051632967916, 0.11709881847815762, -2.41024463714157, 3.180704401499948, -0.7544892511184471, 2.079083904837498, 6.955850661349443, 1.415681710525583, 2.716436908264507, 4.174756194746061, 2.454231863568455, 3.764081451336798, -2.563321451862198, 3.757370517074663, 4.284216305004315, 3.263708699451586, 3.418361005803143, -4.077708581282819, 8.872277366974549, 9.158164706988057, -1.7050308000138263, -0.04474593575250835, 4.811972598894833, -0.543366316514925, 3.2669120363266626, -2.571052832848423, 8.15513559707783, 3.2064323320241455, 3.4944440717914707, 1.8116055174796593, -1.687426638390305, 4.893637904803643, -7.4871679481376665, 8.242463392456632, 5.850077247884995, 9.329677919753108, 1.5048403145629434, 1.2423316051358135, 9.340120816605118, -0.4313862057699076, 2.696824125423956, -6.088637012830629),
// Xmat = -0.31553648449044747 0.6471905807785279 2.8910728986943655
// -1.5861953689503303 1.2370881391856454 2.9227293530459937
// 0.4710387088858163 0.7500591602546905 1.4925602477356024
// 0.48804919164187227 -0.06295784882139488 3.7599466741684853
// -0.8613469551581119 2.163675808254339 2.8825651793190783
// 0.834663148046144 -3.2378614396646537 0.8848081606766118
// -0.8604363905136284 3.0570166828296172 6.443614024512279
// -1.0816225844591665 0.15865971067766071 -6.291305023417237
// -0.970648235850089 0.2168991033373131 3.6923290953317087
// 1.0172367202815271 -1.2109880050495354 -1.6365241027427473
// 0.12794014690179714 -0.5116651540075692 3.5803228524039437
// 0.994669728011502 1.2772466045229995 1.3593504015739568
// 1.6135042947888054 -0.7487614879007183 2.7762273075213377
// 1.4868065877562768 0.8252372805800071 2.992432636277965
// -0.6201280663861155 2.3469529862362757 -0.5951840184991903
// -0.8933111577708884 0.05832431842622143 -6.220033903380013
// 0.10741181615821853 3.3145450605506492 5.43913415503228
// 0.43361379624949636 3.363378508816771 1.639318485861662
// -0.027732791107190843 -1.336063827213195 3.871915627916051
// -1.0878357529601979 5.284595780219742 -1.0790304446451389
// 2.1843305545887834 -1.4039562790230415 5.200476851952116
// -0.49447074876195707 -0.33780694919874854 -1.924904714089287
// ...
predNew.fitted
// res15: DenseVector[Double] = DenseVector(4.490867123626467, 5.5598349933710685)
predNew.se
// res16: DenseVector[Double] = DenseVector(0.4399149549884285, 0.5060912642856749)
lm.plots
// res17: Figure = breeze.plot.Figure@42ee03ba
lm.summary
// Estimate S.E. t-stat p-value Variable
// ---------------------------------------------------------
// 1.5793 0.418 3.775 0.0003 * (Intercept)
// 1.8128 0.274 6.607 0.0000 * V1
// 0.5038 0.155 3.251 0.0016 * V2
// 0.1114 0.113 0.985 0.3272 V3
//
// Residual standard error: 3.0420 on 96 degrees of freedom
// Multiple R-squared: 0.3592, Adjusted R-squared: 0.3391
// F-statistic: 17.9343 on 3 and 96 DF, p-value: 0.00000
//
The plots include a plot of studentised residuals against fitted values and a normal Q-Q plot for the studentised residuals.
The current implementation supports only simple one-parameter exponential family observation models. This includes the most commonly used cases of logistic regression (LogisticGlm
) and Poisson regression (PoissonGlm
).
Again, we start by creating an appropriate response variable.
val ylb = (0 until 100) map (i => Bernoulli(sigmoid(1.0 + X(i,0))).sample())
val yl = DenseVector(ylb.toArray map {b => if (b) 1.0 else 0.0})
Then we can do logistic regression in a typical way.
import scalaglm.{Glm, LogisticGlm}
val glm = Glm(yl, X, List("V1","V2","V3"), LogisticGlm)
glm.coefficients
// res20: DenseVector[Double] = DenseVector(0.5056329181869673, 0.9710909223373286, 0.08906305299560299, 0.15674828356370712)
glm.fitted
// res21: DenseVector[Double] = DenseVector(0.6704077413824826, 0.38546977005484667, 0.7796818215449939, 0.8268308141937428, 0.5777948219023717, 0.7625090862471243, 0.7215902740300896, 0.17994185128154663, 0.5401953658775045, 0.7556701159031066, 0.7586904735494423, 0.8579514955537578, 0.9198988990628084, 0.9235800811011595, 0.5047910801364148, 0.20888846161343522, 0.8529236257767637, 0.8150591219179962, 0.7244460905587327, 0.43801341479248856, 0.9650054589488241, 0.4240047477193439, 0.8748113304289427, 0.555010776125046, 0.6267968204523753, 0.6572174225607317, 0.8673647068552129, 0.3840184279952476, 0.8393473732678258, 0.7050919299620699, 0.6254985970237854, 0.6553075503936121, 0.527783750198815, 0.9484365588958288, 0.945240807887592, 0.8357134772251201, 0.9066505225544612, 0.8506636241199589, 0.9056424692758666, 0.5818711953158778, 0.7869627493891542, 0.7177283272503255, 0.907439385231895, 0.963847307071916, 0.6554396100366324, 0.7101942163078864, 0.9638907405505939, 0.6172236118432377, 0.5198812075270888, 0.908592793597791, 0.29037752244269055, 0.7456935803811918, 0.76744805044157, 0.5173307121445888, 0.5721467681844457, 0.5648488290155359, 0.6343735428822619, 0.9361035844132839, 0.41023952674668585, 0.7607535946855035, 0.6590228248709451, 0.6772893655397066, 0.1765434819019684, 0.3731189340330755, 0.7416180012680216, 0.5896903278468001, 0.5650636240839731, 0.2142738809814442, 0.8000179452521201, 0.8313217977456128, 0.5969931990563396, 0.9403435166146437, 0.5398039638361359, 0.7704913977341118, 0.9116539471799688, 0.8676753459531258, 0.7879198227699619, 0.8404001279083188, 0.564515850592997, 0.1245416190378778, 0.8467900293921589, 0.6425375729791055, 0.5031597296394748, 0.878592806636603, 0.893351055571683, 0.7013179681646807, 0.7671247312681356, 0.82731141533458, 0.5014587637578676, 0.9144411252703938, 0.12794460530719787, 0.9633810333048981, 0.975851411577049, 0.7548019571098654, 0.8364533897502335, 0.6085086883660169, 0.9862754624572856, 0.5363280363597109, 0.800842720430418, 0.5249009092838681)
glm.predict(response=true).fitted
// res22: DenseVector[Double] = DenseVector(0.6704077413824826, 0.38546977005484667, 0.7796818215449939, 0.8268308141937428, 0.5777948219023717, 0.7625090862471243, 0.7215902740300896, 0.17994185128154663, 0.5401953658775045, 0.7556701159031066, 0.7586904735494423, 0.8579514955537578, 0.9198988990628084, 0.9235800811011595, 0.5047910801364148, 0.20888846161343522, 0.8529236257767637, 0.8150591219179962, 0.7244460905587327, 0.43801341479248856, 0.9650054589488241, 0.4240047477193439, 0.8748113304289427, 0.555010776125046, 0.6267968204523753, 0.6572174225607317, 0.8673647068552129, 0.3840184279952476, 0.8393473732678258, 0.7050919299620699, 0.6254985970237854, 0.6553075503936121, 0.527783750198815, 0.9484365588958288, 0.945240807887592, 0.8357134772251201, 0.9066505225544612, 0.8506636241199589, 0.9056424692758666, 0.5818711953158778, 0.7869627493891542, 0.7177283272503255, 0.907439385231895, 0.963847307071916, 0.6554396100366324, 0.7101942163078864, 0.9638907405505939, 0.6172236118432377, 0.5198812075270888, 0.908592793597791, 0.29037752244269055, 0.7456935803811918, 0.76744805044157, 0.5173307121445888, 0.5721467681844457, 0.5648488290155359, 0.6343735428822619, 0.9361035844132839, 0.41023952674668585, 0.7607535946855035, 0.6590228248709451, 0.6772893655397066, 0.1765434819019684, 0.3731189340330755, 0.7416180012680216, 0.5896903278468001, 0.5650636240839731, 0.2142738809814442, 0.8000179452521201, 0.8313217977456128, 0.5969931990563396, 0.9403435166146437, 0.5398039638361359, 0.7704913977341118, 0.9116539471799688, 0.8676753459531258, 0.7879198227699619, 0.8404001279083188, 0.564515850592997, 0.1245416190378778, 0.8467900293921589, 0.6425375729791055, 0.5031597296394748, 0.878592806636603, 0.893351055571683, 0.7013179681646807, 0.7671247312681356, 0.82731141533458, 0.5014587637578676, 0.9144411252703938, 0.12794460530719787, 0.9633810333048981, 0.975851411577049, 0.7548019571098654, 0.8364533897502335, 0.6085086883660169, 0.9862754624572856, 0.5363280363597109, 0.800842720430418, 0.5249009092838681)
glm.summary
// Estimate S.E. z-stat p-value Variable
// ---------------------------------------------------------
// 0.5056 0.322 1.568 0.1168 (Intercept)
// 0.9711 0.272 3.564 0.0004 * V1
// 0.0891 0.122 0.728 0.4666 V2
// 0.1567 0.091 1.731 0.0834 V3
glm.plots
// res24: Figure = breeze.plot.Figure@499ddf
We first create an appropriate response, and then do Poisson regression.
val yp = DenseVector.tabulate(100)(i => Poisson(math.exp(-0.5 + X(i,0))).sample().toDouble)
import scalaglm.PoissonGlm
val pglm = Glm(yp, X, List("V1","V2","V3"), PoissonGlm)
pglm.coefficients
// res26: DenseVector[Double] = DenseVector(-0.6429259352456109, 1.0278794682705714, 0.06402965023825971, 0.012715808160021129)
pglm.summary
// Estimate S.E. z-stat p-value Variable
// ---------------------------------------------------------
// -0.6429 0.170 -3.778 0.0002 * (Intercept)
// 1.0279 0.097 10.565 0.0000 * V1
// 0.0640 0.048 1.327 0.1844 V2
// 0.0127 0.041 0.313 0.7543 V3
pglm.plots
// res28: Figure = breeze.plot.Figure@6383c115
The above covers the main functionality of the library based on a linear reponse to variation in covariate values. For flexible modelling of a nonlinear response, see the documentation on flexible regression modelling.