Machine Learning and Android Notes

Machine Learning

GMM
GMM stands for Gaussian Mixture Model. For example, for digit 0, we generate a GMM for it, based on its samples. So we have 10 GMM for all the digits. Then for an incoming test data, we computes its score, or how much it matches with all the 10 GMMs, and choose the one with highest score. For example, in digit 5 it has highest score, so we can think that this test should be digit 5. It is a process similar with 1-over-n.
Poly-SVM
First, SVM stands for Support Vector Machine. For a classification problem, for example, digit 0, we put all samples from 0 to 9 on the space. Ideally data points of 0 should be in a cluster and be far from other points. So we can virtually draw a line to separate the 0 cluster from other digits. But how should we choose that line? We normally choose the naughty points, which are not so close to the cluster center where they should be. For example, we have some naughty 0 which are not close to other 0s. So after finding this line, we can compute a score for an incoming data, and decide whether it belongs to this class or other class. Thus we need to have 10 SVMs, which is similar with 1-over-n process. And we compute 10 scores and choose the highest one.
But for some cases the data set may not be linearly separable, i.e. where the boundary may be elliptic, then we need to map data to a higher degree space, where we can utilize the combination effect of them. The polynomial kernel is usually expressed as (x’x+c)^d. For example, if d = 2, and x is 2-dimension vector, then after expansion we can have a 5-dimension vector space.
Principal Component Analysis
Principal Component Analysis (PCA) is a method to project high dimension data onto several ‘most important’ dimensions, where data have largest variance on that. By analyzing on those dimensions, like doing SVM on that, we can save lots of effort. And normally natural data tends to divert on limited number of dimensions.
Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is similar with PCA, in that both of them try to decrease the dimension and separate data widely as possible. PCA does not take class into account, while LDA tries to separate data in different classes.

Android

In android we have 5 kinds of process: foreground, visible, service, background, and empty process.
UI thread: There is only 1 UI thread for each activity. Reason for this is first, normally we will only have 1 UI thread, so it does not need to care about some tedious job which may freeze the display. Second, similar with many UI thread in other platform, UI toolkit in Android is not thread safe. So only UI thread can change the UI display.
Then how other threads can change the display? By using message queue that the UI thread is listening to. By adding your message that you want to tell the UI thread, like some data, we can change UI display indirectly.
Fragment Manager: During usage of Android app, there can be runtime change, like screen orientation. In this case, Android may restart the application, and the onDestroy() and onCreate() will be called. We may lose the previous state during this process. One way to retain the data back is to store data before configuration change, and use method like onSaveInstanceState() to store and onRestoreInstanceState() to retrieve them back. But this method only works for small volume of data. For large data, we can use FragmentManager, where the data is stored in the fragment’s data. So in onCreate(), we can check whether a fragment is stored so that we need to retrieve them.
Memory Usage: Loading image is an important problem in Android app. We often load the image in background thread, then display it with UI thread. Because the time needed to load the image is hard to predict, which depends on network, disk cache reading speed. If we load it in UI thread, application may pause and has no response, then user can choose to destroy the activity, which is really bad for the application.
When application needs to use image, which is loaded as BitMap, like in ListView, GridView, often the image will eat up memory soon, which will cause OutOfMemory exception. One solution is to use the low resolution image, by setting the dimension according to the device, as loading an image with higher resolution than display may have no difference in display. The other solution is to store image into memory cache (which is implemented as a LRUCache). But this is not safe enough, cases like an interruption on activity due to a phone call, or configuration change, will clear up memory. The solution is to cache them into disk, like DiskLRUCache. Then next time the application does not need to get the image data from network again, which increase the fluidity of display.