- Detection of Exfiltration and Tunneling over DNS.
A Das, M-Y Shen, M Shashanka and J Wang
IEEE Intl. Conf. on Machine Learning and Applications, Cancun, Mexico, Dec 2017. [ Citations ]
Abstract
This paper proposes a method to detect two primary means of using the Domain Name System (DNS) for malicious purposes. We develop machine learning models to detect information exfiltration from compromised machines and the establishment of command & control (C&C) servers via tunneling. We validate our approach with experiments in which we successfully detect malware used in several recent Advanced Persistent Threat (APT) attacks. The novelty of our method lies in its robustness, simplicity, scalability, and ease of deployment in a production environment.
- User and Entity Behavior Analytics for Enterprise Security.
M Shashanka, M-Y Shen, J Wang
IEEE Intl. Conf. on Big Data, Washington DC, USA, Dec 2016. [ pdf ] [ Citations ]
Abstract
This paper presents an overview of an intelligence platform we have built to address threat hunting and incident investigation use-cases in the cyber security domain. Specifically, we focus on User and Entity Behavior Analytics (UEBA) modules that track and monitor behaviors of users, IP addresses and devices in an enterprise. Anomalous behavior is automatically detected using machine learning algorithms based on Singular Value Decomposition (SVD). Such anomalous behavior, indicative of potentially malicious activity, is surfaced to analysts with relevant contextual information for further investigation and action. We provide a detailed description of the models, algorithms and implementation underlying the module and demonstrate the functionality with empirical examples.
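As a rough illustration of the SVD-based anomaly detection described in the abstract, behavioral feature vectors can be scored by their reconstruction error against a low-rank subspace. The sketch below is a generic reconstruction-error scorer under that assumption, not the paper's implementation; the data and feature layout are hypothetical:

```python
import numpy as np

def svd_anomaly_scores(X, k):
    """Score each row of X by its reconstruction error against a rank-k
    SVD subspace; rows far from the dominant behavioral patterns score high."""
    mu = X.mean(axis=0)
    Xc = X - mu                                   # center the features
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k]                                   # top-k right singular vectors
    X_hat = Xc @ Vk.T @ Vk                        # project and reconstruct
    return np.linalg.norm(Xc - X_hat, axis=1)     # per-row anomaly score

# Toy data: 200 entities whose feature vectors lie in a 2-D subspace,
# plus one entity (row 200) behaving very differently.
rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
outlier = 5 * np.ones((1, 10))
scores = svd_anomaly_scores(np.vstack([normal, outlier]), k=2)
```

Thresholding such scores (e.g., at a high percentile) yields alert candidates; the paper's modules may differ in features, scoring, and contextualization.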
- Collective Spammer Detection in Evolving Multi-Relational Social Networks.
S Fakhraei, J Foulds, M Shashanka, L Getoor
21st ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD), Sydney, Australia, Aug 2015. [ pdf ] [ Citations ]
Abstract
Detecting unsolicited content and the spammers who create it is a long-standing challenge that affects all of us on a daily basis. The recent growth of richly-structured social networks has provided new challenges and opportunities in the spam detection landscape. Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. We model a social network as a time-stamped multi-relational graph where vertices represent users, and edges represent different activities between them. To identify spammer accounts, our approach makes use of structural features, sequence modelling, and collective reasoning. We leverage relational sequence information using k-gram features and probabilistic modelling with a mixture of Markov models. In order to perform collective reasoning and improve the predictive power of a noisy abuse reporting system, we develop a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a class of probabilistic graphical models which are highly scalable. We use GraphLab Create and Probabilistic Soft Logic (PSL) to prototype and experimentally evaluate our solutions on internet-scale data from Tagged.com. Our experiments demonstrate the effectiveness of our approach, and show that models which incorporate the multi-relational nature of the social network gain significant predictive performance over those that do not.
- Maximally Bijective Discretization for Data-Driven Modeling of Complex Systems.
S Sarkar, A Srivastav, M Shashanka
2013 American Control Conference, Washington, DC, Jun 2013. [ pdf ] [ Citations ]
Abstract
Phase-space discretization is a necessary step for the study of continuous dynamical systems using a language-theoretic approach. It is also critical for many machine learning techniques, e.g., probabilistic graphical models (Bayesian Networks, Markov models). This paper proposes a novel discretization method, Maximally Bijective Discretization, which finds a discretization of the dependent variables, given a discretization of the independent variables, such that the correspondence between input and output variables in the continuous domain is preserved in the discrete domain for the given dynamical system.
- An Integrated Infrastructure for Real-Time Building Energy Modeling and Fault Detection and Diagnostics.
B Dong, Z O'Neill, Z Li, D Luo, M Shashanka, S Ahuja, T Bailey
SimBuild 2012, Madison, WI, Aug 2012. [ pdf ] [ Citations ]
- Simplex Decompositions using SVD and PLSA.
M Shashanka, MJ Giering
Intl. Conf on Pattern Recognition Applications and Methods, Vilamoura, Portugal, Feb 2012. [ pdf ] [ Citations ]
Abstract
Probabilistic Latent Semantic Analysis (PLSA) is a popular technique to analyze non-negative data where multinomial distributions underlying every data vector are expressed as linear combinations of a set of basis distributions. These learned basis distributions that characterize the dataset lie on the standard simplex and themselves represent corners of a simplex within which all data approximations lie. In this paper, we describe a novel method to extend the PLSA decomposition where the bases are not constrained to lie on the standard simplex and thus are better able to characterize the data. The locations of PLSA basis distributions on the standard simplex depend on how the dataset is aligned with respect to the standard simplex. If the directions of maximum variance of the dataset are orthogonal to the standard simplex, then the PLSA bases will give a poor representation of the dataset. Our approach overcomes this drawback by utilizing Singular Value Decomposition (SVD) to identify the directions of maximum variance, and transforming the dataset to align these directions parallel to the standard simplex before performing PLSA. The learned PLSA features are then transformed back into the data space. The effectiveness of the proposed approach is demonstrated with experiments on synthetic data.
- Copula Functions for Learning Multimodal Densities with Non-linear Dependencies.
A Tewari, M Shashanka, M Giering
NIPS Workshop on Copulas in Machine Learning, Sierra Nevada, Spain, Dec 2011. [ pdf ] [ Poster ] [ Citations ]
- Real Time Model-Based Energy Diagnostics in Buildings.
Z O'Neill, M Shashanka, X Pang, P Bhattacharya, T Bailey, P Haves
12th Conf. on Intl. Building Perf. Sim. Assoc., Sydney, Australia, Nov 2011. [ pdf ] [ Citations ]
- A Fast Algorithm for Discrete HMM Training using Observed Transitions.
M Shashanka
IEEE Intl. Conf on Acoustics, Speech and Signal Processing, Prague, Czech Republic, May 2011. [ pdf ] [ Citations ]
Abstract
We present a new algorithm to estimate the parameters of a Hidden Markov Model (HMM), specifically the transition probability matrix of the hidden states and the emission probabilities, given an observed sequence of data. The algorithm uses the number of transitions present in the observed label sequence and computes parameters in an iterative fashion. We present experiments that demonstrate significant speed gains obtained by the current algorithm as compared to traditional algorithms such as Baum-Welch iterations.
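For context on why observed transitions speed things up: when the state labels themselves are observed in the training data, maximum-likelihood estimates of a discrete HMM's transition and emission probabilities reduce to normalized counts, with no iteration over hidden variables. The sketch below shows that fully-observed special case; the paper's iterative algorithm, which builds on such transition counts, is not reproduced here:

```python
import numpy as np

def hmm_from_observed(states, observations, n_states, n_symbols):
    """ML estimates of a discrete HMM's transition matrix A and emission
    matrix B when the state sequence itself is observed: both reduce to
    normalized counts."""
    A = np.zeros((n_states, n_states))    # transition counts
    B = np.zeros((n_states, n_symbols))   # emission counts
    for s, s_next in zip(states[:-1], states[1:]):
        A[s, s_next] += 1
    for s, o in zip(states, observations):
        B[s, o] += 1
    # Row-normalize counts into probabilities (tiny floor avoids 0/0)
    A = (A + 1e-12) / (A + 1e-12).sum(axis=1, keepdims=True)
    B = (B + 1e-12) / (B + 1e-12).sum(axis=1, keepdims=True)
    return A, B

states = [0, 0, 1, 1, 0, 1]
obs = [0, 1, 1, 0, 0, 1]
A, B = hmm_from_observed(states, obs, n_states=2, n_symbols=2)
# A[0] is [1/3, 2/3]: from state 0 we saw one 0->0 and two 0->1 transitions
```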
- A Privacy Preserving Framework for Gaussian Mixture Models.
M Shashanka
IEEE Intl. Workshop on Privacy Aspects of Data Mining, Sydney, Australia, Dec 2010. [ pdf ] [ Citations ]
Abstract
This paper presents a framework for privacy-preserving Gaussian Mixture Model computations. Specifically, we consider a scenario where a central service wants to learn the parameters of a Gaussian Mixture Model from private data distributed among multiple parties with privacy constraints. In addition, the service also has security constraints where none of the data owners are allowed to learn the values of the trained parameters. We use Secure Multiparty Computations to propose a framework that allows such computations. In addition, we show how such a central service can classify new test data from privacy-constrained third parties without exposing the learned models. The classification occurs with the added constraint that the service learns no information about either the test data or the result of the classification.
- Probabilistic Latent Component Analysis for Gearbox Vibration Source Separation.
J Isom, M Shashanka, A Tewari, A Lazarevic
Annual Conference of the PHM Society, Portland, Oregon, Oct 2010. [ pdf ] [ Citations ]
Abstract
Probabilistic Latent Component Analysis (PLCA) is applied to the problem of gearbox vibration source separation. A model for the probability distribution of gearbox vibration employs a latent variable intended to correspond to a particular vibration source; the measured vibration at a particular sensor is modeled, for each source, as the product of a marginal distribution of vibration by frequency, a marginal distribution of vibration by shaft rotation, and a sensor weight distribution. An expectation-maximization algorithm is used to approximate a maximum-likelihood parametrization for the model. In contrast to other unsupervised source separation methods, PLCA allows for separation of vibration sources when there are fewer vibration sensors than vibration sources. Once the vibration components of a healthy gearbox have been identified, the vibration characteristics of damaged gearbox elements can be determined. The efficacy of the technique is demonstrated with an application on a gearbox vibration data set.
- Topic Models for Audio Mixture Analysis.
P Smaragdis, M Shashanka, B Raj
NIPS Workshop on Applications for Topic Models: Text and Beyond, Vancouver, Canada, Dec 2009. [ pdf ] [ Citations ]
- A Sparse Non-parametric Approach for Single Channel Separation of Known Sounds.
P Smaragdis, M Shashanka, B Raj
Neural Information Processing Systems Conference (NIPS), Vancouver, Canada, Dec 2009. [ pdf ] [ Citations ]
Abstract
In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models.
- Simplex Decompositions for Real-valued Datasets.
M Shashanka.
IEEE Intl Workshop on Machine Learning and Signal Processing, Grenoble, France, Sep 2009. [ DOI ] [ pdf ] [ code ] [ Citations ]
Abstract
In this paper, we introduce the concept of Simplex Decompositions and present a new semi-nonnegative decomposition technique that works with real-valued datasets. The motivation stems from the limitations of topic models such as Probabilistic Latent Semantic Analysis (PLSA), which have found wide use in the analysis of non-negative data beyond text corpora, including images, audio spectra, and gene array data. The goal of this paper is to remove the non-negativity requirement so that these models can work on datasets with both positive and negative entries. We start by showing that PLSA is equivalent to finding a set of components that define the corners of a simplex within which all datapoints lie. We formalize this intuition by introducing the notion of simplex decompositions, of which PLSA and its extensions are specific examples, and generalize the idea to arbitrary real datasets with both positive and negative entries. We present algorithms and illustrate the method with examples.
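As background for the simplex view, the PLSA decomposition that this work generalizes can be computed with standard EM updates on a non-negative matrix. The following is a minimal generic PLSA sketch on toy counts (the paper's semi-nonnegative algorithm itself is not shown):

```python
import numpy as np

def plsa(V, n_topics, n_iter=200, seed=0):
    """Standard PLSA via EM on a non-negative matrix V (features x documents).

    Columns of W are basis distributions P(f|z) on the standard simplex
    (the simplex corners); columns of H are per-document weights P(z|d)."""
    rng = np.random.default_rng(seed)
    n_feat, n_docs = V.shape
    W = rng.random((n_feat, n_topics)); W /= W.sum(axis=0)
    H = rng.random((n_topics, n_docs)); H /= H.sum(axis=0)
    for _ in range(n_iter):
        Vr = V / (W @ H + 1e-12)          # ratio of data to reconstruction
        W_new = W * (Vr @ H.T)            # EM update for P(f|z) ...
        H_new = H * (W.T @ Vr)            # ... and for P(z|d), same posterior
        W = W_new / W_new.sum(axis=0)
        H = H_new / H_new.sum(axis=0)
    return W, H

# Toy counts generated exactly from a 2-topic model
Wtrue = np.array([[0.7, 0.1], [0.2, 0.2], [0.1, 0.7]])
Htrue = np.array([[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]])
V = 1000 * Wtrue @ Htrue
W, H = plsa(V, n_topics=2)
```

The columns of the learned `W` stay on the standard simplex throughout, which is exactly the constraint the paper relaxes for real-valued data.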
- Missing Data Imputation for Spectral Audio Signals.
P Smaragdis, B Raj, M Shashanka.
IEEE Intl Workshop on Machine Learning and Signal Processing, Grenoble, France, Sep 2009. [ DOI ] [ pdf ] [ Citations ]
Abstract
With the recent attention to audio processing in the time-frequency domain, we increasingly encounter the problem of missing data. In this paper, we present an approach that allows for imputing missing values in the time-frequency domain of audio signals. The presented approach is able to deal with real-world polyphonic signals by performing imputation even in the presence of complex mixtures. We show that this approach outperforms generic imputation approaches, and we present a variety of situations that highlight its utility.
- Mining Retail Transaction Data for Targeting Customers with Headroom - A Case Study.
M Shashanka, M Giering.
Artificial Intelligence Applications and Innovations, Greece, April 2009. [ DOI ] [ pdf ] [ Citations ]
Abstract
We outline a method to model customer behavior from retail transaction data. In particular, we focus on the problem of recommending relevant products to consumers. Addressing this problem of filling holes in consumers' baskets is fundamental to the success of targeted promotion programs. Another important aspect is the identification of customers who are most likely to spend significantly and whose potential spending ability is not being fully realized. We discuss how to identify such customers with headroom and describe how relevant product categories can be recommended. The data consisted of individual transactions collected over a span of 16 months from a leading retail chain. The method is based on Singular Value Decomposition and can generate significant value for retailers.
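A minimal sketch of the SVD-based recommendation idea: fill "holes in the basket" by ranking a customer's unbought categories by their predicted spend in a low-rank reconstruction. The spend matrix below is hypothetical toy data, and the headroom identification described in the abstract is not reproduced:

```python
import numpy as np

def svd_recommend(R, k, customer, top_n=2):
    """Rank a customer's unbought categories by predicted spend in a
    rank-k SVD reconstruction of the customer-by-category matrix R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # low-rank approximation
    unbought = R[customer] == 0                  # 'holes in the basket'
    ranked = np.argsort(-R_hat[customer])        # categories by predicted spend
    return [int(c) for c in ranked if unbought[c]][:top_n]

# Hypothetical customer x category spend matrix
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [1., 0., 0., 4.],
              [0., 1., 5., 4.]])
recs = svd_recommend(R, k=2, customer=1)  # categories customer 1 never bought
```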
- Probabilistic Factorization of Non-Negative Data with Entropic Co-occurrence Constraints.
P Smaragdis, M Shashanka, B Raj, GJ Mysore.
Intl Conf on Independent Component Analysis, Brazil, March 2009. [ DOI ] [ pdf ] [ Citations ]
Abstract
In this paper we present a probabilistic algorithm which factorizes non-negative data. We employ entropic priors to additionally satisfy that user specified pairs of factors in this model will have their cross entropy maximized or minimized. These priors allow us to construct factorization algorithms that result in maximally statistically different factors, something that generic non-negative factorization algorithms cannot explicitly guarantee. We further show how this approach can be used to discover clusters of factors which allow a richer description of data while still effectively performing a low rank analysis.
- Sparse and Shift-Invariant Feature Extraction from Non-Negative Data.
P Smaragdis, B Raj, M Shashanka.
IEEE Intl Conf on Acoustics, Speech and Signal Processing, Las Vegas, Nevada, Apr 2008. [ DOI ] [ pdf ] [ Citations ]
Abstract
In this paper we describe a technique that allows the extraction of multiple local shift-invariant features from analysis of non-negative data of arbitrary dimensionality. Our approach employs a probabilistic latent variable model with sparsity constraints. We demonstrate its utility by performing feature extraction in a variety of domains ranging from audio to images and video.
- Sparse Overcomplete Latent Variable Decomposition of Counts Data. [ Fig.1 Data ]
M Shashanka, B Raj, P Smaragdis.
Neural Information Processing Systems Conference (NIPS), Vancouver, Canada, Dec 2007. [ pdf ] [ supplement ] [ Citations ]
Abstract
An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the expressiveness of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
- Privacy-Preserving Musical Database Matching.
M Shashanka, P Smaragdis.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, Oct 2007. [ DOI ] [ pdf ] [ Citations ]
Abstract
In this paper we present an illustrative process which allows privacy-preserving transactions in the context of musical databases. In particular we address the problem of matching a piece of music audio to a service database in such a way that the database provider will neither directly observe the query nor its result, thereby preserving the privacy of the inquirer. We formulate this process within the field of secure multiparty computation and show how such a transaction can be achieved once we derive secure versions of basic signal processing operations.
- Supervised and Semi-Supervised Separation of Sounds from Single-Channel Mixtures.
P Smaragdis, B Raj, M Shashanka.
Intl Conf on Independent Component Analysis, London, UK, Sep 2007. [ DOI ] [ pdf ] [ Citations ]
Abstract
In this paper we describe a methodology for model-based single channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios: one being the case where all sound types in the mixture are known, and the other being the case where only the target or the interference models are known. The model we propose has close ties to non-negative decompositions and latent variable models commonly used for semantic analysis.
- Sparse Overcomplete Decomposition for Single Channel Speaker Separation. [ Examples ]
M Shashanka, B Raj, P Smaragdis.
IEEE Intl Conf on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, April 2007. [ DOI ] [ pdf ] [ Citations ]
Abstract
We present an algorithm for separating multiple speakers from a mixed single channel recording. The algorithm is based on a model proposed by Raj and Smaragdis (2005). The idea is to extract certain characteristic spectro-temporal basis functions from training data for individual speakers and decompose the mixed signals as linear combinations of these learned bases. In other words, their model extracts a compact code of basis functions that can explain the space spanned by the spectral vectors of a speaker. In our model, we generate a sparse-distributed code where we have more basis functions than the dimensionality of the space. We propose a probabilistic framework to achieve sparsity. Experiments show that the resulting sparse code better captures the structure in data and hence leads to better separation.
- A Framework for Secure Speech Recognition.
P Smaragdis, M Shashanka.
IEEE Intl Conf on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, April 2007. [ DOI ] [ pdf ]
Abstract
We present an algorithm that enables privacy-preserving speech recognition transactions between multiple parties. We assume two commonplace scenarios: one where one of two parties has private speech data to be transcribed and the other party has private models for speech recognition, and another where one party has a speech model to be trained using the private data of multiple other parties. In both of the above cases data privacy is desired by both the data and the model owners. In this paper we show how such collaborations can be performed while ensuring no private data leaks, using secure multiparty computations. In neither case will any party obtain information on the other parties' data. The protocols described herein can be used to construct rudimentary speech recognition systems and can be easily extended for arbitrary audio and speech processing.
- Bandwidth Expansion with a Polya Urn Model. [ Examples ]
B Raj, R Singh, M Shashanka, P Smaragdis.
IEEE Intl Conf on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, April 2007. [ DOI ] [ pdf ] [ Citations ]
Abstract
We present a new statistical technique for the estimation of the high frequency components (4-8 kHz) of speech signals from narrow-band (0-4 kHz) signals. The magnitude spectra of broadband speech are modelled as the outcome of a Polya Urn process that represents the spectra as the histogram of the outcome of several draws from a mixture multinomial distribution over frequency indices. The multinomial distributions that compose this process are learnt from a corpus of broadband (0-8 kHz) speech. To estimate high-frequency components of narrow-band speech, its spectra are also modelled as the outcome of draws from a mixture-multinomial process that is composed of the learnt multinomials, where the counts of the indices of higher frequencies have been obscured. The obscured high-frequency components are then estimated as the expected number of draws of their indices from the mixture-multinomial. Experiments conducted on bandlimited signals derived from the WSJ corpus show that the proposed procedure is able to accurately estimate the high frequency components of these signals.
- Separating a Foreground Singer from Background Music. [ Examples ]
B Raj, P Smaragdis, M Shashanka, R Singh.
Intl Symposium on Frontiers of Research on Speech and Music (FRSM), Mysore, India, Jan 2007. [ pdf ] [ Citations ]
Abstract
In this paper we present an algorithm for separating singing voices from background music in popular songs. The algorithm is derived by modelling the magnitude spectrogram of audio signals as the outcome of draws from a discrete bi-variate random process that generates time-frequency pairs. The spectrogram of a song is assumed to have been obtained through draws from the distributions underlying the music and the vocals, respectively. The parameters of the underlying distributions are learnt from the observed spectrogram of the song. The spectrogram of the separated vocals is then derived by estimating the fraction of draws that were obtained from its distribution. In the paper we present the algorithm within a framework that allows personalization of popular songs, by separating out the vocals, processing them appropriately to one's own tastes, and remixing them. Our experiments reveal that we are effectively able to separate out the vocals in a song and personalize them to our tastes.
- A Probabilistic Latent Variable Model for Acoustic Modeling.
P Smaragdis, B Raj, M Shashanka.
Workshop on Advances in Models for Acoustic Processing, NIPS 2006. [ pdf ] [ Citations ]
Abstract
In this paper we describe a model developed for the analysis of acoustic spectra. Unlike decomposition techniques that can produce difficult-to-interpret results, this model explicitly represents spectra as distributions and extracts sets of additive and semantically useful components that facilitate a variety of applications ranging from source separation and denoising to music transcription and sound recognition. This model is probabilistic in nature and is easily extended to produce sparse codes and discover transform-invariant components which can be optimized for particular applications.
- Secure Sound Classification: Gaussian Mixture Models.*
M Shashanka, P Smaragdis.
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Toulouse, France, May 2006. [ DOI ] [ pdf ] [ Citations ]
* Finalist in the Student Paper Contest.
Abstract
We propose secure protocols for Gaussian mixture-based sound recognition. The protocols we describe allow varying levels of security between two collaborating parties. The case we examine consists of one party (Alice) providing data and the other party (Bob) providing a recognition algorithm. We show that it is possible to have Bob apply his algorithm on Alice's data in such a way that the data and the recognition results will not be revealed to Bob, thereby guaranteeing Alice's data privacy. Likewise, we show that it is possible to organize the collaboration so that Alice cannot reverse-engineer Bob's recognition algorithm. We show how Gaussian mixtures can be implemented in a secure manner using secure computation primitives implementing simple numerical operations, and we demonstrate the process by showing how it can yield identical results to a non-secure computation while maintaining privacy.
- Latent Dirichlet Decomposition for Single Channel Speaker Separation. [ Examples ]
B Raj, M Shashanka, P Smaragdis.
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Toulouse, France, May 2006. [ DOI ] [ pdf ] [ Citations ]
Abstract
We present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modeled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker specific and are learned from training signals for each speaker. We model the prior distribution of the mixture weights for each speaker as a Dirichlet distribution. The distributions representing magnitude spectral vectors for the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e., the spectrum for each speaker, is reconstructed from this decomposition.
- Optimal Multi-Channel Data Allocation with Flat Broadcast per Channel.*
AA Bertossi, MC Pinotti, S Ramaprasad, R Rizzi, M Shashanka.
Intl. Parallel and Distributed Processing Symposium, Santa Fe, USA, Apr 2004. [ DOI ] [ pdf ] [ Citations ]
*Authors listed in alphabetical order.
Abstract
Broadcast is an efficient and scalable way of transmitting data to an unlimited number of clients that are listening to a channel. Cyclically broadcasting data over the channel is a basic scheduling technique, which is known as flat scheduling. When multiple channels are available, partitioning data among channels in an unbalanced way, depending on data popularities, is an allocation technique known as skewed allocation. In this paper, the problem of data broadcasting over multiple channels is considered assuming skewed data allocation to channels and flat data scheduling per channel, with the objective of minimizing the average waiting time of the clients. Several algorithms, based on dynamic programming, are presented which provide optimal solutions for N data items and K channels. Specifically, for data items with uniform lengths, an O(NK log N) time algorithm is proposed, which improves over the previously known O(N²K) time algorithm. When K ≤ 4, faster O(N) time algorithms are exhibited. Moreover, for data items with nonuniform lengths, it is shown that the problem is NP-hard when K = 2, and strongly NP-hard for arbitrary K. In the former case, a pseudo-polynomial algorithm is discussed, whose time is O(NZ) where Z is the sum of the data lengths.
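For intuition, the uniform-length case admits a simple O(N²K) dynamic program over contiguous groups of items sorted by non-increasing popularity, under the standard flat-scheduling cost model in which the expected wait for an item on a channel cycling through n items is n/2 slots. The sketch below illustrates the problem setup, not the faster O(NK log N) algorithm of the paper:

```python
import math

def min_avg_wait(p, K):
    """O(N^2 K) DP: partition items (sorted by non-increasing popularity p)
    into K contiguous channel groups, minimizing the average waiting time,
    i.e. the sum over groups of |group| * P(group) / 2 under flat
    per-channel scheduling."""
    N = len(p)
    prefix = [0.0]
    for x in p:
        prefix.append(prefix[-1] + x)
    def cost(i, j):                      # items i..j-1 on one channel
        return (j - i) * (prefix[j] - prefix[i]) / 2
    # dp[j][k] = best cost for the first j items on k channels
    dp = [[math.inf] * (K + 1) for _ in range(N + 1)]
    dp[0][0] = 0.0
    for j in range(1, N + 1):
        for k in range(1, K + 1):
            for i in range(k - 1, j):    # last channel holds items i..j-1
                dp[j][k] = min(dp[j][k], dp[i][k - 1] + cost(i, j))
    return dp[N][K]
```

For example, with popularities [0.5, 0.3, 0.1, 0.1] and K = 2, isolating the most popular item (or grouping the top two) achieves an average wait of 1.0 slot, versus 2.0 on a single channel.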
- A Characterisation of Optimal Channel Assignments for Wireless Networks Modelled as Cellular and Square Grids.
M Shashanka, A Pati, AM Shende.
Intl. Parallel and Distributed Processing Symposium, Nice, France, Apr 2003. [ DOI ] [ pdf ] [ Citations ]
Abstract
In this paper we first present a uniformity property that characterises optimal channel assignments for networks arranged as cellular or square grids. Then, we present optimal channel assignments for cellular and square grids; these assignments exhibit a high value for δ₁, the separation between channels assigned to adjacent stations. Based on empirical evidence, we conjecture that the value our assignments exhibit is an upper bound on δ₁.
- Channel Assignment for Wireless Networks Modelled as d-Dimensional Square Grids.
A Dubhashi, M Shashanka, A Pati, S Ramaprasad, AM Shende.
Intl. Workshop on Distributed Computing, Kolkata, India, Dec 2002. [ DOI ] [ pdf ] [ Citations ]
Abstract
In this paper, we study the problem of channel assignment for wireless networks modelled as d-dimensional grids. In particular, for d-dimensional square grids, we present optimal assignments that achieve a channel separation of 2 for adjacent stations where the reuse distance is 3 or 4. We also introduce the notion of a colouring schema for d-dimensional square grids, and present an algorithm that assigns colours to the vertices of the grid satisfying the schema constraints.