Traditional storage approaches are being challenged by huge data volumes. In multimedia content, files are not always exact duplicates; rather, they are prone to editing, which results in similar copies of the same file. This paper proposes a similarity-based deduplication approach that evicts near-duplicates from archive storage by comparing samples of binary hashes to identify them.

Search engine is the popular term for an information retrieval (IR) system. Typically, a search engine is based on full-text indexing. Changing the presentation from text to multimedia data types, such as retrieving images or sounds from large databases, makes the information retrieval process more complex. This paper introduces the use of language- and text-independent speech as input queries to a large sound database by means of a speaker identification algorithm. The method consists of two main processing steps: first, vocal and non-vocal segments are separated; then the vocal segments are used for speaker identification, which enables audio queries by speaker voice. For the speaker identification and audio query process, we estimate the similarity between the example signal and the samples in the queried database by calculating the Euclidean distance between their Mel-frequency cepstral coefficients (MFCC) and energy-spectrum acoustic features. The simulations show good performance at a sustainable computational cost, with an average accuracy rate of more than 90%.

Due to its efficiency in storage and search speed, binary hashing has become an attractive approach for searching large audio databases. However, most existing hashing-based methods focus on data-independent schemes, in which random linear projections or some arithmetic expressions are used to construct the hash functions. Hence, the binary codes do not preserve similarity and may degrade search performance. In this paper, an unsupervised similarity-preserving hashing method for content-based audio retrieval is proposed. Different from data-independent hashing methods, we develop a deep network to learn compact binary codes from multiple hierarchical layers of nonlinear and linear transformations such that the similarity between samples is preserved. The independence and balance properties are included and optimized in the objective function to improve the codes. Experimental results on the Extended Ballroom dataset, with 3,000 musical excerpts in 8 genres, show that our proposed method significantly outperforms a state-of-the-art data-independent method in both effectiveness and efficiency.

Keywords: Content-based audio retrieval, Deep learning, Deep neural networks, Similarity-preserving hash, Unsupervised learning

This is an open access article under the CC BY-SA license.
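To make the hashing abstract more concrete, here is a minimal Python sketch of the data-independent baseline it contrasts against: binary codes built from random linear projections and searched by Hamming distance. It is not the authors' deep network. The function names, the Gaussian projection matrix, the toy 20-dimensional features, and the way balance and independence are measured are all assumptions made for illustration.

```python
import numpy as np

def random_projection_hash(features, n_bits, seed=0):
    """Data-independent hashing: one bit per random linear projection."""
    rng = np.random.default_rng(seed)
    # Assumption: Gaussian projection matrix, one column per output bit.
    projection = rng.standard_normal((features.shape[1], n_bits))
    return (features @ projection > 0).astype(np.uint8)

def hamming_distances(query_code, database_codes):
    """Count differing bits between one code and every database code."""
    return np.count_nonzero(database_codes != query_code, axis=1)

def balance_and_independence(codes):
    """Illustrative diagnostics (assumed definitions, not the paper's loss)."""
    signed = codes.astype(np.float64) * 2.0 - 1.0  # map {0, 1} to {-1, +1}
    # Balance: 0 means every bit is set for exactly half of the samples.
    balance = np.abs(signed.mean(axis=0)).max()
    # Independence: 0 means no pair of distinct bits is correlated.
    gram = signed.T @ signed / signed.shape[0]
    off_diagonal = gram - np.diag(np.diag(gram))
    independence = np.abs(off_diagonal).max()
    return balance, independence

# Toy stand-in for audio feature vectors; real features would come from audio.
rng = np.random.default_rng(42)
database = rng.standard_normal((1000, 20))
query = database[7] + 0.01 * rng.standard_normal(20)  # lightly edited copy of item 7

db_codes = random_projection_hash(database, n_bits=32)
query_code = random_projection_hash(query[None, :], n_bits=32)[0]

# Index of the database item whose code is closest to the query code.
nearest = int(np.argmin(hamming_distances(query_code, db_codes)))
balance, independence = balance_and_independence(db_codes)
```

Because the projections are drawn without looking at the data, nothing pushes `balance` or `independence` toward zero here. A learned method of the kind the abstract describes would instead optimize such properties as part of its objective.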