Liad's little blog

Posts

Natural Language Processing :: A beginner’s (brief) guide for working on NLP problems

February 12, 2018

This article was written for the Data Science Society's Datathon held on February 2018, as an answer to questions from the participants who were new to NLP. The Data Science Society's Datathon 2018 , presented us this time many cases which are Natural Language Processing related . One of the cases, for example, involves extracting entities' activities from unstructured documents, and determining their sentiment. So, how should one begin working on such a problem? First, let’s break this problem down: (1) we need to to detect which entities are mentioned in an article, and then (2) we need to detect the sentiment that is related, specifically to those entities. Let’s think about it logically for a moment: Entities are (normally) nouns, and in order to get the sentiment we will probably need to look at the adjectives that describe them, or the verbs that are related to them. So a first step could be (a) parsing the document into sentences, and then int...

NLP Toolkit (Made for self use - but feel free to use it too :) )

February 03, 2018

A list of tools for NLP tasks. Done mostly for self-reference... hence quite brief. Better lists can be found out there: https://github.com/keon/awesome-nlp SciPy ( sklearn.feature_extraction.text ) CountVectorizer - converting text to token-counts matrix (n-grams co-reference) which is then used by: TfidfTransformer - transforms a count-matrix ( CountVectorizer output) to term-frequency or inverse document frequency (TF-IDF) NLTK Corpus reader - Words tokenization POS Chunking Stemming Creating parse trees out of sentences Using KnowledgeBases (FrameNet, WordNet, propBank) Different implementation for tagging (Senna, Stanford) Spacy Corpus reader - Tokenization NER, POS Semantic representation (Word Vectors) Labeled dependency parsing Gensim : Corpus reader / parser transofrmations (TF-IDF, LSA, LDA, HDP) Similarity Queries Topic segmentation (LDA, LSA) AllenNLP High-level trained models Machine comprehension (QA based on given t...

pip install pymssql fails with 'sqlfront.h': No such file or directory

July 21, 2017

I've tried to install pymssql on Windows using command line: pip install pymssql The operation fails with an error: fatal error C1083: Cannot open include file: 'sqlfront.h': No such file or directory while looking for a solution, the first results are not really helpful, until you arrive to this helpful one . In short: I was using python version 3.6, which currently isn't supported with pymssql. Using python 3.5 instead solves that issue, and the pip install pymssql runs well. If you're using Anaconda Navigator , it is made even simpler: In the Environment section, you can add a new environment with your choice of python version. When you're choosing a supported version - 3.5 for example - you will find the pymssql in the packages list and install it from there directly.

AngularJS Directive: Accessing an DOM element with a dynamic ID in an asynchronous directive

January 09, 2015

Hi, This problem was really bothering me for few months. In earlier cases I tried to avoid it, but today I could not hide from it any longer. And the thing is that there was no post on any other place that could solve this matter... So here it goes. You're writing a directive in AngularJS , and you want to access one of the DOM elements declared within its template. Usually you do it through the linking function (post section in the compile) or in the directive controller, by injecting and using '$element', as explained here . But, in my case, I had to access an element using its ID and this ID was dynamic. So, my template contains: < div > < div ng-attr-id ="{{ 'something_' + dynamicValue }}"></ div > </ div > And my directive contains: { templateURL: 'pathToTempalte' , scope: { dynamicValue: '...

A guide for modeling a graph database - A lunch with Neo4J chief scientist Jim Webber, London

January 18, 2013

Since the invention of NOSQL databases, it gets more and more attention from the developer community. One of the remarkable and unique databases exists in this topic of NOSQL is the graph database of NEO4J, an open-source database which stores the data as a Graph. Modeling a graph database is quite different than modeling the regular RDBMS database, and even from other NOSQL databases such as key-value collections. We got used to identifying the molecularity of the data and save it as the columns, joining similar data together into tables. But since Neo4J is a flexible graph database, this case does not work there. In order to model the data, we need to identify first the queries that we're about to run on the database. These queries will form the basic logical sentences which are the keys of modeling the database (more about that coming up in a second). We need to identify where to store each piece of information: - as a node - as an edge - as a property of a node or of ...

Javascript, animation and easing functions

August 06, 2012

I recently had to create an animation, which obey to the laws of physics, using javascript and HTML5. Among the gravity and free falling, one of these animations had to also apply sort of bouncing and elastic laws. After creating it first with CSS3 and HTML, we realized that most of the Android devices does not support it fully yet. Especially versions prior to Ice Cream Sandwich, such as Android 2.1, 2.2, 2.3 and lower (Froyo and its siblings). So, I had to transfer everything into HTML5 Canvas, and instead of applying the css-transitions, I had to implements everything alone. The first website I encountered and would like to highly recommend is Timothee Groleau's easing generator which can be found here : http://timotheegroleau.com/Flash/experiments/easing_function_generator.htm Originally made for Flash - but can easily adopted to Javascript HTML5 too. The output of his generator is basically a function describes a certain movement of an object. Such...