Moroccan Interior Ministry Announces Merit Promotions for 7,000 Officers in 2018

Rabat – Some 7,092 police officers of different corps and ranks received merit-based promotions for the 2018 financial year, announced the General Directorate of National Security (DGSN) on Tuesday. They represent 44 percent of the overall officers registered for the promotion.

This includes 5,208 uniformed and 1,820 plainclothes officials, as well as 64 managers and inter-ministerial officials, said the DGSN. The Promotion Committee has worked to increase the number of recipients in the lower ranks and echelons, from peacekeepers to senior officers, totalling 6,406 civil servants and accounting for over 90 percent of the recipients.

Unlike other ways of being promoted, such as competition or examination, those who receive promotions based on merit are selected by the administration. The choice is based on the need to recognize competent public officials who could not pursue further education or take part in competitions and examinations. Around 8,644 police officers from different corps and grades were promoted last year and 6,067 in 2016.

Continue reading


Getting youth active for life

Making it to an Olympic podium takes years of dedication, special training and a certain level of skill. Jumping on your bike to head to the store doesn't require the same type of expertise. Just an interest in being active. But developing that interest relies on commitment during the early years. Maria Hayes reports.

Continue reading


UN humanitarian office asks help for Congolese women expelled from Angola

OCHA had previously issued a special appeal for help for some 80,000 Congolese, some born during their parents' exile in neighbouring Angola, who were expelled during the Angolan sweep of its diamond mines for illegal workers. Today it called for boosting the capacity of health partners already working in the area and the financing of new partners with expertise in sexual violence and the prevention and transmission of HIV/AIDS.

According to OCHA, it received reports that Angolan military agents "sexually abused women and girls under the pretext of searching for hidden diamonds among Congolese being expelled from Angola." "In addition to the psychological trauma caused, (the) risk of HIV/AIDS infection and other sexually transmissible diseases is high as military agents are reportedly using unsanitary methods for internal body cavity searches of both men and women," it said.

The Congolese have been expelled from the Angolan provinces of Malange, Lunda Norte, Lunda Sul and Kwanza Sul since December, the Office said. To reach the border, they were forced to walk for days, leaving the weak and the young behind to their fate. A shipment of aid, including blankets, generators, boats and jerry cans, arrived in the DRC capital, Kinshasa, on Sunday and was to be distributed by the UN Children's Fund (UNICEF), OCHA said, adding that it also requested more help with truck transport.

Continue reading


Google introduces Chromebooks – computers built to use the web – to Canada

by Michael Oliveira, The Canadian Press. Posted Mar 19, 2013, 1:00 am MDT.

TORONTO – For consumers who find they really only use a computer to get online and not much else, Google now has the Chromebook, a brand of Internet-only computers officially introduced in Canada on Tuesday.

Chromebooks look like standard laptops but don't come loaded with a version of Windows or a Mac operating system. They run on Google's Chrome OS, a streamlined platform with the web browser as the main attraction. They're being pushed as a low-cost device — they start at about $250 — for users who spend most of their time on a computer using the Internet, and therefore don't need the processing power to run full-blown software applications. They can connect to the web via WiFi or mobile networks.

Stripping away the convoluted operating system and setting things up so users can jump right into the web makes the experience simpler and faster, said Google's Caesar Sengupta, product management director. "The whole idea is basically to have a computing experience that's extremely simple, that's very stable, very secure, and sort of just gets out of your way," Sengupta said.

Chromebooks turn on quickly like mobile devices — although they still take a few seconds to boot up after being shut down. There's a store with a selection of simple programs and games to access, and web-based applications — such as the Google Docs suite to create documents and spreadsheets — run smoothly within the Chrome browser and store files online. Consumers who have been frustrated with the limitations of tablets, and miss a keyboard, may find Chromebooks are more suitable substitutes for a full-blown laptop. Asus, HP and Samsung all plan to release Chromebooks in Canada.

While some users will be uncomfortable with the concept of an Internet-only machine — although you can use some apps while offline — Google saw a growing demand for it. "Users today, particularly the younger generation, are very web savvy … the current generation of people or kids are very used to having stuff in the cloud, they prefer that model," said Sengupta. "For many people they won't move completely (to the Chromebook concept) but it's a fantastic second computer…. But from our point of view we absolutely feel this is where modern-day usage is heading, this is where users are heading, so that's what we're building towards."

Continue reading


Men's Basketball: Keita Bates-Diop reportedly forgoes redshirt senior season to enter 2018 NBA Draft

Ohio State redshirt junior forward Keita Bates-Diop (33) looks to drive in the first half of the game against Penn State in the Big Ten tournament quarterfinals on Mar. 2 in Madison Square Garden. Ohio State lost 69-68. Credit: Jack Westerheide | Photo Editor

The Ohio State men's basketball team will need to find a way to fill a large void of production in its starting five for the 2018-19 season. Redshirt junior forward Keita Bates-Diop will not return to Columbus for his final season of collegiate eligibility, electing to enter the NBA draft on Monday, according to ESPN's Adrian Wojnarowski.

After missing a majority of the 2016-17 season due to an injury, Bates-Diop returned to the starting five to begin the season and emerged as a star for the Buckeyes, averaging 19.8 points, 8.7 rebounds, 1.6 assists and 1.6 blocks per game. The breakout campaign netted him 2018 Big Ten Player of the Year honors, as well as first-team all-conference recognition from both the media and coaches.

In Ohio State's two games in the NCAA Tournament, Bates-Diop averaged 26 points and 7.5 rebounds per game, scoring 24 points in the first-round matchup against South Dakota State and 28 points in the second round against Gonzaga. He shot 17-for-40 overall, including 8-of-22 from 3.

Head coach Chris Holtmann led the Buckeyes to a season few expected in his first year at the helm of the team. Now, following Bates-Diop's exit from the team, Holtmann will have several key positions to fill. The team also loses starting shooting guard Kam Williams and fellow forward Jae'Sean Tate, who both exhausted their final years of eligibility.

Given the loss of both forwards, freshman and former four-star prospect Kyle Young is expected to be counted on to produce in at least one of the two starting positions. In a limited role during his first season, Young averaged just 1.8 points, 1.6 rebounds and 0.2 assists in 8.6 minutes per game. Other candidates to fill out the starting spots at forward will be sophomore Andre Wesson and incoming freshmen Jaedon LeDee and Justin Ahrens.

Bates-Diop is expected to hold a press conference at Ohio State later Monday.

Continue reading


RTÉ now lists all candidates in a constituency whenever one is mentioned

How does RTÉ achieve balance with its election coverage? The 'stopwatch' approach and listing all its candidates. By Gráinne Ní Aodha.

RTÉ IS ENSURING it adheres to the broadcasting authority's election rules by listing all candidates in a particular constituency whenever one is mentioned. This means that if RTÉ reports on comments made by Peter Casey, then all other candidates in the constituency he's running in – the Midlands North-West – must also be listed.

Since official election campaigning began on 24 April, RTÉ must adhere to airtime coverage guidelines, set out by the Broadcasting Authority of Ireland (BAI). Among those rules are: "In terms of airtime allocated, broadcasters must do so in a manner that is equitable and fair to all interests."

RTÉ said that listing candidates is just one way it achieves balanced coverage: "In our coverage on all platforms, we list all candidates if covering a given constituency." For example, last Sunday The Week in Politics had a live studio panel discussion with five candidates in one constituency, broadcast pre-recorded contributions from other candidates and listed the remaining candidates and their main issues within the programme for the audience.

Stopwatch

RTÉ also uses a "stopwatch" or 50:50 time limit, where programme makers measure the airtime received during broadcasts by representatives of opposing sides of a debate. This was used during the Eighth Amendment referendum campaign, where a debate featuring a pro-life and a pro-choice candidate would aim to allocate both candidates an equal amount of airtime.

But the BAI has said previously that there's been a misconception that this type of approach is necessary to adhere to its guidelines, and said that if it's applied rigorously "it doesn't do justice to the programming, it's too narrow". RTÉ has said that it does use a "stopwatch" approach, but it's "an editorial tool to guide our coverage, and is not the only factor we use". Balanced coverage is determined by a number of factors, outlined in Section 5 of the Rule 27 BAI guidelines.

The BAI said that its guidelines state that decisions in respect of editorial coverage of an election or referenda rest solely with broadcasters. When the BAI was asked if it had instructed RTÉ to list each candidate so that it adheres to its guidelines, a spokesperson said it "had not issued instructions to RTÉ regarding the listing of candidates in its election coverage".

Section 5 of the BAI Election and Referenda Guidelines requires broadcasters to develop mechanisms in respect of their approach to election and/or referenda coverage that are open, transparent and fair to all interested parties, but the BAI does not set detailed requirements for those mechanisms. However, broadcasters must be in a position to demonstrate how any mechanisms have ensured fairness, objectivity and impartiality in instances where complaints are received directly by the broadcaster or referred to the BAI. "Other than in the context of a complaint or other compliance investigation, the BAI does not comment on the specific mechanisms that a broadcaster chooses to use to ensure fairness, objectivity and impartiality in relation to election or referenda interests."

Continue reading


Mikakos makes history

Jenny Mikakos, Labor member for Victoria's Northern Metropolitan region, added a new chapter to the history books of Australian politics after being named in Daniel Andrews' cabinet. The MP will now become the Minister for Families, Children and Youth Affairs, after the new government was sworn in at Government House.

Ms Mikakos, who in opposition was Shadow Minister for Community Services, Shadow Minister for Children, and Shadow Minister for Seniors and Ageing, will be the first female minister of Greek heritage to have served in either state or federal parliaments. At Labor's first caucus meeting since the election, the Victorian Premier-elect announced his cabinet is to include nine women and 13 men.

Bill Papastergiadis, president of the Greek Orthodox Community of Melbourne, paid tribute to Minister-elect Mikakos, telling Neos Kosmos that she had been "at the forefront on dealing with community matters". "We wholeheartedly support her ascension to the ministry," said Mr Papastergiadis, who added that he hoped the new Labor Government would appoint at least two MPs of Greek background to ministerial duties "given the Greek community's relevance/vibrancy and proportional representation within the population in Victoria".

Greece's Ambassador to Australia, Mr Haris Dafaranos, said he was delighted to hear the news of Ms Mikakos' inclusion in the Andrews Cabinet. "I salute her as a personality and it is with a great sense of pride that I acknowledge her place in history as the first female Australian minister of Greek heritage. I have no doubt she will lead by example and do her absolute best for all Victorians and of course the Greek Australian community."

Read more in Neos Kosmos' English edition on Saturday.

Continue reading


Can Bitcoin Be Stopped?

In a classic case of bad news being dumped on a Friday night, last week the sorta fake sorta real money Bitcoin lost about 20 percent of its value following a rejection by the Securities and Exchange Commission. Famous Social Network villains the Winklevoss twins had proposed the idea of a Bitcoin Trust that would have tracked the currency and rewarded investors. But the Securities and Exchange Commission rejected the idea because of just how unstable and unregulated the internet dough is.

As a longtime Bitcoin skeptic, I was all ready to write a post about how Bitcoin was screwed. I may have even written some drafts over the weekend in preparation for the grave-dancing. But when I sat down at my desk and looked up the latest news, I saw that Bitcoin had pretty much recovered from its dramatic plunge in just two days. Economics are complicated, especially once you add in the internet's nonsense, but if the SEC couldn't stop Bitcoin, can anything?

Bitcoin isn't the only cryptocurrency, but the digital cash is by far the most prominent. And I get why internet money with a funny name sounds neat and futuristic. But a totally unregulated global virtual currency not backed up by anything real or stabilized by market forces/government agencies brings back the worst, oldest problems of pure currency itself. Bitcoin's unchecked instability allowed it to rapidly bounce back from this setback, but it also caused the seemingly fatal plunge in the first place. Violent value fluctuations are back! It's like the Great Depression in a microcosm. Not too long ago folks were killing themselves after losing all of their money due to Bitcoin losing half of its value overnight. Bitcoin platforms have even taken money from their own users to recoup unbelievable losses. That's some big bank-level evil.

Despite its obvious problems, Bitcoin has built up a stable coalition of extreme anarcho-capitalists who are probably thrilled this attempt at regulation failed, typical internet libertarians, racecar drivers/ransomware victims, people who really need to buy drugs on the Deep Web before the Feds seize their assets, and obnoxious ads. And folks like me who ironically enjoy the idea of a currency with lore. To this day, no one is exactly certain who dreamt up Bitcoin, and that's fantastic. It's the fiat currency of the Internet Gutter.

So will Bitcoin ever go away? It doesn't look like it. More merchants are accepting the online currency regardless of what the SEC thinks. Will it just become so unstable that it finally collapses from within? Or will it stick around until 2140 when the last Bitcoin is mined, and we're left with a finite, hopefully more controllable amount of the money? Personally, I can't wait to use Bitcoin on my iPhone made of pure proprietary light to buy my monthly ration of Emperor Trump Water.

Continue reading


Wright-Patterson AFB's Neighbors Intent on Forging Partnerships

One week after the Dayton Development Coalition and the state of Ohio accepted ADC's Community Excellence Award, local leaders continued to reach out to Wright-Patterson Air Force Base to identify partnership opportunities that will either help it trim spending or enhance its services.

"For the base, the conversation has been on how we can avoid costs or generate revenue," Col. John Devillier, the 88th Air Base Wing and installation commander, said at a meeting with community representatives. "That means a new way of thinking about how we interact," Devillier said, reported the Dayton Business Journal. "Sharing services and resources is a benefit to us all," said Deb McDonnell, city manager of Fairborn.

Beyond municipalities sharing services with Wright-Patterson to cut its costs, other community organizations are working more closely with the installation. A group of area hospitals and health care organizations formed a private enterprise to commercialize technology developed on the base. Meanwhile, major colleges and universities in Ohio are targeting Wright-Patterson for research, according to the story.

"Every community in the region is impacted by Wright-Patt," said Jeff Hoagland, president and CEO of the Dayton Development Coalition. "Beavercreek, Riverside, Fairborn, all of them are looking at ways to work together with the base," Hoagland said.

Continue reading


Meghan Markle baby shower: You won't believe who is coming to the event

Duchess Meghan Markle raised several eyebrows in February when she decided to go to the United States to have her baby shower. Fans from all around the world were excited for Meghan's baby shower except for one person — Queen Elizabeth II's former press spokesman, who has called Meghan's baby shower "a bit over the top." At the same time, Victoria Beckham is most likely to attend Meghan's second baby shower.

As per reports, Meghan Markle's baby shower happened in New York City at The Mark hotel. Somewhere around 15 people were invited to attend the grand event, including Jessica Mulroney, Abigail Spencer, Gayle King, and others. Everyone close to Meghan was pleased with the party except for Queen Elizabeth II's former spokesman Dickie Arbiter.

Dickie Arbiter recently told Us Weekly that, in a general context, a baby shower is an American thing and they do not do that in the United Kingdom, and that going to the United States for it was a bit expensive. "Baby showers, it's very much an American thing," he said. "We don't do it here in the U.K. It was a bit over the top in terms of expense and the way she got there."

Dickie Arbiter went on to address the rumors circulating about whether the Duchess of Sussex was having a second shower in the UK. As per Arbiter, since Meghan is an American, she does things the American way.

If you remember, there were several reports last week stating that Meghan's sister-in-law, Kate Middleton, will host a second baby shower for Meghan. It was revealed that the Royals were indeed planning some private baby-centric event in the United Kingdom which Kate Middleton would host. It was expected that her friends and relatives were being invited, though it was not sure whether it's a 'total' baby shower.

In addition to this, Meghan Markle's mother is most likely to attend the big day. As per a report by The Daily Mail, Doria is coming to the baby shower, which will be a small gathering of five or six people. Besides this, there are speculations that Victoria Beckham would also attend Meghan's baby shower.

Meghan Markle and Prince Harry's child is due in late April or early May.

Continue reading


Email ID of JU V-C hacked

Kolkata: The email ID of Jadavpur University vice-chancellor Suranjan Das has been hacked. The V-C, who is a resident of FE Block in Salt Lake, lodged a complaint with Bidhannagar Cyber Crime Police station on Wednesday. A senior police officer of Bidhannagar Cyber Crime police station has informed that, according to Das's complaint, someone has created a profile on Hotmail with the same name as him, while his actual account is on Yahoo. The hacker has been sending frivolous e-mails, seeking personal and professional help on Das's part, from some persons who are in the mail list of the V-C.

When contacted, Das said, "I have a number of respected and renowned persons in my e-mail contact list. I had received calls from some of them on Tuesday night, who enquired whether I am sick and then asked whether I have sought any sort of financial help from them. I informed them immediately that I have sent no such mail and told them not to respond to such requests of financial help. I was relieved that no one had transferred any money and had called me first," Das maintained.

He expressed his apprehension that the hacker might have access to some very confidential e-mails that he has to deal with, being the head of a university. "The police have been very cooperative. They have taken all necessary steps in this regard and I am hopeful that they will soon nab the offender," he said.

Continue reading


Use TensorFlow and NLP to detect duplicate Quora questions [Tutorial]

This tutorial shows how to build an NLP project with TensorFlow that explicates the semantic similarity between sentences using the Quora dataset. It is based on the work of Abhishek Thakur, who originally developed a solution on the Keras package. This article is an excerpt from a book written by Luca Massaron, Alberto Boschetti, Alexey Grigorev, Abhishek Thakur, and Rajalingappaa Shanmugamani titled TensorFlow Deep Learning Projects.

Presenting the dataset

The data, made available for non-commercial purposes (https://www.quora.com/about/tos) in a Kaggle competition (https://www.kaggle.com/c/quora-question-pairs) and on Quora's blog (https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs), consists of 404,351 question pairs with 255,045 negative samples (non-duplicates) and 149,306 positive samples (duplicates). There are approximately 40% positive samples, a slight imbalance that won't need particular corrections. Actually, as reported on the Quora blog, given their original sampling strategy, the number of duplicated examples in the dataset was much higher than the non-duplicated ones. In order to set up a more balanced dataset, the negative examples were upsampled by using pairs of related questions, that is, questions about the same topic that are actually not similar.

Before starting work on this project, you can simply download the data, which is about 55 MB, from its Amazon S3 repository at this link into our working directory. After loading it, we can start diving directly into the data by picking some example rows and examining them. Exploring further into the data, we can find some examples of question pairs that mean the same thing, that is, duplicates, as follows:

How does Quora quickly mark questions as needing improvement? / Why does Quora mark my questions as needing improvement/clarification before I have time to give it details? Literally within seconds…

Why did Trump win the Presidency? / How did Donald Trump win the 2016 Presidential Election?

What practical applications might evolve from the discovery of the Higgs Boson? / What are some practical benefits of the discovery of the Higgs Boson?

At first sight, duplicated questions have quite a few words in common, but they could be very different in length. On the other hand, examples of non-duplicate questions are as follows:

Who should I address my cover letter to if I'm applying to a big company like Mozilla? / Which car is better from a safety persepctive? swift or grand i10. My first priority is safety?

Mr. Robot (TV series): Is Mr. Robot a good representation of real-life hacking and hacking culture? Is the depiction of hacker societies realistic? / What mistakes are made when depicting hacking in Mr. Robot compared to real-life cyber security breaches or just a regular use of technologies?
How can I start an online shopping (e-commerce) website? / Which web technology is best suited for building a big e-commerce website?

Some questions from these examples are clearly not duplicated and have few words in common, but some others are more difficult to detect as unrelated. For instance, the second pair in the example might turn out to be ambiguous and leave even a human judge uncertain. The two questions might mean different things: why versus how, or they could be intended as the same from a superficial examination. Looking deeper, we may even find more doubtful examples and even some clear data mistakes; we surely have some anomalies in the dataset (as the Quora post on the dataset warned) but, given that the data is derived from a real-world problem, we can't do anything but deal with this kind of imperfection and strive to find a robust solution that works.

At this point, our exploration becomes more quantitative than qualitative and some statistics on the question pairs are provided here:

                                 question1    question2
Average number of characters     59.57        60.14
Minimum number of characters     1            1
Maximum number of characters     623          1169

Question 1 and question 2 have roughly the same average number of characters, though we have more extremes in question 2. There also must be some trash in the data, since we cannot figure out a question made up of a single character.

We can even get a completely different vision of our data by plotting it into a word cloud and highlighting the most common words present in the dataset (Figure 1: A word cloud made up of the most frequent words to be found in the Quora dataset). The presence of word sequences such as Hillary Clinton and Donald Trump reminds us that the data was gathered at a certain historical moment and that many questions we can find inside it are clearly ephemeral, reasonable only at the very time the dataset was collected. Other topics, such as programming language, World War, or earn money could be longer lasting, both in terms of interest and in the validity of the answers provided.

After exploring the data a bit, it is now time to decide what target metric we will strive to optimize in our project. Throughout the article, we will be using accuracy as a metric to evaluate the performance of our models. Accuracy as a measure is simply focused on the effectiveness of the prediction, and it may miss some important differences between alternative models, such as discrimination power (is the model more able to detect duplicates or not?) or the exactness of probability scores (how much margin is there between being a duplicate and not being one?). We chose accuracy based on the fact that this metric was the one decided on by Quora's engineering team to create a benchmark for this dataset (as stated in this blog post of theirs: https://engineering.quora.com/Semantic-Question-Matching-with-Deep-Learning). Using accuracy as the metric makes it easier for us to evaluate and compare our models with the one from Quora's engineering team, and also several other research papers. In addition, in a real-world application, our work may simply be evaluated on the basis of how many times it is just right or wrong, regardless of other considerations. We can now proceed further in our project with some very basic feature engineering to start with.
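These length statistics are easy to reproduce. As a minimal sketch (assuming the dataset has been loaded into a pandas dataframe with the columns question1 and question2, as done in the next section), the figures above can be recomputed as follows:

import pandas as pd

# Load the question pairs and print per-column character-length statistics.
data = pd.read_csv('quora_duplicate_questions.tsv', sep='\t')
for col in ['question1', 'question2']:
    lengths = data[col].fillna('').str.len()
    print(col, 'mean:', round(lengths.mean(), 2),
          'min:', lengths.min(), 'max:', lengths.max())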
Starting with basic feature engineering

Before starting to code, we have to load the dataset in Python and also provide Python with all the necessary packages for our project. We will need to have these packages installed on our system (the latest versions should suffice, no need for any specific package version): Numpy, pandas, fuzzywuzzy, python-Levenshtein, scikit-learn, gensim, pyemd, and NLTK. As we will be using each one of these packages in the project, we will provide specific instructions and tips to install them.

For all dataset operations, we will be using pandas (and Numpy will come in handy, too). To install numpy and pandas:

pip install numpy
pip install pandas

The dataset can be loaded into memory easily by using pandas and a specialized data structure, the pandas dataframe (we expect the dataset to be in the same directory as your script or Jupyter notebook):

import pandas as pd
import numpy as np

data = pd.read_csv('quora_duplicate_questions.tsv', sep='\t')
data = data.drop(['id', 'qid1', 'qid2'], axis=1)

We will be using the pandas dataframe denoted by data, also when we work with our TensorFlow model and provide input to it. We can now start by creating some very basic features. These basic features include length-based features and string-based features: the length of question1 and question2; the difference between the two lengths; the character length of question1 and question2 without spaces; the number of words in question1 and question2; and the number of common words in question1 and question2. These features are dealt with one-liners transforming the original input using the pandas package in Python and its method apply:

# length based features
data['len_q1'] = data.question1.apply(lambda x: len(str(x)))
data['len_q2'] = data.question2.apply(lambda x: len(str(x)))
# difference in lengths of two questions
data['diff_len'] = data.len_q1 - data.len_q2
# character length based features
data['len_char_q1'] = data.question1.apply(lambda x: len(''.join(set(str(x).replace(' ', '')))))
data['len_char_q2'] = data.question2.apply(lambda x: len(''.join(set(str(x).replace(' ', '')))))
# word length based features
data['len_word_q1'] = data.question1.apply(lambda x: len(str(x).split()))
data['len_word_q2'] = data.question2.apply(lambda x: len(str(x).split()))
# common words in the two questions
data['common_words'] = data.apply(lambda x: len(set(str(x['question1']).lower().split()).intersection(set(str(x['question2']).lower().split()))), axis=1)

For future reference, we will mark this set of features as feature set-1 or fs_1:

fs_1 = ['len_q1', 'len_q2', 'diff_len', 'len_char_q1', 'len_char_q2', 'len_word_q1', 'len_word_q2', 'common_words']

This simple approach will help you to easily recall and combine a different set of features in the machine learning models we are going to build, turning comparing different models run by different feature sets into a piece of cake.
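As a small illustration of that convenience (a sketch, not code from the book), assembling a design matrix from one or more feature sets is just list concatenation on the data dataframe:

# Sketch: feature-set lists can be concatenated to select columns directly.
X_basic = data[fs_1].values
# Once fs_2 is defined in the next section, combining sets is as simple as:
# X_combined = data[fs_1 + fs_2].values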
Creating fuzzy features

The next set of features are based on fuzzy string matching. Fuzzy string matching is also known as approximate string matching and is the process of finding strings that approximately match a given pattern. The closeness of a match is defined by the number of primitive operations necessary to convert the string into an exact match. These primitive operations include insertion (to insert a character at a given position), deletion (to delete a particular character), and substitution (to replace a character with a new one).

Fuzzy string matching is typically used for spell checking, plagiarism detection, DNA sequence matching, spam filtering, and so on, and it is part of the larger family of edit distances, distances based on the idea that a string can be transformed into another one. It is frequently used in natural language processing and other applications in order to ascertain the grade of difference between two strings of characters. It is also known as Levenshtein distance, from the name of the Russian scientist, Vladimir Levenshtein, who introduced it in 1965.

These features were created using the fuzzywuzzy package available for Python (https://pypi.python.org/pypi/fuzzywuzzy). This package uses Levenshtein distance to calculate the differences in two sequences, which in our case are the pair of questions. The fuzzywuzzy package can be installed using pip3:

pip install fuzzywuzzy

As an important dependency, fuzzywuzzy requires the Python-Levenshtein package (https://github.com/ztane/python-Levenshtein/), which is a blazingly fast implementation of this classic algorithm, powered by compiled C code. To make the calculations much faster using fuzzywuzzy, we also need to install the Python-Levenshtein package:

pip install python-Levenshtein

The fuzzywuzzy package offers many different types of ratio, but we will be using only the following: QRatio, WRatio, partial ratio, partial token set ratio, partial token sort ratio, token set ratio, and token sort ratio.

Examples of fuzzywuzzy features on Quora data:

from fuzzywuzzy import fuzz

fuzz.QRatio("Why did Trump win the Presidency?",
            "How did Donald Trump win the 2016 Presidential Election")

This code snippet will result in the value of 67 being returned:

fuzz.QRatio("How can I start an online shopping (e-commerce) website?",
            "Which web technology is best suitable for building a big E-Commerce website?")

In this comparison, the returned value will be 60. Given these examples, we notice that although the values of QRatio are close to each other, the value for the similar question pair from the dataset is higher than the pair with no similarity. Let's take a look at another feature from fuzzywuzzy for these same pairs of questions:

fuzz.partial_ratio("Why did Trump win the Presidency?",
                   "How did Donald Trump win the 2016 Presidential Election")

In this case, the returned value is 73:

fuzz.partial_ratio("How can I start an online shopping (e-commerce) website?",
                   "Which web technology is best suitable for building a big E-Commerce website?")

Now the returned value is 57. Using the partial_ratio method, we can observe how the difference in scores for these two pairs of questions increases notably, allowing an easier discrimination between being a duplicate pair or not. We assume that these features might add value to our models.
By using pandas and the fuzzywuzzy package in Python, we can again apply these features as simple one-liners:

data['fuzz_qratio'] = data.apply(lambda x: fuzz.QRatio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_WRatio'] = data.apply(lambda x: fuzz.WRatio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_ratio'] = data.apply(lambda x: fuzz.partial_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_token_set_ratio'] = data.apply(lambda x: fuzz.partial_token_set_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_token_sort_ratio'] = data.apply(lambda x: fuzz.partial_token_sort_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_token_set_ratio'] = data.apply(lambda x: fuzz.token_set_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_token_sort_ratio'] = data.apply(lambda x: fuzz.token_sort_ratio(str(x['question1']), str(x['question2'])), axis=1)

This set of features is henceforth denoted as feature set-2 or fs_2:

fs_2 = ['fuzz_qratio', 'fuzz_WRatio', 'fuzz_partial_ratio', 'fuzz_partial_token_set_ratio', 'fuzz_partial_token_sort_ratio', 'fuzz_token_set_ratio', 'fuzz_token_sort_ratio']

Again, we will store our work and save it for later use when modeling.

Resorting to TF-IDF and SVD features

The next few sets of features are based on TF-IDF and SVD. Term Frequency-Inverse Document Frequency (TF-IDF) is one of the algorithms at the foundation of information retrieval. Here, the algorithm is explained using a formula:

TFIDF(t) = TF(t) x IDF(t) = (C(t) / N) x log(ND / NDt)

You can understand the formula using this notation: C(t) is the number of times a term t appears in a document and N is the total number of terms in the document; this results in the Term Frequency (TF). ND is the total number of documents and NDt is the number of documents containing the term t; this provides the Inverse Document Frequency (IDF). TF-IDF for a term t is the multiplication of the Term Frequency and the Inverse Document Frequency for the given term t.

Without any prior knowledge, other than about the documents themselves, such a score will highlight all the terms that could easily discriminate a document from the others, down-weighting the common words that won't tell you much, such as the common parts of speech (such as articles, for instance). If you need a more hands-on explanation of TFIDF, this great online tutorial will help you try coding the algorithm yourself and testing it on some text data: https://stevenloria.com/tf-idf/

For convenience and speed of execution, we resorted to the scikit-learn implementation of TFIDF. If you don't already have scikit-learn installed, you can install it using pip:

pip install -U scikit-learn

We create TFIDF features for both question1 and question2 separately (in order to type less, we just deep copy the question1 TfidfVectorizer):

from sklearn.feature_extraction.text import TfidfVectorizer
from copy import deepcopy

tfv_q1 = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1, 2), use_idf=1, smooth_idf=1, sublinear_tf=1, stop_words='english')
tfv_q2 = deepcopy(tfv_q1)

It must be noted that the parameters shown here have been selected after quite a lot of experiments. These parameters generally work pretty well with all other problems concerning natural language processing, specifically text classification. One might need to change the stop word list to the language in question.
We can now obtain the TFIDF matrices for question1 and question2 separately:

q1_tfidf = tfv_q1.fit_transform(data.question1.fillna(""))
q2_tfidf = tfv_q2.fit_transform(data.question2.fillna(""))

In our TFIDF processing, we computed the TFIDF matrices based on all the data available (we used the fit_transform method). This is quite a common approach in Kaggle competitions because it helps to score higher on the leaderboard. However, if you are working in a real setting, you may want to exclude a part of the data as a training or validation set in order to be sure that your TFIDF processing helps your model to generalize to a new, unseen dataset.

After we have the TFIDF features, we move to SVD features. SVD is a feature decomposition method and it stands for singular value decomposition. It is largely used in NLP because of a technique called Latent Semantic Analysis (LSA). A detailed discussion of SVD and LSA is beyond the scope of this article, but you can get an idea of their workings by trying these two approachable and clear online tutorials: https://alyssaq.github.io/2015/singular-value-decomposition-visualisation/ and https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

To create the SVD features, we again use the scikit-learn implementation. This implementation is a variation of traditional SVD and is known as TruncatedSVD. A TruncatedSVD is an approximate SVD method that can provide you with reliable yet computationally fast SVD matrix decomposition. You can find more hints about how this technique works and how it can be applied by consulting this web page: http://langvillea.people.cofc.edu/DISSECTION-LAB/Emmie'sLSI-SVDModule/p5module.html

from sklearn.decomposition import TruncatedSVD

svd_q1 = TruncatedSVD(n_components=180)
svd_q2 = TruncatedSVD(n_components=180)

We chose 180 components for SVD decomposition and these features are calculated on a TF-IDF matrix:

question1_vectors = svd_q1.fit_transform(q1_tfidf)
question2_vectors = svd_q2.fit_transform(q2_tfidf)

Feature set-3 is derived from a combination of these TF-IDF and SVD features. For example, we can have only the TF-IDF features for the two questions separately going into the model, or we can have the TF-IDF of the two questions combined with an SVD on top of them, and then the model kicks in, and so on. These features are explained as follows.

Feature set-3(1) or fs3_1 is created using two different TF-IDFs for the two questions, which are then stacked together horizontally and passed to a machine learning model. This can be coded as:

from scipy import sparse

# obtain features by stacking the sparse matrices together
fs3_1 = sparse.hstack((q1_tfidf, q2_tfidf))

Feature set-3(2), or fs3_2, is created by combining the two questions and using a single TF-IDF:

tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1, 2), use_idf=1, smooth_idf=1, sublinear_tf=1, stop_words='english')

# combine questions and calculate tf-idf
q1q2 = data.question1.fillna("")
q1q2 += " " + data.question2.fillna("")
fs3_2 = tfv.fit_transform(q1q2)

The next subset of features in this feature set, feature set-3(3) or fs3_3, consists of separate TF-IDFs and SVDs for both questions. This can be coded as follows:

# obtain features by stacking the matrices together
fs3_3 = np.hstack((question1_vectors, question2_vectors))

We can similarly create a couple more combinations using TF-IDF and SVD, and call them fs3-4 and fs3-5, respectively.
These are depicted in the following diagrams (Feature set-3(4) or fs3-4, and Feature set-3(5) or fs3-5), but the code is left as an exercise for the reader. After the basic feature set and some TF-IDF and SVD features, we can now move to more complicated features before diving into the machine learning and deep learning models.

Mapping with Word2vec embeddings

Very broadly, Word2vec models are two-layer neural networks that take a text corpus as input and output a vector for every word in that corpus. After fitting, the words with similar meaning have their vectors close to each other, that is, the distance between them is small compared to the distance between the vectors for words that have very different meanings. Nowadays, Word2vec has become a standard in natural language processing problems and often it provides very useful insights into information retrieval tasks. For this particular problem, we will be using the Google news vectors. This is a pretrained Word2vec model trained on the Google News corpus. Every word, when represented by its Word2vec vector, gets a position in space.

All the words in this example, such as Germany, Berlin, France, and Paris, can be represented by a 300-dimensional vector, if we are using the pretrained vectors from the Google news corpus. When we use Word2vec representations for these words and we subtract the vector of Germany from the vector of Berlin and add the vector of France to it, we will get a vector that is very similar to the vector of Paris. The Word2vec model thus carries the meaning of words in the vectors. The information carried by these vectors constitutes a very useful feature for our task.

For a user-friendly, yet more in-depth, explanation and description of possible applications of Word2vec, we suggest reading https://www.distilled.net/resources/a-beginners-guide-to-Word2vec-aka-whats-the-opposite-of-canada/, or if you need a more mathematically defined explanation, we recommend reading this paper: http://www.1-4-5.net/~dmm/ml/how_does_Word2vec_work.pdf

To load the Word2vec features, we will be using Gensim. If you don't have Gensim, you can install it easily using pip. At this time, it is suggested you also install the pyemd package, which will be used by the WMD distance function, a function that will help us to relate two Word2vec vectors:

pip install gensim
pip install pyemd

To load the Word2vec model, we download the GoogleNews-vectors-negative300.bin.gz binary and use Gensim's load_word2vec_format function to load it into memory. You can easily download the binary from an Amazon AWS repository using the wget command from a shell:

wget -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"

After downloading and decompressing the file, you can use it with the Gensim KeyedVectors functions:

import gensim

model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)

Now, we can easily get the vector of a word by calling model[word].
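Before moving on to sentences, a quick sanity check can reproduce the Berlin/Germany/Paris/France analogy described above. This is only an illustrative sketch on top of the model object just loaded; the exact neighbours and scores depend on the pretrained vectors:

# Sketch: verify the analogy Berlin - Germany + France ~ Paris
# using the pretrained vectors loaded into `model` above.
print(model.most_similar(positive=['Berlin', 'France'], negative=['Germany'], topn=3))
# 'Paris' is expected to appear among the top-ranked words.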
However, a problem arises when we are dealing with sentences instead of individual words. In our case, we need vectors for all of question1 and question2 in order to come up with some kind of comparison. For this, we can use the following code snippet. The snippet basically adds the vectors for all words in a sentence that are available in the Google news vectors and gives a normalized vector at the end. We can call this sentence to vector, or Sent2Vec.

Make sure that you have the Natural Language Tool Kit (NLTK) installed before running the following function:

$ pip install nltk

It is also suggested that you download the punkt and stopwords packages, as they are part of NLTK:

import nltk
nltk.download('punkt')
nltk.download('stopwords')

If NLTK is now available, you just have to run the following snippet and define the sent2vec function:

from nltk.corpus import stopwords
from nltk import word_tokenize

stop_words = set(stopwords.words('english'))

def sent2vec(s, model):
    M = []
    words = word_tokenize(str(s).lower())
    for word in words:
        # It shouldn't be a stopword
        if word not in stop_words:
            # nor contain numbers
            if word.isalpha():
                # and be part of word2vec
                if word in model:
                    M.append(model[word])
    M = np.array(M)
    if len(M) > 0:
        v = M.sum(axis=0)
        return v / np.sqrt((v ** 2).sum())
    else:
        return np.zeros(300)

When the phrase is null, we arbitrarily decide to give back a standard vector of zero values.

To calculate the similarity between the questions, another feature that we created was word mover's distance. Word mover's distance uses Word2vec embeddings and works on a principle similar to that of earth mover's distance to give a distance between two text documents. Simply put, word mover's distance provides the minimum distance needed to move all the words from one document to another document. The WMD has been introduced by this paper: Kusner, Matt, et al. From word embeddings to document distances. In: International Conference on Machine Learning, 2015, p. 957-966, which can be found at http://proceedings.mlr.press/v37/kusnerb15.pdf. For a hands-on tutorial on the distance, you can also refer to this tutorial based on the Gensim implementation of the distance: https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html

Final Word2vec (w2v) features also include other distances, more usual ones such as the Euclidean or cosine distance. We complete the sequence of features with some measurements of the distribution of the two document vectors:

Word mover distance
Normalized word mover distance
Cosine distance between vectors of question1 and question2
Manhattan distance between vectors of question1 and question2
Jaccard similarity between vectors of question1 and question2
Canberra distance between vectors of question1 and question2
Euclidean distance between vectors of question1 and question2
Minkowski distance between vectors of question1 and question2
Braycurtis distance between vectors of question1 and question2
The skew of the vector for question1
The skew of the vector for question2
The kurtosis of the vector for question1
The kurtosis of the vector for question2

All the Word2vec features are denoted by fs4.
A separate set of w2v features consists of the matrices of Word2vec vectors themselves (the Word2vec vector for question1 and the Word2vec vector for question2). These will be represented by fs5:

w2v_q1 = np.array([sent2vec(q, model) for q in data.question1])
w2v_q2 = np.array([sent2vec(q, model) for q in data.question2])

In order to easily implement all the different distance measures between the vectors of the Word2vec embeddings of the Quora questions, we use the implementations found in the scipy.spatial.distance module:

from scipy.spatial.distance import cosine, cityblock, jaccard, canberra, euclidean, minkowski, braycurtis

data['cosine_distance'] = [cosine(x, y) for (x, y) in zip(w2v_q1, w2v_q2)]
data['cityblock_distance'] = [cityblock(x, y) for (x, y) in zip(w2v_q1, w2v_q2)]
data['jaccard_distance'] = [jaccard(x, y) for (x, y) in zip(w2v_q1, w2v_q2)]
data['canberra_distance'] = [canberra(x, y) for (x, y) in zip(w2v_q1, w2v_q2)]
data['euclidean_distance'] = [euclidean(x, y) for (x, y) in zip(w2v_q1, w2v_q2)]
data['minkowski_distance'] = [minkowski(x, y, 3) for (x, y) in zip(w2v_q1, w2v_q2)]
data['braycurtis_distance'] = [braycurtis(x, y) for (x, y) in zip(w2v_q1, w2v_q2)]

All the feature names related to distances are gathered under the list fs4_1:

fs4_1 = ['cosine_distance', 'cityblock_distance', 'jaccard_distance', 'canberra_distance', 'euclidean_distance', 'minkowski_distance', 'braycurtis_distance']

The Word2vec matrices for the two questions are instead horizontally stacked and stored away in the w2v variable for later usage:

w2v = np.hstack((w2v_q1, w2v_q2))

The Word Mover's Distance is implemented using a function that returns the distance between two questions, after having transformed them into lowercase and after removing any stopwords. Moreover, we also calculate a normalized version of the distance, after transforming all the Word2vec vectors into L2-normalized vectors (each vector is transformed to the unit norm, that is, if we squared each element in the vector and summed all of them, the result would be equal to one) using the init_sims method:

def wmd(s1, s2, model):
    s1 = str(s1).lower().split()
    s2 = str(s2).lower().split()
    stop_words = stopwords.words('english')
    s1 = [w for w in s1 if w not in stop_words]
    s2 = [w for w in s2 if w not in stop_words]
    return model.wmdistance(s1, s2)

data['wmd'] = data.apply(lambda x: wmd(x['question1'], x['question2'], model), axis=1)
model.init_sims(replace=True)
data['norm_wmd'] = data.apply(lambda x: wmd(x['question1'], x['question2'], model), axis=1)
fs4_2 = ['wmd', 'norm_wmd']

After these last computations, we now have most of the important features that are needed to create some basic machine learning models, which will serve as a benchmark for our deep learning models. Let's train some machine learning models on these and other Word2vec based features.

Testing machine learning models

Before proceeding, depending on your system, you may need to clean up the memory a bit and free space for machine learning models from previously used data structures.
This is done using gc.collect, after deleting any past variables not required anymore, and then checking the available memory by exact reporting from the psutil.virtual_memory function:

import gc
import psutil

del([tfv_q1, tfv_q2, tfv, q1q2, question1_vectors, question2_vectors, svd_q1, svd_q2, q1_tfidf, q2_tfidf])
del([w2v_q1, w2v_q2])
del([model])
gc.collect()
psutil.virtual_memory()

At this point, we simply recap the different features created up to now, and their meaning in terms of generated features:

fs_1: List of basic features
fs_2: List of fuzzy features
fs3_1: Sparse data matrix of TFIDF for separated questions
fs3_2: Sparse data matrix of TFIDF for combined questions
fs3_3: Sparse data matrix of SVD
fs3_4: List of SVD statistics
fs4_1: List of w2vec distances
fs4_2: List of wmd distances
w2v: A matrix of transformed phrase's Word2vec vectors by means of the Sent2Vec function

We evaluate two basic and very popular models in machine learning, namely logistic regression and gradient boosting using the xgboost package in Python. The following table provides the performance of the logistic regression and xgboost algorithms on different sets of features created earlier, as obtained during the Kaggle competition:

Feature set | Logistic regression accuracy | xgboost accuracy
Basic features (fs1) | 0.658 | 0.721
Basic features + fuzzy features (fs1 + fs2) | 0.660 | 0.738
Basic features + fuzzy features + w2v features (fs1 + fs2 + fs4) | 0.676 | 0.766
W2v vector features (fs5) | * | 0.78
Basic features + fuzzy features + w2v features + w2v vector features (fs1 + fs2 + fs4 + fs5) | * | 0.814
TFIDF-SVD features (fs3-1) | 0.777 | 0.749
TFIDF-SVD features (fs3-2) | 0.804 | 0.748
TFIDF-SVD features (fs3-3) | 0.706 | 0.763
TFIDF-SVD features (fs3-4) | 0.700 | 0.753
TFIDF-SVD features (fs3-5) | 0.714 | 0.759
* = These models were not trained due to high memory requirements.

We can treat the performances achieved as benchmarks or baseline numbers before starting with deep learning models, but we won't limit ourselves to that and we will be trying to replicate some of them. We are going to start by importing all the necessary packages. As for the logistic regression, we will be using the scikit-learn implementation.

The xgboost is a scalable, portable, and distributed gradient boosting library (a tree ensemble machine learning algorithm). Initially created by Tianqi Chen from Washington University, it has been enriched with a Python wrapper by Bing Xu, and an R interface by Tong He (you can read the story behind xgboost directly from its principal creator at homes.cs.washington.edu/~tqchen/2016/03/10/story-and-lessons-behind-the-evolution-of-xgboost.html). The xgboost is available for Python, R, Java, Scala, Julia, and C++, and it can work both on a single machine (leveraging multithreading) and in Hadoop and Spark clusters. Detailed instructions for installing xgboost on your system can be found on this page: github.com/dmlc/xgboost/blob/master/doc/build.md

The installation of xgboost on both Linux and macOS is quite straightforward, whereas it is a little bit trickier for Windows users. For this reason, we provide specific installation steps for having xgboost working on Windows. First, download and install Git for Windows (git-for-windows.github.io). Then, you need a MinGW compiler present on your system.
You can download it from www.mingw.org according to the characteristics of your system. From the command line, execute:

$> git clone --recursive https://github.com/dmlc/xgboost
$> cd xgboost
$> git submodule init
$> git submodule update

Then, always from the command line, you copy the configuration for 64-bit systems to be the default one:

$> copy make\mingw64.mk config.mk

Alternatively, you just copy the plain 32-bit version:

$> copy make\mingw.mk config.mk

After copying the configuration file, you can run the compiler, setting it to use four threads in order to speed up the compiling process:

$> mingw32-make -j4

In MinGW, the make command comes with the name mingw32-make; if you are using a different compiler, the previous command may not work, but you can simply try:

$> make -j4

Finally, if the compiler completed its work without errors, you can install the package in Python with:

$> cd python-package
$> python setup.py install

If xgboost has been properly installed on your system, you can proceed with importing both machine learning algorithms:

from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
import xgboost as xgb

Since we will be using a logistic regression solver that is sensitive to the scale of the features (it is the sag solver from https://github.com/EpistasisLab/tpot/issues/292, which requires a linear computational time in respect to the size of the data), we will start by standardizing the data using the scaler function in scikit-learn:

scaler = StandardScaler()
y = data.is_duplicate.values
y = y.astype('float32').reshape(-1, 1)
X = data[fs_1 + fs_2 + fs3_4 + fs4_1 + fs4_2]
X = X.replace([np.inf, -np.inf], np.nan).fillna(0).values
X = scaler.fit_transform(X)
X = np.hstack((X, fs3_3))

We also select the data for the training by first filtering the fs_1, fs_2, fs3_4, fs4_1, and fs4_2 set of variables, and then stacking the fs3_3 sparse SVD data matrix. We also provide a random split, separating 1/10 of the data for validation purposes (in order to effectively assess the quality of the created model):

np.random.seed(42)
n_all, _ = y.shape
idx = np.arange(n_all)
np.random.shuffle(idx)
n_split = n_all // 10
idx_val = idx[:n_split]
idx_train = idx[n_split:]
x_train = X[idx_train]
y_train = np.ravel(y[idx_train])
x_val = X[idx_val]
y_val = np.ravel(y[idx_val])

As a first model, we try logistic regression, setting the regularization l2 parameter C to 0.1 (modest regularization). Once the model is ready, we test its efficacy on the validation set (x_val for the validation matrix, y_val for the correct answers). The results are assessed on accuracy, that is, the proportion of exact guesses on the validation set:

logres = linear_model.LogisticRegression(C=0.1, solver='sag', max_iter=1000)
logres.fit(x_train, y_train)
lr_preds = logres.predict(x_val)
log_res_accuracy = np.sum(lr_preds == y_val) / len(y_val)
print("Logistic regr accuracy: %0.3f" % log_res_accuracy)

After a while (the solver has a maximum of 1,000 iterations before giving up on converging the results), the resulting accuracy on the validation set will be 0.743, which will be our starting baseline.

Now, we try to predict using the xgboost algorithm. Being a gradient boosting algorithm, this learning algorithm has more variance (ability to fit complex predictive functions, but also to overfit) than a simple logistic regression afflicted by greater bias (in the end, it is a summation of coefficients) and so we expect much better results.
We fix the max depth of its decision trees to 4 (a shallow setting, which should prevent overfitting) and we use an eta of 0.02 (it will need to grow many trees because the learning is a bit slow). We also set up a watchlist, keeping an eye on the validation set for an early stop if the expected error on the validation set doesn't decrease for 50 consecutive rounds.

It is not best practice to stop early on the same set (the validation set in our case) that we use for reporting the final results. In a real-world setting, ideally, we should set up a validation set for tuning operations, such as early stopping, and a separate test set for reporting the expected results when generalizing to new data.

After setting all this, we run the algorithm. This time, we will have to wait longer than when we ran the logistic regression:

    params = dict()
    params['objective'] = 'binary:logistic'
    params['eval_metric'] = ['logloss', 'error']
    params['eta'] = 0.02
    params['max_depth'] = 4
    d_train = xgb.DMatrix(x_train, label=y_train)
    d_valid = xgb.DMatrix(x_val, label=y_val)
    watchlist = [(d_train, 'train'), (d_valid, 'valid')]
    bst = xgb.train(params, d_train, 5000, watchlist,
                    early_stopping_rounds=50, verbose_eval=100)
    xgb_preds = (bst.predict(d_valid) >= 0.5).astype(int)
    xgb_accuracy = np.sum(xgb_preds == y_val) / len(y_val)
    print("Xgb accuracy: %0.3f" % xgb_accuracy)

The final result reported by xgboost is an accuracy of 0.803 on the validation set.

Building the TensorFlow model

The deep learning models in this article are built using TensorFlow, based on the original script written by Abhishek Thakur using Keras (you can read the original code at https://github.com/abhishekkrthakur/is_that_a_duplicate_quora_question). Keras is a Python library that provides an easy interface to TensorFlow. TensorFlow has official support for Keras, and models trained using Keras can easily be converted to TensorFlow models. Keras enables very fast prototyping and testing of deep learning models. In our project, however, we rewrote the solution entirely in TensorFlow from scratch.

To start, let's import the necessary libraries, in particular TensorFlow, and check its version by printing it:

    import zipfile
    from tqdm import tqdm_notebook as tqdm
    import tensorflow as tf
    print("TensorFlow version %s" % tf.__version__)

At this point, we simply load the data into the df pandas dataframe, or we load it from disk. We replace the missing values with an empty string and we set the y variable containing the target answer, encoded as 1 (duplicated) or 0 (not duplicated):

    try:
        df = data[['question1', 'question2', 'is_duplicate']]
    except:
        df = pd.read_csv('data/quora_duplicate_questions.tsv', sep='\t')
        df = df.drop(['id', 'qid1', 'qid2'], axis=1)

read more
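As a minimal, hedged sketch of the two operations just described (replacing the missing values with an empty string and encoding the target), assuming the df dataframe built above and the column names from the earlier code:

    # Hypothetical completion of the step described in the text, assuming df as above
    df = df.fillna('')               # replace missing questions with an empty string
    y = df.is_duplicate.values       # target: 1 = duplicated pair, 0 = not duplicated
    y = y.astype('float32')

The exact dtype and shape expected by the later TensorFlow code may differ; the point is simply that the target is the is_duplicate column and empty strings stand in for missing questions.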

Continue reading


Industry Executives Form Council to Assist Veterans

A group of high-profile executives has formed an advisory council to assist national non-profit Operation Homefront with its mission to help military families become financially strong and stable and thrive in the communities they protect as they transition from military to civilian life.

The newly created Veterans Financial Services Advisory Council (VFSAC) includes prominent leaders from the housing, banking, finance, and mortgage industries. Bob Caruso, Division President, Servicing and Strategy for ServiceLink, will be the Chairman of the VFSAC; Ed Delgado, President and CEO of the Five Star Institute, will be the council's Vice-Chairman; and the VFSAC's managing director will be Brig Gen (ret) John I. Pray, Jr., President and CEO of Operation Homefront.

Numerous reports within the industry, including a Low Income Housing Coalition report estimating that more than 1.5 million veterans are spending more than half of their income on housing, indicate that there is an urgent need to address housing initiatives for veterans. That same report indicated that severe housing cost burdens were more likely to be borne by post-9/11 veterans than by veterans from prior eras.

The VFSAC's goal is to address and find solutions to the financial challenges that military families face.

"After serving in our nation's time of need and protecting the freedoms we, as Americans, enjoy daily, we see many military families struggling to make ends meet. We want to help those that have done so much for all of us to have the same opportunities they have made possible," said Pray. "Having a safe and secure place to call 'home' is critical to building a better future and our Homes on the Homefront (HOTH) program is designed to help military families establish the foundation upon which to build that better future."

Operation Homefront has partnered with several major financial institutions to place more than 550 families in homes since 2012, which saved approximately $22 million in mortgage costs. Over the last two years, Operation Homefront has teamed with Auction.com and the Five Star Institute to donate 11 mortgage-free homes to wounded veterans at the 2015 and 2016 Five Star Conferences.

"These homes are an investment in the future of our military families—especially for those who have sacrificed so much for all of us in securing our own way of life," said Caruso. "Through the Veterans Financial Services Advisory Council, we will build a coalition that is able to accomplish three key tasks: sourcing homes to donate to military families; engaging companies willing to renovate donated homes; and recruiting donors to fund Operation Homefront's Homes on the Homefront program expenses."

The leaders on the VFSAC will combine their expertise and resources to support a broad national effort supporting military families who are experiencing difficulties, either financially or with some aspect of the transition from military to civilian life. That effort includes providing these families with: relief through critical assistance and housing programs; resiliency through permanent housing and caregiver programs; and recurring family support through a variety of programs to make sure short-term needs are met and that they don't become long-term struggles.

"This is a seminal moment for Operation Homefront. Our Advisory Council will capitalize on the expertise of many national leaders in the financial services arena to help chart a strong future for military families for years to come," said Delgado.
"It is absolutely critical to not just raise awareness, but actually deliver tangible results on meeting the very real housing needs of our military families." (Editor's note: The Five Star Institute is the parent company of MReport and TheMReport.com) read more

Continue reading


David Cronenberg to get Golden Lion honour at Venice Film Festival

David Cronenberg to get Golden Lion honour at Venice Film Festival

Film Director David Cronenberg stops for a photograph as he arrives at a Canadian Film event in Toronto on May 7, 2014. THE CANADIAN PRESS/Chris Young

VENICE, Italy – Canadian director David Cronenberg will be honoured at the upcoming Venice Film Festival.

Organizers say "The Fly" filmmaker will receive the Golden Lion for Lifetime Achievement for directors at the fest, which runs Aug. 29 to Sept. 8.

In a statement, festival director Alberto Barbera called Cronenberg "one of the most daring and stimulating filmmakers ever, a tireless innovator of forms and languages."

Cronenberg called the honour "thrilling."

Cronenberg is known for his horror films and thrillers that explore the relationship between the body, sex, death, science and technology.

His other films include "Scanners," "Crash," "A History of Violence," "Eastern Promises," "A Dangerous Method" and "Maps to the Stars."

by The Canadian Press Posted Apr 23, 2018 9:09 am PDT Last Updated Apr 23, 2018 at 9:41 am PDT read more

Continue reading


Indictment: Ex-'American Idol' contestant was drug courier

by The Associated Press Posted Feb 13, 2019 9:12 am PDT

FILE – In this Oct. 16, 2010, file photo, Antonella Barba arrives at the 10th anniversary of TAO restaurant in New York. (AP Photo/Charles Sykes, File)

Indictment: Ex-'American Idol' contestant was drug courier

NORFOLK, Va. — A recently unsealed court document says a former contestant on both "American Idol" and "Fear Factor" worked as a courier for a drug ring and was trying to deliver nearly 2 pounds (830 grams) of fentanyl when she was arrested last year.

The Virginian-Pilot reports 32-year-old Antonella Barba was back in custody Monday, following a federal indictment charging her with conspiracy to distribute cocaine, heroin and fentanyl. Barba was originally arrested last October in Norfolk, Virginia.

She was previously charged with shoplifting in New York and has a felony marijuana case pending in Kansas.

Barba, of New Jersey, reached the top 16 on "American Idol" in 2007, the year Jordin Sparks won. She competed on "Fear Factor" in 2012.

Her public defender didn't immediately respond to the newspaper's request for comment.

___

Information from: The Virginian-Pilot, http://pilotonline.com

The Associated Press read more

Continue reading







