Whose algorithm should it be, anyway?

All of our extended thinking systems (algorithms fuel the software and connectivity that create extended thinking systems) demand more thinking — not less — and a more global perspective than we have previously managed. The expanding collection and analysis of data, and the resulting application of this information, can cure diseases, decrease poverty, bring timely solutions to people and places where need is greatest, and dispel millennia of prejudice, ill-founded conclusions, inhumane practice and ignorance of all kinds.

Our algorithms are now redefining what we think, how we think and what we know. We need to ask them to think about their thinking — to look out for pitfalls and inherent biases before those are baked in and harder to remove.

That, by itself, is a tall order that requires impartial experts backtracking through the technology development process to find the models and formulae that originated the algorithms. Then, keeping all that learning at hand, the experts need to soberly assess the benefits and deficits or risks the algorithms create. Who is prepared to do this? Who has the time, the budget and resources to investigate and recommend useful courses of action? This is a 21st-century job description — and market niche — in search of real people and companies.

In order to make algorithms more transparent, products and product information circulars might include an outline of algorithmic assumptions, akin to the nutritional sidebar now found on many packaged food products, that would inform users of how algorithms drive intelligence in a given product and a reasonable outline of the implications inherent in those assumptions.

A number of respondents noted the many ways in which algorithms will help make sense of massive amounts of data, predicting that this will spark breakthroughs in science, new conveniences and human capacities in everyday life, and an ever-better capacity to link people to the information that will help them. Algorithms perform seemingly miraculous tasks humans cannot, and they will continue to greatly augment human intelligence and assist in accomplishing great things.

A representative proponent of this view is Stephen Downes, a researcher at the National Research Council of Canada, who listed the following as positive changes: Today, banks provide loans based on very incomplete data. It is true that many people who qualify for loans today would not get them in the future.

However, many people — and arguably many more people — will be able to obtain loans in the future, as banks turn away from using such factors as race, socio-economic background, postal code and the like to assess fit. Health care is a significant and growing expense, not because people are becoming less healthy (in fact, society-wide, the opposite is true) but because of the significant overhead required to support increasingly complex systems, including prescriptions, insurance, facilities and more.

New technologies will enable health providers to shift a significant percentage of that load to the individual, who will, with the aid of personal support systems, manage their health better, coordinate and manage their own care, and create less of a burden on the system.

As the overall cost of health care declines, it becomes increasingly feasible to provide single-payer health insurance for the entire population, which has known beneficial health outcomes and efficiencies. A significant proportion of government is based on regulation and monitoring, which will no longer be required with the deployment of automated production and transportation systems, along with sensor networks. This includes many of the daily, and often unpleasant, interactions we have with government today, from traffic offenses and manifestations of civil discontent to unfair treatment in commercial and legal processes, and the like.

A simple example: One of the most persistent political problems in the United States is the gerrymandering of political boundaries to benefit incumbents. Electoral divisions created by an algorithm largely eliminate gerrymandering and, when open and debatable, can be modified to improve on that result.

Participants in this study were in substantial agreement that the abundant positives of accelerating code-dependency will continue to drive the spread of algorithms; however, as with all great technological revolutions, this trend has a dark side.

Most respondents pointed out concerns, chief among them the final five overarching themes of this report; all have subthemes. Advances in algorithms are allowing technology corporations and governments to gather, store, sort and analyze massive data sets. Experts in this canvassing noted that these algorithms are primarily written to optimize efficiency and profitability without much thought about the possible societal impacts of the data modeling and analysis.

The goal of algorithms is to fit some of our preferences, but not necessarily all of them: They essentially present a caricature of our tastes and preferences. My biggest fear is that, unless we tune our algorithms for self-actualization, it will be simply too convenient for people to follow the advice of an algorithm (or too difficult to go beyond such advice), turning these algorithms into self-fulfilling prophecies and users into zombies who exclusively consume easy-to-consume items.

Every time you design a human system optimized for efficiency or profitability, you dehumanize the workforce. That dehumanization has now spread to our health care and social services. When you remove the humanity from a system where people are included, they become victims. Who is collecting what data points? Do the human beings the data points reflect even know, or did they just agree to the terms of service because they had no real choice?

Who is making money from the data? There is no transparency, and oversight is a farce. (A sampling of excerpts tied to this theme from other respondents appears in the full report.) Two strands of thinking tie together here. One is that the algorithm creators (code writers), even if they strive for inclusiveness, objectivity and neutrality, build into their creations their own perspectives and values.

The other is that the datasets to which algorithms are applied have their own limits and deficiencies. Moreover, the datasets themselves are imperfect because they do not contain inputs from everyone, or a representative sample of everyone. The two themes are advanced in these answers: Most people in positions of privilege will find these new tools convenient, safe and useful.

The harms of new technology will be most experienced by those already disadvantaged in society, where advertising algorithms offer bail bondsman ads that assume readers are criminals, loan applications that penalize people for proxies so correlated with race that they effectively penalize people based on race, and similar issues.

Much of it is either racial- or class-related, with a fair sprinkling of simply punishing people for not using a standard dialect of English. To paraphrase Immanuel Kant, out of the crooked timber of these datasets no straight thing was ever made.

(A sampling of quote excerpts tied to this theme from other respondents appears in the full report.)

One of the greatest challenges of the next era will be balancing protection of intellectual property in algorithms with protecting the subjects of those algorithms from unfair discrimination and social engineering. First, they predicted that an algorithm-assisted future will widen the gap between the digitally savvy (predominantly the most well-off, who are the most desired demographic in the new information ecosystem) and those who are not nearly as connected or able to participate.

Second, they said social and political divisions will be abetted by algorithms, as algorithm-driven categorizations and classifications steer people into echo chambers of repeated and reinforced media and political content. Two illustrative answers: And that divide will be self-perpetuating, where those with fewer capabilities will be more vulnerable in many ways to those with more.

Brushing up against contrasting viewpoints challenges us, and if we are able to actively or passively avoid others with different perspectives, it will negatively impact our society. It will be telling to see what features our major social media companies add in coming years, as they will have tremendous power over the structure of information flow.

The overall effect will be positive for some individuals. It will be negative for the poor and the uneducated. As a result, the digital divide and wealth disparity will grow. It will be a net negative for society. The spread of artificial intelligence (AI) has the potential to create major unemployment and all the fallout from that. What will then be the fate of Man? The respondents to this canvassing offered a variety of ideas about how individuals and the broader culture might respond to the algorithm-ization of life.

They argued for public education to instill literacy in the general public about how algorithms function. They also noted that those who create and evolve algorithms are not held accountable to society and argued there should be some method by which they are. Representative comments: What is the supply chain for that information? Is there clear stewardship and an audit trail? Were the assumptions based on partial information, flawed sources or irrelevant benchmarks? Did we train our data sufficiently?

Were the right stakeholders involved, and did we learn from our mistakes? The upshot of all of this is that our entire way of managing organizations will be upended in the next decade. The power to create and change reality will reside in technology that only a few truly understand. So to ensure that we use algorithms successfully, whether for financial or human benefit or both, we need to have governance and accountability structures in place. Easier said than done, but if there were ever a time to bring the smartest minds in industry together with the smartest minds in academia to solve this problem, this is the time.

That coping strategy has always been co-evolving with humanity, and with the complexity of our social systems and data environments. Becoming explicitly aware of our simplifying assumptions and heuristics is an important site at which our intellects and influence mature. What is different now is the increasing power to program these heuristics explicitly, to perform the simplification outside of the human mind and within the machines and platforms that deliver data to billions of individual lives.

It will take us some time to develop the wisdom and the ethics to understand and direct this power. The first and most important step is to develop better social awareness of who is applying it, how, and where. We need some kind of rainbow coalition to come up with rules to avoid allowing inbuilt bias and groupthink to affect the outcomes.

Finally, this prediction from an anonymous participant who sees the likely endpoint as one of two extremes:

I suspect utopia, given that we have survived at least one existential crisis (nuclear) in the past and that our track record toward peace, although slow, is solid. Following is a brief collection of comments by several of the many top analysts who participated in this canvassing:

When they make a change, they make a prediction about its likely outcome on sales, and then they use sales data from that prediction to refine the model. Their model also makes predictions about likely outcomes on reoffending, but there is no tracking of whether their model makes good predictions, and no refinement.

This frees them to make terrible predictions without consequence. The algorithms are not in control; people create and adjust them. However, positive effects for one person can be negative for another, and tracing causes and effects can be difficult, so we will have to continually work to understand and adjust the balance.

The methods behind the decisions it makes are completely opaque, not only to those whose credit is judged, but to most of the people running the algorithm as well. The challenge with supervised learning is that labeling data can be expensive and time-consuming. If labels are limited, you can use unlabeled examples to enhance supervised learning. Because the machine is not fully supervised in this case, we say the machine is semi-supervised.

With semi-supervised learning, you use unlabeled examples with a small amount of labeled data to improve the learning accuracy. When performing unsupervised learning, the machine is presented with totally unlabeled data. It is asked to discover the intrinsic patterns that underlie the data, such as a clustering structure, a low-dimensional manifold, or a sparse tree and graph.
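
As a concrete, purely illustrative sketch of these three settings, the snippet below uses scikit-learn on synthetic data; the dataset and the particular choices of logistic regression, label spreading and k-means (and the 30-example labeled subset) are assumptions made for the example, not recommendations from the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelSpreading

# Synthetic two-class data stands in for a real labeled dataset.
X, y = make_blobs(n_samples=300, centers=2, random_state=0)

# Supervised: every example carries a label.
supervised = LogisticRegression(max_iter=1000).fit(X, y)

# Semi-supervised: pretend most labels are missing (marked with -1)
# and let the small labeled portion guide the rest.
y_partial = y.copy()
y_partial[30:] = -1            # only the first 30 examples keep their labels
semi = LabelSpreading().fit(X, y_partial)

# Unsupervised: discard the labels entirely and look for structure,
# here a clustering structure found by k-means.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(supervised.score(X, y), semi.score(X, y), np.bincount(clusters))
```

The supervised and semi-supervised models predict the known classes; the clustering step only groups points, and its cluster indices need not line up with the original labels.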

Reinforcement learning is another branch of machine learning which is mainly utilized for sequential decision-making problems. In this type of machine learning, unlike supervised and unsupervised learning, we do not need to have any data in advance; instead, the learning agent interacts with an environment and learns the optimal policy on the fly based on the feedback it receives from that environment.

That feedback has two components. One is the resulting state of the environment after the agent has acted on it. The other is the reward or punishment that the agent receives for performing that particular action in that particular state. The reward is carefully chosen to align with the objective for which we are training the agent.

Using the state and reward, the agent updates its decision-making policy to optimize its long-term reward. With the recent advances in deep learning, reinforcement learning has gained significant attention, since it has demonstrated striking performance in a wide range of applications such as games, robotics, and control.
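
The loop of state, action, reward and policy update can be made concrete with a small tabular Q-learning sketch. The corridor environment, the +1 reward at the goal and the hyperparameters below are invented purely for illustration; Deep-Q and Fitted-Q networks replace the table with a learned function approximator.

```python
import numpy as np

# Toy environment: a corridor of 6 cells. The agent starts in cell 0 and
# receives a reward of +1 only when it reaches the rightmost cell.
N_STATES = 6
GOAL = N_STATES - 1
N_ACTIONS = 2                       # 0 = step left, 1 = step right

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Q-table: the agent's running estimate of long-term reward per (state, action).
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy with random tie-breaking: mostly exploit, sometimes explore.
        if rng.random() < epsilon:
            action = rng.integers(N_ACTIONS)
        else:
            action = rng.choice(np.flatnonzero(Q[state] == Q[state].max()))
        next_state, reward, done = step(state, action)
        # Move the estimate toward the reward plus the discounted best future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q.argmax(axis=1)[:GOAL])      # greedy policy for the non-terminal states (1 = go right)
```

After training, the greedy policy in every non-terminal cell is to step right, which is exactly the behavior that maximizes the discounted reward in this toy environment.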

To see reinforcement learning models such as Deep-Q and Fitted-Q networks in action, check out this article. When choosing an algorithm, always take these aspects into account: accuracy, training time and ease of use. Many users put the accuracy first, while beginners tend to focus on algorithms they know best. When presented with a dataset, the first thing to consider is how to obtain results, no matter what those results might look like.

Beginners tend to choose algorithms that are easy to implement and can obtain results quickly. This works fine, as long as it is just the first step in the process.

Once you obtain some results and become familiar with the data, you may spend more time using more sophisticated algorithms to strengthen your understanding of the data, hence further improving the results. Even in this stage, the best algorithms might not be the methods that have achieved the highest reported accuracy, as an algorithm usually requires careful tuning and extensive training to obtain its best achievable performance.
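
A minimal sketch of that workflow, using scikit-learn on a synthetic dataset (the dataset, the two models and the grid values are illustrative assumptions): start with a fast, easy-to-tune baseline, then invest in a more sophisticated model and its hyperparameter search once the data is understood.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Step 1: a quick baseline that needs almost no tuning.
baseline = LogisticRegression(max_iter=1000)
print("baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())

# Step 2: a more powerful model whose extra accuracy only shows up
# after a (potentially expensive) hyperparameter search.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "learning_rate": [0.05, 0.1]},
    cv=5,
)
search.fit(X, y)
print("tuned model accuracy:", search.best_score_)
```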

Looking more closely at individual algorithms can help you understand what they provide and how they are used. These descriptions provide more details and give additional tips for when to use specific algorithms, in alignment with the cheat sheet. If the dependent variable is not continuous but categorical, linear regression can be transformed to logistic regression using a logit link function.
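
As a minimal sketch of that link-function idea (scikit-learn on synthetic data; the dataset and settings are illustrative), the fitted model below is still linear in the features, and the logistic function, the inverse of the logit link, turns that linear score into a probability for the positive class:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The model is still linear in the features: w.x + b.
linear_score = clf.decision_function(X)    # same as X @ clf.coef_[0] + clf.intercept_[0]

# The logistic function (inverse of the logit link) maps that score to a probability.
prob_class_1 = 1.0 / (1.0 + np.exp(-linear_score))

# Identical to what predict_proba reports for the positive class.
assert np.allclose(prob_class_1, clf.predict_proba(X)[:, 1])
print(prob_class_1[:5])
```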

Logistic regression is a simple, fast yet powerful classification algorithm. In logistic regression we use a different hypothesis class to try to predict the probability that a given example belongs to the "1" class versus the probability that it belongs to the "-1" class. Kernel tricks can be used to map non-linearly separable data into a higher-dimensional space in which it becomes linearly separable.

A support vector machine (SVM) training algorithm finds the classifier represented by the normal vector and bias of a hyperplane. This hyperplane (boundary) separates different classes by as wide a margin as possible. When the classes are not linearly separable, a kernel trick can be used to map the non-linearly separable space into a higher-dimensional, linearly separable space. In either case, training can be converted into a constrained optimization problem.
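
In the standard textbook hard-margin formulation, with w and b the normal vector and bias named above and (x_i, y_i) the training examples labeled y_i = +1 or -1, that optimization problem reads:

\[
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\qquad \text{subject to} \qquad
y_i \left( w \cdot x_i + b \right) \ge 1 \quad \text{for all } i .
\]

Minimizing the norm of w is equivalent to maximizing the margin 2/||w|| between the two classes; with a kernel, the dot product w · x_i is effectively evaluated in the higher-dimensional space the kernel defines, and soft-margin variants add slack terms to tolerate misclassified points.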

When most of the predictor variables are numeric, logistic regression and SVM should be the first try for classification.

These models are easy to implement, their parameters are easy to tune, and their performance is also pretty good. So these models are appropriate for beginners. Decision trees, random forests and gradient boosting are all algorithms based on decision trees. There are many variants of decision trees, but they all do the same thing — subdivide the feature space into regions with mostly the same label.

Decision trees are easy to understand and implement. However, they tend to over-fit data when we exhaust the branches and go very deep with the trees.
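
Random forests and gradient boosting, introduced in the next paragraph, are the usual remedies. The sketch below shows the over-fitting gap and the ensemble fix on a synthetic dataset (the dataset, noise level and model settings are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.35, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "deep single tree": DecisionTreeClassifier(random_state=0),      # grows until leaves are pure
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # A large gap between train and test accuracy is the signature of over-fitting.
    print(f"{name}: train={model.score(X_train, y_train):.2f}  test={model.score(X_test, y_test):.2f}")
```

The unpruned single tree typically scores close to 1.0 on the training split but noticeably lower on the held-out split; the two ensembles narrow that gap.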

Random forests and gradient boosting are two popular ways to use tree algorithms to achieve good accuracy as well as overcome the over-fitting problem. (Figure: a convolutional neural network architecture. Image source: Wikipedia, Creative Commons.) Neural networks flourished in the mid-1980s due to their parallel and distributed processing ability.

Consider how early web search ranked a page: most notably, by how many other links pointed to it, and how reputable those linking pages were, based in turn on how many links pointed to them, and so on. That was a powerful sign of relevance. And the rest is history. Similar questions drive the ranking of items in a social feed: how close the poster is to you in your social network, how relevant the item is in its own terms because of its subject, and how recent it is.
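
That recursive idea, a page counts as reputable when reputable pages link to it, can be sketched with a toy power-iteration loop. The four-page link graph, damping factor and iteration count below are invented for illustration and are not Google's production algorithm:

```python
import numpy as np

# Tiny link graph: links[i] lists the pages that page i points to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n, damping = 4, 0.85

rank = np.full(n, 1.0 / n)
for _ in range(50):
    new_rank = np.full(n, (1 - damping) / n)
    for page, outgoing in links.items():
        # Each page passes its reputation, split evenly, to the pages it links to.
        for target in outgoing:
            new_rank[target] += damping * rank[page] / len(outgoing)
    rank = new_rank

print(rank.round(3))    # page 2, with the most (and best-connected) in-links, ranks highest
```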

Facebook, Google, Amazon, and other big tech companies all rely on algorithms to serve content and products to their customers. But there are also algorithms throughout your life that you might not be aware of. Put simply, machine learning is when a programmer feeds a program some raw data as a starting point, then shows it the end point (what an organized, classified version of that data looks like), and leaves it up to the program to figure out how to get from point A to point B.

Consider an onion: A human who knows how to cook can turn that onion from a pungent raw sphere into strips of caramelized goodness. In a traditional algorithm, a programmer would write every single step of the cooking instructions.

But in an algorithm developed by artificial intelligence, given the end point as a goal, the program would figure out how to get from raw to caramelized itself. Hence, the machine learned. For example, a human process like being able to recognize that a cat is a cat takes so much complicated brain power that it would be impossible to write out step by step.

But by giving a program a bunch of images of a cat, and images that are not a cat, and showing the desired endpoint as categorizing a cat image as a cat, the computer can learn to execute that process itself. However, remember that an algorithm just means a set of instructions. Domingos explained that programmers spend enormous amounts of time fixing mistakes in algorithms so that the lines of code produce the appropriate results.
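
A toy sketch of that contrast, with made-up feature rows standing in for cat photos (every feature, label and rule here is invented for illustration): the traditional version hard-codes the rule, while the machine-learning version is handed only examples and desired answers and works a rule out for itself.

```python
from sklearn.tree import DecisionTreeClassifier

# Each example is [has_whiskers, has_four_legs, says_meow]; label 1 = cat, 0 = not a cat.
examples = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 1]]
labels = [1, 1, 0, 0, 1, 0]

# Traditional algorithm: a programmer writes out the rule by hand.
def is_cat_handwritten(features):
    has_whiskers, has_four_legs, says_meow = features
    return 1 if has_whiskers and (says_meow or has_four_legs) else 0

# Machine learning: only the examples and the desired answers are provided;
# the model works out a rule that reproduces them.
learned = DecisionTreeClassifier(random_state=0).fit(examples, labels)

new_animal = [1, 0, 1]   # whiskers and meowing, but only two legs visible
print(is_cat_handwritten(new_animal), learned.predict([new_animal])[0])
```

Both versions label the new animal a cat; the difference is that no one ever told the second program what makes a cat a cat, only which examples were cats.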

Take a hiring algorithm, which ostensibly should find the best candidate for a job. And problems with bias can get even worse with algorithms that utilize artificial intelligence. For example, a hiring algorithm powered by machine learning might use as its starting point a bunch of resumes of candidates, and as its output the resumes of people who were hired in the past.
