2017.07.07

Recommender system for Web security engineers - basic level -

When you hear "recommender system", you probably picture a system that suggests items you might like in an online shop. Recommender systems are widely deployed, for example to recommend music in streaming services or properties in real estate services.

To bring the recommender system idea to the security field, we developed PyRecommender, a system that recommends injection codes that engineers can use to test web apps for vulnerabilities. This post explains how the system works and shows a demo.

Our system vectorizes various vulnerability patterns drawn from the large number of vulnerability assessments we have conducted in the past and learns those patterns with machine learning. When it detects behavior that suggests a vulnerability, it recommends the most suitable injection codes based on what it has learned.

System overview

Figure 1 System overview

Figure 1 is an outline of PyRecommender. The system consists of two subsystems. The first is the Investigator which observes the behavior of web apps and generates feature vectors of the patterns of vulnerabilities. The second is the Recommender that recommends injection codes to test for the vulnerabilities based on the feature vectors generated by the Investigator. The Recommender has a recommender engine which uses a learned machine learning model. By linking these two subsystems, PyRecommender recommends injection codes to humans to test vulnerabilities of a web app.

In this post, we use reflected XSS as the running example.

Note:

The source code and learning data used in this verification are listed in the "Verification codes" section. If you are interested, please use them in an environment under your control and at your own risk.

Investigator

The Investigator lists parameter values that are reflected in the HTTP response while crawling the target web app. It then examines the output locations of each parameter value, available symbols and script strings and vectorizes the data.

For example, when a target web app returns the HTTP response shown in Figure 2, the Investigator summarizes the result and vectorizes the found features as shown in Figure 3.

[request]
GET /?x=test"'`<>alert();prompt();confirm();alert``;<script>Msgbox(); HTTP/1.1
-------------------------------------------------------------------------------------
[response]
<div class=test`<>alert();prompt();confirm();alert``;Msgbox();  >123</div>

Figure 2 HTTP request and response (", <script>, etc. can't be used)

	Observation	Vector
Output locations	HTML tag: div	10
	attribute: class	5
	quotation: None	0
Available symbols and script strings	" : Fail	1
	' : Fail	1
	` : Pass	0
	< : Pass	0
	> : Pass	0
	alert(); : Pass	0
	prompt(); : Pass	0
	confirm(); : Pass	0
	alert``; : Pass	0
	<script> : Fail	1
	Msgbox(); : Pass	0

Figure 3 Result of the observation
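As a rough illustration of how the Pass/Fail column in Figure 3 could be derived from the response in Figure 2, here is a minimal sketch. It is not the Investigator's actual code: the function name `observe_tokens`, the use of the literal `test` as a marker, and the assumption that a filtered token is removed entirely (rather than encoded) are ours.

```python
# Tokens appended to the probe value, in the order used in Figure 2.
PROBE_TOKENS = ['"', "'", "`", "<", ">", "alert();", "prompt();",
                "confirm();", "alert``;", "<script>", "Msgbox();"]


def observe_tokens(response_body, marker="test", tokens=PROBE_TOKENS):
    """Return {token: True/False}, where True means the token survived.

    Assumption: a filtered token is dropped completely, so surviving
    tokens still appear in their original order right after the marker.
    """
    start = response_body.find(marker)
    if start < 0:
        # The parameter value is not reflected at all.
        return {token: False for token in tokens}
    pos = start + len(marker)
    result = {}
    for token in tokens:
        if response_body.startswith(token, pos):
            result[token] = True      # Pass: token is reflected verbatim
            pos += len(token)
        else:
            result[token] = False     # Fail: token was removed or altered
    return result


# Response body taken from Figure 2.
response = "<div class=test`<>alert();prompt();confirm();alert``;Msgbox();  >123</div>"
observed = observe_tokens(response)
for token, passed in observed.items():
    print(f"{token:<10}: {'Pass' if passed else 'Fail'}")
```

Matching the tokens in order, starting right after the marker, avoids counting characters such as `<` that belong to the page's own markup. A filter that encodes tokens instead of removing them would need a more tolerant comparison.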

The Investigator uses the following predefined conversion table to vectorize the result of the observation.

Figure 4 Conversion Table (example)
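The conversion step can be sketched as a set of lookup tables. In the sketch below, only the entries marked "Figure 3" (div = 10, class = 5, no quotation = 0, Fail = 1, Pass = 0) come from this post; every other number, and the names `TAG_IDS`, `ATTR_IDS`, `QUOTE_IDS` and `vectorize`, are placeholders for illustration. The real table may encode more features than shown here.

```python
# Hypothetical conversion table. Only the entries marked "Figure 3" are
# taken from the post; all other IDs are made-up placeholders.
TAG_IDS = {"div": 10, "input": 11, "textarea": 12}   # div -> 10 (Figure 3)
ATTR_IDS = {None: 0, "class": 5, "value": 6}         # class -> 5 (Figure 3)
QUOTE_IDS = {None: 0, '"': 1, "'": 2}                # None -> 0 (Figure 3)


def vectorize(tag, attribute, quotation, token_passed):
    """Turn one observation into a flat feature vector.

    token_passed is a list of booleans in probe order (True = Pass).
    Following Figure 3, a usable token becomes 0 and a blocked one 1.
    """
    location = [TAG_IDS[tag], ATTR_IDS[attribute], QUOTE_IDS[quotation]]
    flags = [0 if passed else 1 for passed in token_passed]
    return location + flags


# Observation from Figure 3: ", ' and <script> fail, everything else passes.
passed = [False, False, True, True, True, True, True, True, True, False, True]
vector = vectorize("div", "class", None, passed)
print(vector)
```

Keeping the table as plain dictionaries makes it easy to reuse the same mapping for both the learning data and live observations, which the Recommender relies on.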

Note: In this post, the Investigator only crawls web pages that can be reached through "<a href='xxx'>" links and targets query parameters only.

The Investigator passes the vectorized features of the web app to the Recommender using the mechanisms above.

As you can see, the Investigator’s role is to examine the output locations of the parameter values, available symbols and script strings and to convert the result into feature vectors. We have developed the Investigator for the purpose of this verification, but if your vulnerability scanner or crawler has a similar function, you can use it instead.

Recommender

The Recommender outputs (recommends) the optimal injection codes to test for vulnerabilities using feature vectors generated by the Investigator.

We developed the recommender engine using a multilayer perceptron (MLP) which is a machine learning algorithm (Figure 5).

Figure 5 Using MLP

The MLP receives a feature vector at the blue (input) nodes and outputs, at the red (output) nodes, the injection codes that correspond to that feature vector. The injection codes are ranked from most to least likely to exploit the possible vulnerability. Figure 6 shows an example of a recommendation.

[inputted feature vector]
3,2,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0
---------------------------------------------------------
[recommended result]
0.99562198 : "></iframe><script>alert();</script>
0.00195420 : "><script>alert();</script>
0.00151912 : 'javascript:alert();

Figure 6 Example of a recommendation

In this example, the Recommender ranks the string ["></iframe><script>alert();</script>] first, with an estimated probability of about 99.6% of a successful exploit.
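To make the ranking step concrete, the following framework-free sketch runs a forward pass through a tiny MLP (one hidden layer, softmax output) and sorts the injection codes by score. The weights, the three-element input and the function names are invented for illustration; they are not PyRecommender's trained model, so the printed scores will not match Figure 6.

```python
import math


def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with ReLU, followed by a softmax output layer."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(scores)


def recommend(x, model, labels, top_k=3):
    """Return the top_k (probability, injection code) pairs, best first."""
    probs = forward(x, *model)
    return sorted(zip(probs, labels), reverse=True)[:top_k]


# Output labels: the injection codes shown in Figure 6.
LABELS = ['"></iframe><script>alert();</script>',
          '"><script>alert();</script>',
          "'javascript:alert();"]

# Toy weights (hypothetical): 3 inputs -> 2 hidden units -> 3 outputs.
MODEL = ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1],
         [[1.2, -0.7], [-0.4, 0.9], [0.3, 0.2]], [0.0, 0.0, 0.0])

ranking = recommend([3, 2, 0], MODEL, LABELS)
for prob, code in ranking:
    print(f"{prob:.8f} : {code}")
```

Each output node corresponds to one injection code, so sorting the softmax output directly yields a list in the same "score : code" form as Figure 6.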

Because an MLP is a supervised learning model, it must be trained in advance. In other words, we need to prepare various vulnerability patterns as learning data. For this post, we used about 1,300 patterns that we collected manually from WAVSEP's XSS cases.

Note:

The training data pairs the output locations of parameter values and the available symbols and script strings (as shown in Figure 3) with the injection codes that exploited or signaled the vulnerability. We used the same table as Figure 4 to vectorize the data.
The learning data is available together with the source code listed under "Verification codes".
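The pairing described in the note can be pictured as (feature vector, injection code) samples that are turned into inputs and one-hot targets for a supervised model. The sketch below uses made-up, shortened feature vectors purely to show the shape of the data; the injection codes are taken from this post, and `build_dataset` is a hypothetical helper, not part of PyRecommender.

```python
# Illustrative samples only: the feature vectors are invented and shortened.
SAMPLES = [
    ([10, 5, 0, 1, 1, 0], " onmousemove=alert();"),
    ([12, 0, 0, 0, 0, 0], "</textarea><img src=x onerror=alert();>"),
    ([11, 6, 0, 1, 1, 0], " onmousemove=alert();"),
]


def build_dataset(samples):
    """Split samples into features, one-hot targets and the label list."""
    labels = sorted({code for _, code in samples})       # one output node per code
    index = {code: i for i, code in enumerate(labels)}
    features = [list(vector) for vector, _ in samples]
    targets = [[1 if index[code] == i else 0 for i in range(len(labels))]
               for _, code in samples]
    return features, targets, labels


features, targets, labels = build_dataset(SAMPLES)
print(labels)
print(targets)
```

Because each distinct injection code becomes one output class, adding a sample with a new code enlarges the output layer, which is one reason the model has to be retrained when new patterns are added.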

Demonstration

The movie below demonstrates PyRecommender running against Webseclab.

Movie 1 Demo Movie (PyRecommender versus Webseclab)

PyRecommender crawls from the top URL of Webseclab (00:12 – 00:16) and detects possible XSS vulnerabilities in the following URLs and parameters (00:17-00:19). Then, PyRecommender recommends injection codes corresponding to the locations of the XSS vulnerabilities (00:20-00:24). Figure 7 is a list of locations where the possible XSS vulnerabilities are detected and the corresponding injection codes recommended for the vulnerabilities.

No	URL(Path)	Param	Recommended injection codes
1	`/xss/reflect/textarea1`	`in`	0.9993: </textarea><img src=x onerror=alert();>
			0.0004: <script>alert();</script>
			0.0001: </textarea><img src=x onerror=prompt();>
2	`/xss/reflect/onmouseover_div_unquoted`	`in`	0.9832:  onmousemove=alert();
			0.0101: " onmousemove=alert();",
			0.0030: "><script>alert();</script>
3	`/xss/reflect/onmouseover_unquoted`	`in`	0.8097: "><frame src="javascript:alert()">
			0.1887:  onmousemove=alert();
			0.0005: <img src=x onerror=alert();>

Figure 7 Result of a recommendation (examples)

In most cases, the top-ranked injection code runs the script. No. 3, however, is a pattern that was not in the learning data: the first recommended code did not run the script because it did not fit the surrounding HTML syntax, but the second one did. So even for a pattern PyRecommender has not learned, it may still recommend an injection code that can run a script.
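The way an engineer works through the ranked list can be sketched as a simple loop: try the candidates in order of score and stop at the first one that runs a script. In this sketch, `first_working` is a hypothetical helper and `runs_script` stands in for the actual check (for example a manual test or a browser-based one); here it is simulated from the outcome reported for No. 3.

```python
def first_working(candidates, runs_script):
    """Try candidates from highest to lowest score.

    candidates: list of (score, injection_code) pairs.
    runs_script: callable that returns True if the code ran a script.
    Returns (rank, injection_code) of the first success, or None.
    """
    ranked = sorted(candidates, reverse=True)
    for rank, (_, code) in enumerate(ranked, start=1):
        if runs_script(code):
            return rank, code
    return None


# Recommendations for No. 3 in Figure 7.
NO3 = [
    (0.8097, '"><frame src="javascript:alert()">'),
    (0.1887, " onmousemove=alert();"),
    (0.0005, "<img src=x onerror=alert();>"),
]


def simulated_check(code):
    # Simulates the reported result: only the onmousemove code ran a script.
    return code == " onmousemove=alert();"


result = first_working(NO3, simulated_check)
print(result)  # (2, ' onmousemove=alert();')
```

Passing the check in as a callable keeps the ranking logic independent of how script execution is actually confirmed.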

Let's look at some of the requests that used the recommended injection codes, along with their responses. For readability, URL encoding is not applied.

No. 1: `http://xxx/xss/reflect/textarea1?in=foo1`

[request (default)]
GET /xss/reflect/textarea1?in=foo1 HTTP/1.1
-------------------------------------------------------------------------------------
[response]
<textarea name="in" rows="5" cols="60">foo1</textarea>

Figure 8 Normal HTTP response (excerpt)

[request (using the recommended injection code)]
GET /xss/reflect/textarea1?in=foo1</textarea><img src=x onerror=alert();> HTTP/1.1
-------------------------------------------------------------------------------------
[response]
<textarea name="in" rows="5" cols="60">foo1</textarea><img src=x onerror=alert();></textarea>

Figure 9 Result of using a recommended injection code (runs script)
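The request in Figure 9 is shown without URL encoding. When sending it from a script, the parameter value would normally be encoded first. A minimal sketch using Python's standard library follows; `build_test_url` is a hypothetical helper, and `http://xxx` is the placeholder host used in this post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_test_url(base, path, param, value):
    """Build a GET URL with the parameter value percent-encoded."""
    return f"{base}{path}?{urlencode({param: value})}"


# Default value foo1 followed by the top recommendation for No. 1.
payload = "foo1</textarea><img src=x onerror=alert();>"
url = build_test_url("http://xxx", "/xss/reflect/textarea1", "in", payload)
print(url)

# Decoding the query string gives back the original value.
decoded = parse_qs(urlsplit(url).query)["in"][0]
print(decoded == payload)
```

Note that `urlencode` encodes spaces as `+`, which is the usual form encoding for query strings; the server decodes the value before reflecting it, so the response still looks like Figure 9.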

No. 3: `http://xxx/xss/reflect/onmouseover_unquoted?in=changeme5`

[request]
GET /xss/reflect/onmouseover_unquoted?in=changeme5 HTTP/1.1
-----------------------------------------------------------------------
[response]
Homepage: <input value=changeme5 name="in" size="40"><BR>

Figure 10 Normal HTTP response (excerpt)

[request (using the recommended injection code)]
GET /xss/reflect/onmouseover_unquoted?in=changeme5 onmousemove=alert();  HTTP/1.1
-------------------------------------------------------------------------
[response]
Homepage: <input value=changeme5 onmousemove=alert();  name="in" size="40"><BR>

Figure 11 Result of using a recommended injection code (runs script)

As shown above, we were able to run the scripts using the recommended injection codes.

In this way, by combining the Investigator, which vectorizes the behavior of web apps, with the Recommender, which learns various XSS patterns, we were able to correctly assess the XSS vulnerabilities.

Note that for an unlearned vulnerability pattern, the injection codes PyRecommender recommends may or may not run a script, depending on the pattern. In such cases, an engineer manually adds learning data and retrains the Recommender so that it recommends injection codes more accurately the next time. In other words, the Recommender gets smarter the more it is used.

Movie 2 Demo Movie (Recommender learns)
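The feedback loop described above can be pictured as follows: when an engineer confirms which injection code worked for an unlearned pattern, the pair is added to the learning data and the model is trained again. This is only a sketch of the idea; `learn_from_feedback` and the `retrain` callable are hypothetical, and the feature vector is a shortened, made-up example.

```python
def learn_from_feedback(dataset, vector, code, retrain):
    """Add a confirmed (vector, code) pair and retrain if it is new.

    dataset: list of (feature_tuple, injection_code) samples.
    retrain: callable that rebuilds the model from the full dataset.
    Returns True if the dataset changed.
    """
    sample = (tuple(vector), code)
    if sample in dataset:
        return False              # already learned, nothing to do
    dataset.append(sample)
    retrain(dataset)
    return True


# Stand-in for real training: just record how often retraining was requested.
retrain_calls = []
dataset = []
first = learn_from_feedback(dataset, [11, 6, 0, 1], " onmousemove=alert();",
                            lambda data: retrain_calls.append(len(data)))
second = learn_from_feedback(dataset, [11, 6, 0, 1], " onmousemove=alert();",
                             lambda data: retrain_calls.append(len(data)))
print(first, second, retrain_calls)
```

Skipping samples that are already present avoids retraining when the same finding is reported twice.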

Conclusion

Our conclusions for this verification are:

By using machine learning, we can recommend injection codes to test vulnerabilities.
Even with an unlearned vulnerability pattern, PyRecommender may be able to recommend injection codes.
PyRecommender can improve the recommendation accuracy by learning patterns of various vulnerabilities.

The learning data used in this verification were simple cases from WAVSEP. We also created the data manually, so the amount of data we could incorporate was limited. Solving this requires a mechanism that automatically vectorizes the real test results produced in large quantities by bug bounty programs and vulnerability assessments and uses them as learning data.

Finally, this post covered the basic level. Next time, at the intermediate level, we will write about a mechanism to make the recommendations more robust. Specifically, we will use a convolutional neural network (CNN) instead of the MLP and verify that PyRecommender can maintain and improve its recommendation accuracy with even less learning data.

Verification codes

https://github.com/13o-bbr-bbq/machine_learning_security/tree/master/Recommender

To read other blog entries by Isao Takaesu, click here.

Professional Service Div.
Isao Takaesu (高江洲勲)
