hsurich

DataOutputStream.writeBytes in the JDK garbles Chinese text

Our Java project has a helper method that wraps sending a multipart/form-data POST request:

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;
import java.util.Map.Entry;

public class FormDataPostRequest {

    public static String sendPostFormData(String url, Map<String, String> headerParams, String text) {
        String result = ""; // Result to be returned
        BufferedReader in = null; // Read response input stream

        try {
            // Create connection
            URL apiUrl = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) apiUrl.openConnection();
            connection.setDoOutput(true);
            connection.setDoInput(true);
            connection.setRequestMethod("POST");
            connection.setUseCaches(false);
            connection.setInstanceFollowRedirects(true);
            connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + "*****"); // Set boundary
            if (headerParams != null) {
                for (Entry<String, String> entry : headerParams.entrySet()) {
                    connection.setRequestProperty(entry.getKey(), entry.getValue());
                }
            }
            connection.connect();

            // Send POST request with string parameter
            DataOutputStream out = new DataOutputStream(connection.getOutputStream());
            // Add string parameter
            out.writeBytes("--*****\r\n");
            out.writeBytes("Content-Disposition: form-data; name=\"text\"\r\n\r\n");
            out.writeBytes(text + "\r\n"); // Original code: this line garbles non-ASCII text (see below)
            out.writeBytes("--*****--\r\n");
            out.flush();
            out.close();

            // Read response
            // Define BufferedReader input stream to read URL response with UTF-8 encoding
            in = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
            String line;
            // Read the returned content
            while ((line = in.readLine()) != null) {
                result += line;
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Internal problem in the HTTP request method");
        } finally {
            try {
                if (in != null) {
                    in.close();
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
        return result;
    }
}

This worked fine at first, but we later found that whenever the parameter value contained Chinese characters, the receiver got garbled text. The culprit was this line:

out.writeBytes(text + "\r\n");
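To reproduce the symptom without a server, the sketch below captures what this line writes into a ByteArrayOutputStream and decodes the bytes as UTF-8. The class and method names are hypothetical, and we assume the receiver decodes form fields as UTF-8. The Chinese sample is written with Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteBytesSymptom {

    // Writes the value exactly like the problem line: writeBytes(text + "\r\n")
    public static byte[] viaWriteBytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeBytes(text + "\r\n");
        } catch (IOException e) {
            // An in-memory stream never fails, but writeBytes declares IOException
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D, U+6587) followed by ASCII, written as escapes
        String text = "\u4e2d\u6587 test";
        byte[] sent = viaWriteBytes(text);
        // Assumption: the receiver decodes the form field as UTF-8
        String received = new String(sent, StandardCharsets.UTF_8).trim();
        int utf8Length = (text + "\r\n").getBytes(StandardCharsets.UTF_8).length;
        System.out.println("round trip intact: " + received.equals(text));
        System.out.println("bytes written: " + sent.length + ", UTF-8 needs: " + utf8Length);
    }
}
```

It reports that the round trip is not intact and that only 9 bytes were written where UTF-8 needs 13: writeBytes emits exactly one byte per char, while UTF-8 needs three bytes for each of the two Chinese characters.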

The JDK source of writeBytes shows why:

/**
 * Writes out the string to the underlying output stream as a
 * sequence of bytes. Each character in the string is written out, in
 * sequence, by discarding its high eight bits. If no exception is
 * thrown, the counter {@code written} is incremented by the
 * length of {@code s}.
 *
 * @param      s   a string of bytes to be written.
 * @throws     IOException  if an I/O error occurs.
 * @see        java.io.FilterOutputStream#out
 */
public final void writeBytes(String s) throws IOException {
    int len = s.length();
    for (int i = 0 ; i < len ; i++) {
        out.write((byte)s.charAt(i));
    }
    incCount(len);
}

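Each character of s is cast to byte, which keeps only its low 8 bits. A Java char is a 16-bit UTF-16 code unit. ASCII characters fit entirely in the low 8 bits and pass through unchanged, but a Chinese character such as 中 (U+4E2D) does not: its high byte 0x4E is discarded and only 0x2D, an ASCII hyphen, is written. A receiver decoding the body as UTF-8, which encodes each of these characters as three bytes, therefore sees garbled text. To watch this happen, you can add logging to a copy of the method and print each byte after the cast: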

public final void writeBytes(String s) throws IOException {
    int len = s.length();
    for (int i = 0 ; i < len ; i++) {
        char c = s.charAt(i);
        byte b = (byte) c;
        System.out.println("Character: " + c + " Byte after cast: " + b);
        out.write(b);
    }
    incCount(len);
}
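The logging version above only compiles inside DataOutputStream, because it uses the protected out field and the private incCount method. A standalone sketch can reproduce the same cast: it applies (byte) to each char exactly as writeBytes does and prints the 16-bit code unit next to the byte that survives. The class name CharToByteCast is hypothetical.

```java
public class CharToByteCast {

    // Applies the same (byte) cast that DataOutputStream.writeBytes uses on each char
    public static byte[] castLikeWriteBytes(String s) {
        byte[] result = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            result[i] = (byte) s.charAt(i); // keeps only the low 8 bits
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "A\u4e2d\u6587"; // 'A' plus two Chinese characters
        byte[] bytes = castLikeWriteBytes(s);
        for (int i = 0; i < s.length(); i++) {
            // Print the 16-bit code unit and the single byte that survives the cast
            System.out.printf("U+%04X -> 0x%02X%n", (int) s.charAt(i), bytes[i] & 0xFF);
        }
    }
}
```

It prints U+0041 -> 0x41, U+4E2D -> 0x2D and U+6587 -> 0x87: the ASCII letter is unchanged, while the high bytes 0x4E and 0x65 of the Chinese characters are simply dropped.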

The output confirms that each Chinese character loses its high byte in the cast. The fix is to encode the string explicitly with a known charset (UTF-8 here) and write the resulting byte array with write(byte[]) instead of writeBytes:

// Send POST request with string parameter
DataOutputStream out = new DataOutputStream(connection.getOutputStream());
// Add string parameter
out.writeBytes("--*****\r\n");
out.writeBytes("Content-Disposition: form-data; name=\"text\"\r\n\r\n");
out.write(text.getBytes("UTF-8"));
out.writeBytes("\r\n");
out.writeBytes("--*****--\r\n");
out.flush();
out.close();
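To check the fix without a server, the sketch below builds the same multipart body into a ByteArrayOutputStream and decodes it as UTF-8. The class and method names are hypothetical. It swaps getBytes("UTF-8") for getBytes(StandardCharsets.UTF_8), which produces the same bytes but does not throw the checked UnsupportedEncodingException.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class MultipartBodyCheck {

    // Builds the same body as the fixed code, but into memory instead of a connection
    public static byte[] buildBody(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            // ASCII-only framing: writeBytes is safe here
            out.writeBytes("--*****\r\n");
            out.writeBytes("Content-Disposition: form-data; name=\"text\"\r\n\r\n");
            // User-supplied value: encode explicitly before writing
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.writeBytes("\r\n");
            out.writeBytes("--*****--\r\n");
        } catch (IOException e) {
            // An in-memory stream never fails, but DataOutputStream declares IOException
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        String decoded = new String(buildBody(text), StandardCharsets.UTF_8);
        System.out.println(decoded.contains(text) ? "value survives round trip" : "value garbled");
    }
}
```

The boundary lines and part headers are plain ASCII, so writeBytes remains safe for them; only the user-supplied value needs explicit encoding.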