10

I'm using the Twitter API, and the following string is giving me trouble: Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile
I want to store it in PostgreSQL, but \u0000 is not accepted, so I want to remove it.
I tried string = string.replaceAll("\\u0000", ""); but it doesn't work. Here is my code:

String json = TwitterObjectFactory.getRawJSON(user);
System.out.println(json);
json = json.replaceAll("\\u0000", "");
System.out.println(json);

The output (only the relevant part):

Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile
Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile

If I put that part in a String literal in Java, the replacement works, but if I read it from a text file or directly from Twitter, it doesn't.
So my question is: how do I remove \u0000 from a string?
By the way, this is the full string:

{"utc_offset":null,"friends_count":83,"profile_image_url_https":"https://pbs.twimg.com/profile_images/2636139584/3a8455cd94045fa6980402add14796a9_normal.jpeg","listed_count":1,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","default_profile_image":false,"favourites_count":0,"description":"Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile","created_at":"Sat May 28 14:24:06 +0000 2011","is_translator":false,"profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","protected":false,"screen_name":"Fsquadritto","id_str":"306825274","profile_link_color":"0084B4","is_translation_enabled":false,"id":306825274,"geo_enabled":false,"profile_background_color":"C0DEED","lang":"es","profile_sidebar_border_color":"C0DEED","profile_location":null,"profile_text_color":"333333","verified":false,"profile_image_url":"http://pbs.twimg.com/profile_images/2636139584/3a8455cd94045fa6980402add14796a9_normal.jpeg","time_zone":null,"url":null,"contributors_enabled":false,"profile_background_tile":false,"entities":{"description":{"urls":[]}},"statuses_count":2,"follow_request_sent":false,"followers_count":36,"profile_use_background_image":true,"default_profile":true,"following":false,"name":"Fiorella Squadritto","location":"","profile_sidebar_fill_color":"DDEEF6","notifications":false,"status":{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"possibly_sensitive":false,"coordinates":null,"created_at":"Fri Oct 12 17:40:35 +0000 2012","truncated":false,"in_reply_to_user_id_str":null,"source":"<a href=\"http://instagram.com\" 
rel=\"nofollow\">Instagram<\/a>","retweet_count":1,"retweeted":false,"geo":null,"in_reply_to_screen_name":null,"entities":{"urls":[{"display_url":"instagr.am/p/QsOQxTNfvQ/","indices":[49,69],"expanded_url":"http://instagr.am/p/QsOQxTNfvQ/","url":"http://t.co/GKziME7N"}],"hashtags":[{"indices":[24,34],"text":"eduinnova"}],"user_mentions":[{"indices":[35,47],"screen_name":"ensenachile","id_str":"57099132","name":"Enseña Chile","id":57099132}],"symbols":[]},"id_str":"256811615171792896","in_reply_to_user_id":null,"favorite_count":1,"id":256811615171792896,"text":"Amando las matemáticas! #eduinnova @ensenachile  http://t.co/GKziME7N","place":null,"contributors":null,"lang":"es","favorited":false}}
FeanDoe
  • try with `json = json.replace("\u0000", "");` – Sim1 Mar 11 '15 at 15:00
  • Strange... I just tried the same by initializing a string with the value you provided and replaced worked fine. Did you try the same, with a predefined string instead of the API response? – Ofer Lando Mar 11 '15 at 15:02
  • I tried the same with a predefined String and it works, but with the API response (or when reading from a file) it doesn't work... But the comment by Siome Riboldi with the double backslash works fine :D... I tried a lot of things, but not replace on its own – FeanDoe Mar 11 '15 at 15:05

3 Answers

23
string = string.replace("\u0000", ""); // removes NUL chars
string = string.replace("\\u0000", ""); // removes backslash+u0000

Unicode escapes such as \u0063 are processed at the Java source level, before the compiler tokenizes the code. For instance, the keyword "class" can be written as:

public \u0063lass C {

Also, you do not need a regex here.
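To make the difference between the two replace calls concrete, here is a minimal sketch. The class and method names (NulEscapeDemo, stripNulChars, stripEscapedNul) are made up for illustration; only String.replace is standard Java. It assumes the input either holds a real NUL character or the six-character escape text, as in the raw JSON from the question.

```java
class NulEscapeDemo {
    // Removes real NUL characters (code point U+0000) from the text.
    static String stripNulChars(String s) {
        return s.replace("\u0000", "");
    }

    // Removes the six-character text: a backslash, the letter u, and four zeros.
    static String stripEscapedNul(String s) {
        return s.replace("\\u0000", "");
    }

    public static void main(String[] args) {
        String withNulChar = "a" + '\u0000' + "b"; // 3 characters, the middle one is NUL
        String withEscapeText = "a\\u0000b";       // 8 characters, as found in raw JSON text

        System.out.println(withNulChar.length());    // 3
        System.out.println(withEscapeText.length()); // 8

        // Each method only affects the form it was written for.
        System.out.println(stripNulChars(withNulChar));      // ab
        System.out.println(stripEscapedNul(withEscapeText)); // ab
        System.out.println(stripNulChars(withEscapeText).equals(withEscapeText)); // true
    }
}
```

If the string comes from a file or an API response as raw JSON, it most likely contains the escape text, so the second form is the one that matches.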

Joop Eggen
  • Thanks, `string.replace("\\u0000", "");` (with double backslash) works (: – FeanDoe Mar 11 '15 at 15:06
  • And not with a single backslash? Then the text really contained backslash + u + 0000. – Joop Eggen Mar 11 '15 at 15:24
  • It seems that the original question is about replacing the null byte in a raw JSON string where the null byte is encoded. I guess the correct way to deal with the problem would have been encoding the JSON string correctly before giving it as input to PostgreSQL. The following works just fine in PostgreSQL with the `json` field type: `insert into test values (1, '{ "string_with_null": "a\u0000b" }');`. – Mikko Rantalainen May 30 '18 at 06:19
6

The first argument to replaceAll is a regular expression, and the Java regex engine understands \uNNNN escapes, so

json.replaceAll("\\u0000", "")

will search for the regular expression \u0000, which matches instances of the Unicode NUL character (U+0000), not instances of the literal text \u0000. If you want to match the text \u0000, you need the regular expression \\u0000, which in turn means the Java string literal "\\\\u0000":

json.replaceAll("\\\\u0000", "")

Or, more simply, use replace (whose first argument is a literal string rather than a regex) instead of replaceAll:

json.replace("\\u0000", "")
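
The variants in this answer can be compared side by side. This is only a sketch: the class name ReplaceAllDemo and its helper methods are hypothetical, and the Pattern.quote variant is an extra option that is not part of the answer above; it asks the regex engine to treat the whole argument as literal text.

```java
import java.util.regex.Pattern;

class ReplaceAllDemo {
    // The regex engine reads this pattern as "the character U+0000",
    // so only real NUL characters are removed.
    static String regexNulChar(String s) {
        return s.replaceAll("\\u0000", "");
    }

    // The regex is an escaped backslash followed by u0000,
    // so the literal escape text is removed.
    static String regexLiteralText(String s) {
        return s.replaceAll("\\\\u0000", "");
    }

    // Pattern.quote turns the argument into a literal pattern.
    static String quotedLiteralText(String s) {
        return s.replaceAll(Pattern.quote("\\u0000"), "");
    }

    // String.replace never interprets its argument as a regex.
    static String plainReplace(String s) {
        return s.replace("\\u0000", "");
    }

    public static void main(String[] args) {
        // Escape text as it appears in the raw JSON, not real NUL characters.
        String raw = "Profesora \\u0000\\u0000Chile";

        System.out.println(regexNulChar(raw).equals(raw)); // true: nothing matched
        System.out.println(regexLiteralText(raw));         // Profesora Chile
        System.out.println(quotedLiteralText(raw));        // Profesora Chile
        System.out.println(plainReplace(raw));             // Profesora Chile
    }
}
```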
Ian Roberts
0

In some cases the input text contains several backslashes followed by u0000. To handle all of these cases:

  // One or more literal backslashes followed by u0000 are removed.
  String test = "ABC\\u0000DEF\\\\u0000123";
  System.out.println(test.replaceAll("[\\\\]+u0000", "")); // prints ABCDEF123
Abhinaya P