I just want to wrap up everything mentioned in the comments.
From the official Stata documentation, we have the following:
strL variables can be 0 to 2-billion bytes long.
strL variables are not required to be longer than 2,045 bytes.
str# variables can store strings of up to 2,045 bytes, so strL and str# overlap. This overlap is comparable to the overlap of the numeric types int and float. Any number that can be stored as an int can be stored as a float. Similarly, any string that can be stored as a str#, can be stored as a strL. The reverse is not true. In addition, strL variables can hold binary strings, whereas str# variables can only hold text strings. Thus the analogy between str#/strL and int/float is exact. There will be occasions when you will want to use strL variables in preference to str# variables, just as there are occasions when you will want to use float variables in preference to int variables.
The above information makes me infer two points before I try to replicate your problem.
In your case, the use of strL
was not necessary.
The use of strL
was the source of your problem, with a probable result some compatibility problem with the haven
library.
However, after trying to replicate what you described, I arrived at a different conclusion.
Please gently consider the following code that emulates your problem.
version 16
clear all
set obs 1
*4200 character string generated here: http://www.unit-conversion.info/texttools/random-string-generator/
gen strL str_var_in_strL = "eSw0qZcVs5DHU2GxgRo1Seo9uTwJ0MvHXyYUQidJMRWw8KW1310Ec242O6D4xrLziO4c56WgluSddTy0Q64QapkwGgOMZdy8ru0fyss0nwJvF4M3kBjYGF00ZsvQGYt4DjF51R3vxTzUx4xlApKwaoRADIgFlXvBh2Bug0VVhmXR3uInHDfpmID57kVWiyxX1gELdyPMVzJWizEHVx2GpjBsm1UdRphDdukFtFrnkr1HFRXBekxHkW3uOCHz0wnyDBfwitDGHosctRrWPhIjujnoalOaHkI5jbnENSNJEsOdGohoe5QKZIxtXmVbD4l8m8wLCbuSjZLw8NzU5vjPX57T2yWWasdFMIHk3kFipT0CG3dNForECS8UiW6ZWSIEmO2V62uakfrxTsRb9fIFVBUIHpGizeR0b27OnfSVB2wE2Ix0ij7kR19jz0wIh35fbwkJWqLq93pfHEtGu0FTb8H5A4XNOcR8chEAQBI7zV3rosSGnSP2h9QZtuSAcz1TrRHNMpCguvNf1DD72TCfCaiBXyflOCre7f5zchLA7k2cQ5qi4fBMVc9GnAdGB2vnjFeFlwaUD0AEUhfSJJINRQ2CKfJegqUL0jBgHBVy5cYCNxsP8Gu8NXRUo6vvyiTJMDcBkL0JKNOT4usSDi4v86cJNzQQa3ArafRzOv1RFz8BfI7pP7rXDLD6d1Z1miCqTZ8UtJBVQ0Z0eCQmTrlAvlu5busOjcAl4ZV7THH6qCV8tI53zh1THBfjnEgoPxy8UIaIK6tXDUM4RFMMd1366324mJEVwyvc5CWgzPian39Q3GFLl6zXCfD4pw7rSUmH5CNOmKgPihxPbV9NSBxiwVK3M07KFS2hZbf3ZDB4CBSJV9geFWKZlR3XNrsPudQgkpsdywNNjZTDwD2RiHF7kQAgyEW7q1w42OC2IbreBBtiPekx6yzCEWBEokLwfhrhbOnDwcnFmfKjnrxCbqypXrSnyvrUP2nUQ9vBmdxCqiVLrBHuDi6Wv2U4vyZ7dTqk84WmnwACXo5PbYY2dmhtjscLMpRw4Q6xVUEWC3qPMnQkbI1UKEq1NfOrF0X8nC0rqrwHQuNuJqHuebJj5AMXVgyZWTaqYIb4gkbGaEze3wNmHbbj1q2bmumiwd6RZRSdx7U3ZwozO9kTkZ69NHFSa2QDi8GrhgvBDMshJVaOR9K8tWcpa2QrFD7cI0ZqzneLXHXm6LsOmtZPFmikKfyts1pASGwZ8DzuWfT3j0daNmyk6y0HwHwM98KOyeuSXnQJOJzunAXkidv90hrgviWUhP70Nrx527JI7vpRr2dClBBnzO2O7YwjTdKTrmfcPs9z4iLeroo20Jg9ODHjvUYWtLRTOKrgvYAgywkj2PoVdwmuYK32UKcqH5EdhPHxWarjsqUuBb40u6nUGIQ0YS7ZuzsnVDerB8hO3rCl0FlMMYgRh4vdPcEG23JQoIwTvdujULg6Lpplyt6yK16UhVSklj6aVNIoA4zr51dULyOzWF9ZqlZz7l90QpXvLuRD9Elr5gxWvNW4fvkCAU3kpEv0s7gHS7ytjNxm0WLk74bN1iP8ZjcxXXBqwtatCo9e1Ayc59VYR9RVxtfvilb038WpHglhWEZoK91rumPSFiCJWUmlkL6P4SAbz5b6LDdW9ybiN8zdZmNtQ2px556d7DF5RRcXgLocLH37Uh6uU9cz2wmWRrcJS4rO9MkUe6KSuVjVLXSsk6J1bnvvagWl4BkY8ZPm0iBg4XXTkRAjfVgnfex1hee47b6k9c5gdS6AJSVazCPpXQJlGJ7NpyAn3hXdHkhaGtokTmne6Zag8DterOyDldPXHXwrG7PgtsmREc0VugLVPrYEbdf9QMHBtGQLwQz04Gyg2lspZ5HbGnOkfI0MTanuMN7XnWdcGBko9gmQKbpONgPqg8POcpxG2aRefswG090hvYKj5gzp3r1nitZZhBm8KDUT8P2Wy06hPxrkZinMGmBv2SIDegXr5uzceHymEnyMQZINS96QCyTiV7z1X1NZ3IBDfVPZTZ9bRxpKyMbAnYzFhx9PYSkescyMMtsGOEli1gFp2PWcqO4bpj0EnKjgWf9ae2R5nDKIkVbsNRCik3JrCM7WjHPfwdZSiA335Eyl0yoHQWjp6YJrR8ykOtw3zL2XHa2ilKIRSypG5dtDwjuqLI1fb7fB5wiG3LuowKqam8HY86aDsuu0DkpED9mAxoSvE7V6WPs5ptg31yoUOgGK1rvGtdpY7CHkaBmmv0jKNYjcZiuET6Q0If2IO36HafXJN8onjMvYadAypEY4IAxkmU2yemCFQkdDBuhr8G4DWcGO46W24QOvMghH6k3HVHeDgj5dNXKIz3rqbCC6tSNMptuQhoM2eWnwBFw6PzNIqKwB6qxbJMs9wxqvDEJlQkKgMZN4HJu1hNFpXIePLN8dSsV8xhgb1pxwFaqhLQMoYXcgobOdcrb7DDpJFbWhUXKn3WHEjO8nk4EAuNmIUdyfWxwxPPmTLEyohT7QrcjvRph82n64aRJIcyDPCho8pqtCTve84PAp4jIeechI8sl92e94jsX7XTZu3LaqDGkEEtcmp69ZqPA5Ev9NZv5ovmWNiu39kKu7QW0YnXGCvorirdScdCow4NyLgnpoAEG0FPz52oN7xU5xyTgY0x5Hel5GDsA28Qoy463tyBNuT51gQROr9XqgiM9Voq0ax1vI8QKFMLXwaLtHwPC7TkFtJbXYNmPt2kXzQ1EjLq0DGiyKd0BFMww3zpEjeSUS7KhCrf5qU7aDjIkVniPs6TGkTBOwG9ItrUv50WJfqgOd6ngHfYWzJFIAgZnGjXhtkHartO3F19iPs5VHRhTEUw2HZbgnTjmf2NmJ1onUkMNSFUMPrNkrfxl78GHNjJrGAkRU0jlXqAuIK8v6uYh4oyqSrFtzPru8GNIWCbka6LKLrMcCysoDk7VQI12xzELxUebiUsnLYCrnvmJtD0T7yv9M8H8rI4YsdbzD3forc56uvwqS0h0Bl6Sw71n781EC0R0V6067RA2TRZ39fs7yXYZ4O5pQ1uHn0qV82aZI1kHxWVJ1omu81KqoFpTnB0QuNd62AeVKRiuMiAf2UFhy40vFgFElRZFipH1TrJAuFYcgwd38kJWTGTYyW51z1DQfVZnlIegEfZQPTjnhFroayXw55MKnJGiVPQ4R0A2nO5LCDmDFmC8SyLz62pn0aQb5tvlQs5Es7woSf3SbxcKh9JndW42V8hQn4uXxbEKhAX9f48VJg5xXYsMOaBz4h0UfsOrOucFH3YVA9c7TVszXbSq7kRQkpsy3xkunDaIfdiAx5wdLE7LbPhUwrC1FWnCa2qQQxNUimxrQ35Woar9tNQSwpVd8ybEaivgQ77HPSYjTdkKX2j2TBCzmGVBesOUnWI9r34kRO5xPsPPDoJvLQe6kns75Yjcuz82OEuUai1PLVRKmzkRjyLp3tt5YDkjzuYCOWNchY3Eup1IEDvGf64wu4S1qLvRrl6HI9jZj7Li2GZc9grCTxbqpUQgCbCxdgmS6a396AJNijmG8uNnchGPlnNVm6DskG7T2pWasVuuhYhkyFNoUWuY5mBXurDMEDyyZPxlY9nlQYKHBNgg6ZnNEYnwCqTLzudDBQ48YG9r3700uvz83jJAX18s2Kjm2LlmOuPJON6rzbPua8Ac2Y0HPuQZD9Ikcim2MOyR9mbvtRTPeLAX3issevCDYaBiG6BFMaN9rW7j1UnlKQZYgTCveE4oH8tT7QGwdWENAjW4kGjS93zCS6QYxyUjg03er43KivMQOVHaT3iznZnQD3Nk5c0T9IKqRpcytY7JaRV7kmayUmKc4d1ApFqY8imlu2iTMiVfY16qMqDeulTtcKjKUuyWBrJSENwv238nWXQudShLeCsiwUMUnJvXyHdSsmAaaoG5O3RA4GkiQVAiX63tWPl6GNfweFAcpxoD4x8hZpbQ1SaBo3pRNwwHuAvzOwm0jKWndfugKqlUmPoDZb9Bx6dzDolUJtHSNYVrOACFY26SyeyeiFHnnd6wZFypkDGL4LgeBbU8TqJTjO2lFXVmCQZjTnO75V43vJKoHUrRSUkZJYTyl3c8tqWJmrUZ7lJo4cUrGzoudgzDHH8N8H73TwXboF8rbQ8i4yb9w51L0EnCd3kdmSXI0PJxuQ9CQ8AD9WKTwvEaSWDTZ"
describe
Contains data
obs: 1
vars: 1
------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------
str_var_in_strL strL %9s
------------------------------------------------------------------------------------
save "route\all_in_strL" , replace
Afterward, I applied your modification to the data I generated.
gen var1 = str_var_in_strL
foreach var of varlist var1 {
generate str `var'_str = `var'
replace `var' = ""
compress `var'
replace `var' = `var'_str
drop `var'_str
describe `var'
}
obs: 1
vars: 2 16 Dec 2020 12:07
------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------
str_var_in_strL strL %9s
var1 strL %9s
------------------------------------------------------------------------------------
save "route\edited_strL" , replace
Weirdly enough, I was able to import both files into R using the haven
library with the following code.
library(haven)
file <- "route_to_first_file/all_in_strL.dta"
# Import first file into R.
dta_in_R <- read_dta(
file,
encoding = NULL,
col_select = NULL,
skip = 0,
n_max = Inf,
.name_repair = "unique")
# Import edited file using your loop method into R.
file <- "route_to_edited_file/edited_strL.dta"
edited_dta_in_R<- read_dta(
file,
encoding = NULL,
col_select = NULL,
skip = 0,
n_max = Inf,
.name_repair = "unique")
The only differences here could be:
- The size of the dataset (just one observation in my case).
- The possibility that you may be using an earlier version of Stata (I am using Stata 16).
Finally, I think the source of the problem wasn't the strL
type of the data, but memory available on your machine, which was probably solved by your compress
step in the for loop you described.
PS: Everything was run on Win10. R version 4.0.3 (2020-10-10) and haven_2.3.1