I have seen some posts about this matter before but I really can't find a solution that solves my problem.
I have an array of 1743 elements that is loaded from a txt file. Each element is a string such as '1,1232,3,2018-03-24', where the structure is movieID,customerID,rating,date.
I am rather new to Python, but I do know that I want this as a DataFrame whose column names follow the structure of the string.
I am having trouble converting this into a DataFrame. I was thinking of writing the array elements to a file and then loading that file into a DataFrame, but I am well aware that this is very time consuming, and the total dataset in the file is over 24 million rows.
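For reference, here is a minimal sketch of the conversion I am after, done without an intermediate file. It assumes the array is a plain Python list of strings; the three sample rows and the variable name `rows` are made up for illustration and only stand in for the real 1743 elements.

```python
import pandas as pd

# Hypothetical sample standing in for the real array of strings
rows = [
    "1,1232,3,2018-03-24",
    "1,4521,5,2018-03-25",
    "2,1232,4,2018-04-01",
]

names = ["movieID", "customerID", "rating", "date"]

# Split every string once, then build the DataFrame in a single call
df = pd.DataFrame([r.split(",") for r in rows], columns=names)

# The split values are still strings, so convert them to proper dtypes
df = df.astype({"movieID": int, "customerID": int, "rating": int})
df["date"] = pd.to_datetime(df["date"])

print(df)
```

Building the frame in one call avoids growing it row by row, which matters once the input reaches millions of rows.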
UPDATE------------->
I have now been able to split the string into 4 elements: movieID, customerID, rating and date.
The only remaining problem is putting them into the DataFrame correctly. Below is all the code I have, followed by the error it produces.
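import pandas as pd

# `file` is the already-opened txt file (the open() call is not shown here)
movieID = ''
names = ['customerID', 'movieID', 'rating']
data = pd.DataFrame(columns = names)
tf = False
for line in file:
    tf = False
    line = line.strip('\n')
    # A line ending in ':' holds the movieID for the rows that follow it
    if(line[len(line)-1] == ':'):
        movieID = line.strip(':')
        tf = True
    if(tf != True):
        # Prepend the current movieID: movieID,customerID,rating,date
        line = movieID + ',' + line
        text = line.split(',')
        # This line raises the ValueError quoted below
        df = pd.DataFrame([text[1], text[0], text[2]], columns = names)
        data = data.append(df)
    tf = False
print(data)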
ValueError: Shape of passed values is (1, 3), indices imply (3, 3)
The error is raised at the line df = pd.DataFrame([text[1], text[0], text[2]], columns = names)
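As far as I understand the error, a flat list of three values is read as three rows of one column each, while `columns = names` asks for three columns. Wrapping the values in a second list (`[[text[1], text[0], text[2]]]`) would make them a single row. Below is a rough sketch of the same loop that collects the rows in a plain list and builds the DataFrame once at the end. The `io.StringIO` sample is hypothetical and only stands in for my real file, and the variable `rows` is introduced for illustration. This version also avoids `DataFrame.append`, which was removed in pandas 2.0.

```python
import io
import pandas as pd

# Hypothetical sample in the layout the loop expects:
# a "movieID:" line followed by "customerID,rating,date" lines
file = io.StringIO(
    "1:\n"
    "1232,3,2018-03-24\n"
    "4521,5,2018-03-25\n"
    "2:\n"
    "1232,4,2018-04-01\n"
)

names = ["customerID", "movieID", "rating"]
rows = []
movieID = ""

for line in file:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    if line.endswith(":"):
        # Header line: remember the movieID for the rows that follow
        movieID = line[:-1]
        continue
    customerID, rating, date = line.split(",")
    # Each row is its own list, so the DataFrame gets one row of three columns
    rows.append([customerID, movieID, rating])

# Build the DataFrame once instead of appending inside the loop
data = pd.DataFrame(rows, columns=names)
print(data)
```

Appending to a Python list is cheap, whereas appending to a DataFrame copies the whole frame each time, so this pattern should scale much better towards the full 24 million rows.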