I have a txt-file, and I am importing it using the read method. The goal is to turn the columns into a df. This works fine for the most part. However, the description column consists of multiple words with white spaces. When I try to turn the description variable into a column, only the first word is grabbed, which makes sense, but I want the whole string. For example: PACK OF 20 NAPKINS PANTRY DESIGN only Pack is grabbed.
I tried different workarounds with the index and re.sub, but nothing works really.
I know if I use
ttt = pd.read_csv("csv/???")
it is not a problem and the multiword column is imported correctly.
How can I put the complete description string into one column using a read method?
This maybe a duplicate. Anyway, did not find an answer to this problem here or online. Maybe I do express the question wrongly.
Anybody knows a solution? Thanks.
My code:
fin_2 = open('csv/ecom_sales_full_cat.txt', 'rt')
fout_2 = open('csv/ecom_sales_full_cat_2.txt', 'wt')
for line in fin_2:
fout_2.write(line.replace('United Kingdom', 'UnitedKingdom').\
replace('Hong Kong', 'HongKong').replace('Major Category ', 'MajorCategory ').\
replace('Minor Category', 'MinorCategory'))
fin_2.close()
fout_2.close()enter code here
with open('csv/ecom_sales_full_cat_2.txt' ,'r') as file:
text= file.read()
f = open('csv/ecom_sales_full_cat_2.txt', 'r')
ecom_sales_full_cat_2_data = f.read()
ecom_sales_full_cat_2_rows = ecom_sales_full_cat_2_data.split('\n')
ecom_sales_full_cat_2_rows = ecom_sales_full_cat_2_rows[0:700]
index_col_5 = []
invoice_no_5 = []
unit_price_5 = []
order_value_5 = []
quantity_5 = []
country_5 = []
invoice_date_5 = []
invoice_time_5 = []
year_month_5 = []
major_category_5 = []
minor_category_5 = []
description_5 = []
for i in ecom_sales_full_cat_2_rows:
values_full_cat = i.split()
index_col_5_value = values_full_cat[0]
invoice_no_5_value = values_full_cat[1]
unit_price_5_value = values_full_cat[2]
order_value_5_value = values_full_cat[3]
quantity_5_value = values_full_cat[4]
country_5_value = values_full_cat[5]
invoice_date_5_value = values_full_cat[6]
invoice_time_5_value = values_full_cat[7]
year_month_5_value = values_full_cat[8]
major_category_5_value = values_full_cat[9]
minor_category_5_value = values_full_cat[10]
description_5_value = values_full_cat[11]
index_col_5.append(index_col_5_value)
invoice_no_5.append(invoice_no_5_value)
unit_price_5.append(unit_price_5_value)
order_value_5.append(order_value_5_value)
quantity_5.append(quantity_5_value)
country_5.append(country_5_value)
invoice_date_5.append(invoice_date_5_value)
invoice_time_5.append(invoice_time_5_value)
year_month_5.append(year_month_5_value)
major_category_5.append(major_category_5_value)
minor_category_5.append(minor_category_5_value)
description_5.append(description_5_value)
print(invoice_no_5[:10])
print(major_category_5[:10])
print(minor_category_5[:10])
print(description_5[:10])
The txt:
OrderValue Quantity Country InvoiceDate InvoiceTime Year-Month Major Category Minor Category Description 0 549185 0.85 10.20 12 United Kingdom 7/4/11 09:35 2011-04 Clothes Tops PACK OF 20 NAPKINS PANTRY DESIGN
Output
['InvoiceNo', '549185', '576381', '551192', '573553', '539436', '580893', '568321', '552949', '567461']
['MajorCategory', 'Clothes', 'Clothes', 'Kitchen', 'Garden', 'Garden', 'Garden', 'Clothes', 'Clothes', 'Household']
['MinorCategory', 'Tops', 'Shoes', 'Cutlery', 'Turf', 'Hoses', 'Hoses', 'Hats', 'Tops', 'Curtains']
['Description', 'PACK', 'NATURAL', '36', 'SET', 'FINE', 'SEWING', '3', 'BULL', 'STRAWBERRY']