How can I disperse data with lots of double quotes that gets stacked into a single column in pandas?

Question

I have a CSV file from two smartwatches. The file contains time, date, HR, and other data. When I read the file with pandas, everything is stacked into the first column and the rest of the columns are filled with NaN.

The first row is:

Activity Type,Date,Favorite,Title,Distance,Calories,Time,Avg HR,Max HR,Avg Speed,Max Speed,Elev Gain,Elev Loss,Avg Stride Length,Avg Vertical Ratio,Avg Vertical Oscillation,Training Stress Score®,Grit,Flow,Total Strokes,Avg. Swolf,Avg Stroke Rate,Bottom Time,Min Temp,Surface Interval,Decompression,Best Lap Time,Number of Runs,Max Temp

and the data looks like this:

"road_biking,2018-08-29 13:02:00,false,""bike"",""51,60"",""1.192"",""02:10:05"",""--"",""--"",""23,8"",""--"",""--"",""--"",""0,00"",""0,0"",""0,0"",""0,0"",""0,0"",""0,0"",""--"",""--"",""--"",""0:00"",""0,0"",""0:00"",""No"",""00:00.00"",""1"",""0,0"""
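To see why pandas puts everything in the first column, here is a small diagnostic sketch using Python's standard `csv` module on the sample row above. Each data row appears to be wrapped in one outer pair of quotes, with the inner quotes doubled, so a CSV parser reads the entire row as a single quoted field. Parsing that field a second time recovers the separate values. The variable names are illustrative only.

```python
import csv
import io

# The sample data row exactly as shown in the question
# (whole row wrapped in quotes, inner quotes doubled).
raw = '"road_biking,2018-08-29 13:02:00,false,""bike"",""51,60"",""1.192"",""02:10:05"",""--"",""--"",""23,8"",""--"",""--"",""--"",""0,00"",""0,0"",""0,0"",""0,0"",""0,0"",""0,0"",""--"",""--"",""--"",""0:00"",""0,0"",""0:00"",""No"",""00:00.00"",""1"",""0,0"""'

# First parse: the outer quotes make the whole row one field.
as_is = next(csv.reader(io.StringIO(raw)))
print(len(as_is))  # 1

# That single field is itself an ordinary CSV line ("" has become "),
# so parsing it again splits it into the real columns.
fields = next(csv.reader(io.StringIO(as_is[0])))
print(len(fields))  # 29
print(fields[:5])   # ['road_biking', '2018-08-29 13:02:00', 'false', 'bike', '51,60']
```

The 29 recovered fields match the 29 names in the header row, which suggests the header itself is fine and only the data rows carry the extra quoting layer.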

I have tried various things from Stack Overflow, such as df = pd.read_csv(filename, sep=',').replace('"','', regex=True) (from the question "pandas data with double quote").

import numpy as np
import pandas as pd

# Read the raw Garmin export
df_garmin = pd.read_csv("dogacapanoglu garmindata until may1st2019.csv")
# Save a copy and read it back, using the saved index column as the index
df_garmin.to_csv("garmindata_till_may2019")
df_garmin = pd.read_csv("garmindata_till_may2019").set_index("Unnamed: 0")
df_garmin.head()
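A `.replace` applied to the DataFrame runs only after pandas has already parsed each row into one field, so removing quotes at that point cannot split anything. A minimal sketch of the opposite order, fixing the text first and then handing it to `pd.read_csv`, might look like this. It assumes the header is plain CSV, every data row is wrapped in one extra quote layer, and no field contains an embedded newline. `unwrap_quoted_rows` and `read_garmin_csv` are hypothetical helper names, and the `decimal=","`, `thousands="."` and `na_values=["--"]` choices are guesses based on values like "51,60", "1.192" and "--" in the sample row.

```python
import csv
import io

import pandas as pd


def unwrap_quoted_rows(text):
    """Remove one extra layer of quoting from rows written as a single quoted field.

    Assumes no field contains an embedded newline, so the text can be split into lines.
    """
    out = []
    for line in text.splitlines():
        fields = next(csv.reader([line]), [])
        if len(fields) == 1 and line.startswith('"'):
            # Parsing once drops the outer quotes and collapses "" to ",
            # leaving an ordinary CSV line such as: road_biking,...,"51,60",...
            out.append(fields[0])
        else:
            # Header (or a row that is already normal): keep it unchanged.
            out.append(line)
    return "\n".join(out)


def read_garmin_csv(path, encoding="utf-8"):
    # Hypothetical helper: repair the text first, then let pandas parse it.
    with open(path, encoding=encoding, newline="") as f:
        fixed = unwrap_quoted_rows(f.read())
    # Number-format options are assumptions inferred from the sample row.
    return pd.read_csv(
        io.StringIO(fixed), decimal=",", thousands=".", na_values=["--"]
    )
```

With this, something like `df_garmin = read_garmin_csv("dogacapanoglu garmindata until may1st2019.csv")` should give one value per column. Re-parsing each wrapped row with `csv` rather than stripping quotes by hand keeps quoted values such as "51,60" together as a single field.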



df_garmin.columns

returns this: Index(['Activity Type', 'Date', 'Favorite', 'Title', 'Distance', 'Calories', 'Time', 'Avg HR', 'Max HR', 'Avg Speed', 'Max Speed', 'Elev Gain', 'Elev Loss', 'Avg Stride Length', 'Avg Vertical Ratio', 'Avg Vertical Oscillation', 'Training Stress Score®', 'Grit', 'Flow', 'Total Strokes', 'Avg. Swolf', 'Avg Stroke Rate', 'Bottom Time', 'Min Temp', 'Surface Interval', 'Decompression', 'Best Lap Time', 'Number of Runs', 'Max Temp'], dtype='object')

df_garmin.dtypes

returns float64 for every column except "Activity Type", which is returned as object.
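Once the rows are split correctly, numeric columns can still come in as text unless pandas knows the number format used in the file. The sketch below parses a few of the sample columns after the outer quoting has been removed. It assumes a comma decimal separator ("51,60", "23,8"), a dot thousands separator ("1.192" read as 1192), and "--" as the missing-value marker; none of this is confirmed by the question, so check it against the watch's export settings.

```python
import io

import pandas as pd

# Five of the sample columns after the extra quoting layer has been removed
# (names and values copied from the question; the rest omitted for brevity).
clean = (
    "Activity Type,Distance,Calories,Avg HR,Avg Speed\n"
    'road_biking,"51,60","1.192","--","23,8"\n'
)

df = pd.read_csv(
    io.StringIO(clean),
    decimal=",",       # assumed: "51,60" means 51.60
    thousands=".",     # assumed: "1.192" means 1192
    na_values=["--"],  # treat "--" as a missing value
)
print(df.dtypes)
print(df)
```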

The column names are read without any problem, but all of the data ends up in the 'Activity Type' column and every other column is filled with NaN.

How can I spread the data out into the proper columns?

There are 29 listed column names and 39 data points in the first row. What is the desired result for the 10 extra entries with no column name? (modesitt, May 02 '19 at 22:33)
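The difference between 29 and 39 appears to come from the decimal commas: ten of the sample values ("51,60", "23,8", "0,00" and the seven "0,0" entries) contain a comma, so a plain split on "," breaks each of them in two. A quote-aware split keeps them intact. The sketch below uses the sample row with its outer quotes removed and "" collapsed to "; the variable names are illustrative.

```python
import csv

# Sample data row with the outer quotes removed and "" collapsed to ".
row = 'road_biking,2018-08-29 13:02:00,false,"bike","51,60","1.192","02:10:05","--","--","23,8","--","--","--","0,00","0,0","0,0","0,0","0,0","0,0","--","--","--","0:00","0,0","0:00","No","00:00.00","1","0,0"'

naive = row.split(",")                  # also splits inside "51,60", "0,0", ...
quote_aware = next(csv.reader([row]))   # respects the quotes around each value

print(len(naive), len(quote_aware))  # 39 29
```

So the row seems to contain exactly one value per header column once the quoting is honoured, rather than ten unnamed extras.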

