Pandas acting weird when using dataframe.shift()

Question

I am reading in some data which looks like this:

In this dataset, a number of rows have a null in column 16. I need to shift the values in those rows to the right, so that the values that begin with "*" (e.g. column 16 row 4, column 13 row 5, etc.) move into the columns to their right. (Eventually I will do this in a loop so that those values end up in column 16.)

The data to the left of these values has to move as well. For example, when the data in {column 7 row 16} moves to {column 8, row 16}, the data in {column 2 row 16} should move to {column 3 row 16}.

However, I do not want the data in column 1 (zero-indexed column 0) to move, as I will be using that as the index for my data.

Hence my expected output is this:

I am using the code below to achieve this:

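import StringIO  # Python 2 module; on Python 3 use io.StringIO instead
import pandas as pd  # the code below refers to pandas as pd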

# Store the csv string in a variable and turn that into a dataframe
# This string here is the same as the data in the image above.
gps_string = """2010-01-12 18:00:00,$GPGGA,180439,7249.2150,N,11754.4238,W,2.0,10,0.9,-8.1,M,-12.4,M,,*57,,,
2010-01-12 17:30:00,$GPGGA,173439,7249.2160,N,11754.4233,W,2.0,11,0.8,-4.5,M,-12.4,M,,*5B,,,
2010-01-12 17:00:00,$GPGGA,170439,7249.2152,N,11754.4235,W,2.0,11,0.8,-3.1,M,-12.4,M,,*5C,,,
2010-01-12 16:30:00,N,11754.4210,W,2,9.0,1.1,-13.1,M,-12.4,M,,*6C,,,,,,
2010-01-12 16:00:00,N,11754.4229,W,2,10.0,0.9,-2.9,M,-12.4,M,,*53,,,,,,
2010-01-12 15:30:00,N,11754.4269,W,2,9.0,0.8,-4.3,M,-12.4,M,,*54,,,,,,
2010-01-12 15:00:00,N,11754.4267,W,2,10.0,0.8,-1.6,M,-12.4,M,,*56,,,,,,
2010-01-12 14:30:00,$GPGGA,143439,7249.2152,N,11754.4253,W,2.0,11,0.7,-4.3,M,-12.4,M,,*56,,,
2010-01-12 14:00:00,N,11754.4245,W,2,10.0,0.9,-7.0,M,-12.4,M,,*50,,,,,,
2010-01-12 13:30:00,$GPGGA,133439,7249.2134,N,11754.4243,W,2.0,11,0.7,-10.7,M,-12.4,M,,*61,,,
2010-01-12 13:00:00,N,11754.4245,W,2,10.0,0.8,-5.5,M,-12.4,M,,*56,,,,,,
2010-01-12 12:30:00,N,11754.4226,W,2,10.0,0.9,-7.1,M,-12.4,M,,*59,,,,,,
2010-01-12 12:00:00,N,11754.4238,W,2,10.0,0.8,-6.5,M,-12.4,M,,*51,,,,,,
2010-01-12 11:30:00,N,11754.4227,W,2,10.0,0.8,0.1,M,-12.4,M,,*73,,,,,,
2010-01-12 11:00:00,-7.4,M,-12.4,M,,*5F,,,,,,,,,,,,
2010-01-12 10:30:00,N,11754.4271,W,2,8.0,1.1,-8.4,M,-12.4,M,,*5A,,,,,,
""" 
# Read the csv string into a dataframe, with no headers
# Make the first column with timestamp values the index column.
gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index

# Shift the rows here.
gps_df.loc[rows_to_shift] = gps_df.loc[rows_to_shift].shift(periods=1, axis=1)
gps_df.to_csv("f.csv") # Creates a file after shift to see the output

I get the following output file when the code is executed.

From this I see that the shift creates a column of nulls at column 5 for some reason, and that it also moves the data that was originally in column 10 into column 15. Any idea why this might be the case?

Could this be a bug in the DataFrame.shift() function, or am I doing something wrong here?

It would be really helpful if you could provide some data as text rather than as pictures so people could test and compare solutions to what you've tried — G. Anderson, Jun 17 '19 at 20:17
@G.Anderson, I have added some test data to the question as requested — Kikanye, Jun 18 '19 at 04:07
Please see [how to create good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and show both your sample input and desired output, as your problem description is confusing — G. Anderson, Jun 18 '19 at 15:58
@G.Anderson I've made some edits to the question, is it better now? — Kikanye, Jun 18 '19 at 21:19
I'm also seeing some very weird behavior with `.shift()` in my testing, trying to figure it out on my end as well. It seems to reorder text columns when applied along axis 1 — G. Anderson, Jun 18 '19 at 22:00
@G.Anderson Same issue I'm dealing with, it moves one of my columns (column 10) to column 15 — Kikanye, Jun 18 '19 at 22:57
@G.Anderson Its a bug from pandas I've opened an issue here https://github.com/pandas-dev/pandas/issues/26929 you can follow it for updates — Kikanye, Jun 18 '19 at 23:56

score 1 · Accepted Answer · answered Jun 19 '19 at 15:27

This is a bug in pandas; more details can be found in the GitHub issue linked in the comments above: https://github.com/pandas-dev/pandas/issues/26929

It seems that when shifting along axis=1, a value in an object-dtype column is moved to the next column that also has object dtype, rather than to the column immediately to its right.
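To illustrate the dtype point, here is a minimal sketch on a made-up toy frame (the column labels and values are hypothetical, not the GPS data). It assumes that once every column shares the object dtype, there is no other same-dtype column for a value to jump to, so a shift along axis 1 moves each value into the adjacent column.

```python
import pandas as pd

# Hypothetical toy frame with mixed dtypes:
# columns 1 and 3 hold text, column 2 holds floats, column 4 is empty.
df = pd.DataFrame({1: ["N", "W"], 2: [7.5, 2.0], 3: ["M", "M"], 4: [None, None]})
print(df.dtypes)  # shows which columns are text and which are numeric

# Cast every column to object so the whole frame has one uniform dtype.
as_objects = df.astype(object)

# Shift every row one column to the right; column 1 is filled with NaN.
shifted = as_objects.shift(periods=1, axis=1)
print(shifted)
```

Printing `df.dtypes` on the real data is a quick way to see which columns pandas read as text and which as numbers.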

In order to work around this issue, I select the indexes I want to shift, convert all the data in my dataframe to strings, perform the shift, get the data as a csv string again, and then recreate the dataframe to get the previous datatypes.

Below is the code I have used to work around this issue:

import pandas as pd
import StringIO  # Python 2 module; on Python 3 use io.StringIO instead

gps_string = """
"2010-01-12 18:00:00","$GPGGA","180439","7249.2150","N","11754.4238","W","2","10","0.9","-8.1","M","-12.4","M","","*57","","",""
"2010-01-12 17:30:00","$GPGGA","173439","7249.2160","N","11754.4233","W","2","11","0.8","-4.5","M","-12.4","M","","*5B","","",""
"2010-01-12 17:00:00","$GPGGA","170439","7249.2152","N","11754.4235","W","2","11","0.8","-3.1","M","-12.4","M","","*5C","","",""
"2010-01-12 16:30:00","N","11754.4210","W","2","09","1.1","-13.1","M","-12.4","M","","*6C","","","","","",""
"2010-01-12 16:00:00","N","11754.4229","W","2","10","0.9","-2.9","M","-12.4","M","","*53","","","","","",""
"2010-01-12 15:30:00","N","11754.4269","W","2","09","0.8","-4.3","M","-12.4","M","","*54","","","","","",""
"2010-01-12 15:00:00","N","11754.4267","W","2","10","0.8","-1.6","M","-12.4","M","","*56","","","","","",""
"2010-01-12 14:30:00","$GPGGA","143439","7249.2152","N","11754.4253","W","2","11","0.7","-4.3","M","-12.4","M","","*56","","",""
"2010-01-12 14:00:00","N","11754.4245","W","2","10","0.9","-7.0","M","-12.4","M","","*50","","","","","",""
"2010-01-12 13:30:00","$GPGGA","133439","7249.2134","N","11754.4243","W","2","11","0.7","-10.7","M","-12.4","M","","*61","","",""
"2010-01-12 13:00:00","N","11754.4245","W","2","10","0.8","-5.5","M","-12.4","M","","*56","","","","","",""
"2010-01-12 12:30:00","N","11754.4226","W","2","10","0.9","-7.1","M","-12.4","M","","*59","","","","","",""
"2010-01-12 12:00:00","N","11754.4238","W","2","10","0.8","-6.5","M","-12.4","M","","*51","","","","","",""
"2010-01-12 11:30:00","N","11754.4227","W","2","10","0.8","0.1","M","-12.4","M","","*73","","","","","",""
"2010-01-12 11:00:00","-7.4","M","-12.4","M","","*5F","","","","","","","","","","","",""
"2010-01-12 10:30:00","N","11754.4271","W","2","08","1.1","-8.4","M","-12.4","M","","*5A","","","","","",""

 """

gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index  # get the indexes to shift
gps_df_all_strings = gps_df.astype(str)  # Convert all the data to be of type str (string)

# Shift the data
gps_df_all_strings.loc[rows_to_shift] = gps_df_all_strings.loc[rows_to_shift].shift(periods=1, axis=1)
s = gps_df_all_strings.to_csv(header=None)  # Write the shifted data back out as a CSV string.
new_gps_df = pd.read_csv(StringIO.StringIO(s), header=None, index_col=0)  # Re-read the CSV so pandas infers the dtypes again.
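
For anyone on Python 3, the same idea can be sketched with `io.StringIO`, including the loop mentioned in the question that keeps shifting until the "*" value reaches its target column. This is only a sketch under assumptions: the toy rows, the `TARGET` column label and the variable names are hypothetical, and reading with `dtype=str` is swapped in for the `astype(str)` step above so that every column starts out as text.

```python
import io

import pandas as pd

# Hypothetical, shortened rows: the "*" checksum belongs in column 5.
toy_csv = (
    "t1,$GPGGA,N,W,M,*57\n"
    "t2,N,W,M,*6C,\n"
    "t3,M,*5F,,,\n"
)

TARGET = 5  # hypothetical label of the column the checksum should end up in

# dtype=str keeps every value as text, so all columns share one dtype.
df = pd.read_csv(io.StringIO(toy_csv), header=None, index_col=0, dtype=str)

# Shift incomplete rows one column at a time until the target is filled.
# The loop is bounded by the column count so it cannot run forever.
for _ in range(df.shape[1]):
    rows_to_shift = df.index[df[TARGET].isnull()]
    if len(rows_to_shift) == 0:
        break
    df.loc[rows_to_shift] = df.loc[rows_to_shift].shift(periods=1, axis=1)

# Round-trip through CSV text so pandas infers the column dtypes again.
restored = pd.read_csv(io.StringIO(df.to_csv(header=False)), header=None, index_col=0)
print(restored)
```

Rows that already have a value in the target column are never touched, so only the short rows move.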
