Some information is missing (e.g. how should missing values be handled, should incomplete entries be filtered out, and does the result need to be sorted?), but I'll attempt an answer nonetheless:
To achieve your desired outcome of converting the type and/or content of your tuple 'columns', you need to map and/or typecast the old values to new ones.
Mapping the strings
You can take different approaches to map different input values to a range of output values. I'll use two in my answer: (1) if ... else statements and (2) a Python way of writing a switch ... case statement.
To get a bool value from the different possible entries for 'Survived', I used approach (2). For this you set up a dictionary with your mappings and then look up the appropriate entry for each of your candidates (see Replacements for switch statement in Python?). You can combine this with the string lower() function so that you can disregard case (How do I lowercase a string in Python?). You can also pass a default value to use when the key cannot be found in the dict; in my example below, I use None.
Example:
entry = 'NO'
switcher_survived = {
    'no': False,
    'dead': False,
    'yes': True
}
result = switcher_survived.get(entry.lower(), None)
The same approach can be used to set the gender based on the different input possibilities.
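As a minimal sketch of that, the gender lookup could be wrapped in a small function. The name normalize_gender is hypothetical and only used for illustration; the mapping itself is the one from the full example further down.

```python
# Mapping from the accepted spellings to a canonical value.
switcher_gender = {
    'male': 'male',
    'm': 'male',
    'female': 'female',
    'f': 'female',
}


def normalize_gender(entry):
    # Hypothetical helper, not part of the question's code.
    # Lowercasing lets 'Female', 'F' and 'female' hit the same key;
    # unknown or empty entries fall back to None.
    return switcher_gender.get(entry.lower(), None)


print(normalize_gender('F'))
print(normalize_gender('Male'))
print(normalize_gender(''))
```

Entries that are not in the dictionary, including the empty string, come back as None, so they can be detected later.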
Typecasting
For the numbers, you can simply cast them to the desired type. However, this only works if the string contains a number that can be cast successfully. Note that in your example, you have entries with an empty string, which will lead to a ValueError when you try to cast. So you need to check for that and may again want to default to some value. I use nan = float('NaN'), as this is a nice way of maintaining the correct type without using additional packages (see Assigning a variable NaN in python without numpy).
Example:
nan = float('NaN')
entry = '2.5'
# Test the string itself: float('') would raise a ValueError before any comparison.
result = (float(entry) if entry != "" else nan)
I'm using a one-line if-then-else statement here (see Putting a simple if-then-else statement on one line), because that is beneficial for the full example at the end.
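The empty-string check only covers one kind of bad input. If the data might also contain other non-numeric strings (an assumption, your sample only shows empty strings), catching the ValueError is an alternative. The helper name to_float is hypothetical:

```python
import math

nan = float('NaN')


def to_float(entry, default=nan):
    # Hypothetical helper: return `default` whenever the string cannot
    # be converted, which includes the empty string.
    try:
        return float(entry)
    except ValueError:
        return default


print(to_float('2.5'))
print(math.isnan(to_float('')))
print(to_float('n/a', 25.0))
```

This trades the one-liner for a function call, so it is a matter of taste which one you use in the full example.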
Putting it together
actual = [
    ('Survived', 'Pclass', 'Name', 'Gender', 'Age', 'Fare'),
    ('no', '3', 'Braund Mr. Owen Harris', 'male', '22', '7.25'),
    ('Dead', '3', 'Braund Ms. Maria', 'Female', '21', ''),
    ('Yes', '1', 'Cumings Mrs. John Bradley (Florence Briggs Thayer)', 'F', '38', '71.28'),
    ('', '3', 'Vander Planke Miss. Augusta', 'female', '', ''),
    ('Dead', '4', 'Lennon Mr. Denis', 'male', '13', '15.5')]
nan = float('NaN')
switcher_survived = {
    'no': False,
    'dead': False,
    'yes': True
}
switcher_gender = {
    'male': 'male',
    'm': 'male',
    'female': 'female',
    'f': 'female'
}
def process(lst):
    result = []
    current = 1  # index 0 is the header row, so start at the first data row
    while current < len(lst):
        # note: the name 'tuple' shadows the built-in type inside this function
        tuple = (switcher_survived.get(lst[current][0].lower(), ''),
                 int(lst[current][1]),
                 lst[current][2],
                 switcher_gender.get(lst[current][3].lower(), ''),
                 (float(lst[current][4]) if lst[current][4] != "" else ''),
                 (float(lst[current][5]) if lst[current][5] != "" else 25.0)
                 )
        result.append(tuple)
        current += 1
    # header row first, followed by the list of converted rows
    return [lst[0], result]
expected = process(actual)
print(expected)
Some remarks:
In this final example, I have changed the default value for the column 'Fare' to 25.0, to conform with your expected outcome.
For the same reason, I have also changed the default values for 'Survived', 'Gender' and 'Age' to the empty string '' instead of None and NaN, respectively. Please note that this violates your own requirements, as the empty string is obviously not of type bool or float. This may have implications when you work with the data later. In particular, the empty string in column 'Survived' may be silently evaluated to False.
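To illustrate that pitfall, here is a small sketch. The function describe is hypothetical; it only shows why an identity check is safer than a plain truth test once '' can appear in a bool column.

```python
def describe(survived):
    # Hypothetical helper: compare by identity so that '' (or None)
    # is not mistaken for False.
    if survived is True:
        return 'survived'
    if survived is False:
        return 'died'
    return 'unknown'


# A plain truth test cannot tell "unknown" from "did not survive":
print(not '')          # True, just like `not False`
print(describe(''))
print(describe(False))
```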
To filter out incomplete data, you could change the default values back to None and NaN and only add complete rows to your final data set. For that, you could check whether any of the tuple's fields are None or NaN (see What is the best way to check if a tuple has any empty/None values in Python?):
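# 'x is nan' works because every default is the very same nan object;
# 'x == nan' would not, since NaN never compares equal to anything.
if not any(map(lambda x: (x is None) or (x is nan), tuple)):
    result.append(tuple)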
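The identity check only catches the one nan object that was used as the default. If NaN values could also come from elsewhere (an assumption), math.isnan is a more general test. A self-contained sketch on already converted rows, with the hypothetical helper is_missing:

```python
import math

nan = float('NaN')

# Rows as they would look with None / NaN as defaults (sample data from above).
rows = [
    (False, 3, 'Braund Mr. Owen Harris', 'male', 22.0, 7.25),
    (False, 3, 'Braund Ms. Maria', 'female', 21.0, nan),
    (None, 3, 'Vander Planke Miss. Augusta', 'female', nan, nan),
]


def is_missing(value):
    # None marks a failed lookup, NaN marks a failed cast.
    return value is None or (isinstance(value, float) and math.isnan(value))


# Keep only rows in which no field is missing.
complete = [row for row in rows if not any(is_missing(v) for v in row)]
print(complete)
```

Only the first row survives here, because the other two each contain at least one NaN or None.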
If you wanted to sort the list by an arbitrary column, you could use a lambda function as the sort key (see Syntax behind sorted(key=lambda: ...)) before you return the result. E.g. to sort by the name:
result = sorted(result, key=lambda row: row[2])
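As an alternative to the lambda, operator.itemgetter from the standard library does the same job. A short sketch on converted sample rows (the variable names are only illustrative):

```python
from operator import itemgetter

rows = [
    (False, 4, 'Lennon Mr. Denis', 'male', 13.0, 15.5),
    (True, 1, 'Cumings Mrs. John Bradley (Florence Briggs Thayer)', 'female', 38.0, 71.28),
    (False, 3, 'Braund Mr. Owen Harris', 'male', 22.0, 7.25),
]

# Sort by the 'Name' column (index 2), ascending.
by_name = sorted(rows, key=itemgetter(2))

# Sort by the 'Age' column (index 4), oldest first.
by_age_desc = sorted(rows, key=itemgetter(4), reverse=True)

print([row[2] for row in by_name])
print([row[4] for row in by_age_desc])
```

If the sort column can contain NaN, filter those rows out first: every ordering comparison with NaN is False, so the resulting order would not be meaningful.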